This model was published in HF papers on 2024-07-24 and contributed to Hugging Face Transformers on 2026-01-12.
LW-DETR
LW-DETR is a light-weight Detection Transformer (DETR) architecture designed to compete with and surpass the dominant YOLO series for real-time object detection. It achieves a new state-of-the-art balance between speed (latency) and accuracy (mAP) by combining recent transformer advances with efficient design choices.
The LW-DETR architecture is characterized by its simple and efficient structure: a plain ViT Encoder, a Projector, and a shallow DETR Decoder. It enhances the DETR architecture for efficiency and speed using the following core modifications:
- Efficient ViT Encoder: Uses a plain ViT with interleaved window/global attention and a window-major organization to drastically reduce attention complexity and latency.
- Richer Input: Aggregates multi-level features from the encoder and uses a C2f Projector (YOLOv8) to pass two-scale features (1/8 and 1/32).
- Faster Decoder: Employs a shallow 3-layer DETR decoder with deformable cross-attention for lower latency and faster convergence.
- Optimized Queries: Uses a mixed-query scheme combining learnable content queries and generated spatial queries.
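To get a feel for why windowed attention and the two-scale projector matter, the back-of-the-envelope sketch below counts query-key pairs per attention layer and computes the resulting feature-map sizes. The 640×640 input, the patch size of 16, and the 16 equal windows are illustrative assumptions rather than values read from a specific checkpoint; only the 1/8 and 1/32 scales come from the description above.

```python
def attention_pairs(num_tokens: int, num_windows: int = 1) -> int:
    """Count query-key pairs when tokens are split evenly into windows.

    With num_windows=1 this is plain global attention (num_tokens ** 2).
    """
    if num_tokens % num_windows:
        raise ValueError("tokens must divide evenly into windows")
    tokens_per_window = num_tokens // num_windows
    return num_windows * tokens_per_window ** 2


def feature_map_side(image_side: int, stride: int) -> int:
    """Side length of a square feature map at the given downsampling stride."""
    return image_side // stride


# Illustrative assumptions (not taken from a checkpoint config)
image_side = 640   # square input resolution
patch_size = 16    # ViT patch size
num_windows = 16   # e.g. a 4x4 grid of windows

tokens = (image_side // patch_size) ** 2             # 40 * 40 = 1600 tokens
global_pairs = attention_pairs(tokens)               # every token attends to every token
window_pairs = attention_pairs(tokens, num_windows)  # attention restricted to each window

print(f"global: {global_pairs:,} pairs, windowed: {window_pairs:,} pairs")
print(f"reduction factor: {global_pairs // window_pairs}x")

# Two-scale features passed on by the projector (1/8 and 1/32 of the input)
scales = {stride: feature_map_side(image_side, stride) for stride in (8, 32)}
print(scales)
```

Under these assumptions, window attention cuts the number of query-key pairs by a factor equal to the number of windows, which is why interleaving only a few global layers keeps latency low.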
You can find all the available LW-DETR checkpoints under the AnnaZhang organization. The original code can be found here.
Tip
This model was contributed by stevenbucaille.
Click on the LW-DETR models in the right sidebar for more examples of how to apply LW-DETR to different object detection tasks.
The example below demonstrates how to perform object detection with the [Pipeline] and the [AutoModel] class.
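from transformers import pipeline

# Use a separate name so the imported `pipeline` function is not shadowed.
# device_map=0 places the model on device 0 (typically the first GPU).
detector = pipeline(
    "object-detection",
    model="AnnaZhang/lwdetr_small_60e_coco",
    device_map=0,
)

detector("http://images.cocodataset.org/val2017/000000039769.jpg")
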
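The object-detection pipeline returns a list of dictionaries, each with a `score`, a `label`, and a `box` holding `xmin`, `ymin`, `xmax`, and `ymax`. A minimal sketch of filtering that output is shown below; `filter_detections` is a hypothetical helper, and the sample detections are made-up values for illustration, not actual model output.

```python
def filter_detections(detections, min_score=0.5, labels=None):
    """Keep detections above min_score (optionally restricted to given labels),
    sorted from most to least confident."""
    kept = [
        det for det in detections
        if det["score"] >= min_score and (labels is None or det["label"] in labels)
    ]
    return sorted(kept, key=lambda det: det["score"], reverse=True)


# Made-up detections in the pipeline's output format (not real model output)
sample = [
    {"score": 0.42, "label": "remote", "box": {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 118}},
    {"score": 0.97, "label": "cat", "box": {"xmin": 10, "ymin": 55, "xmax": 315, "ymax": 470}},
    {"score": 0.88, "label": "couch", "box": {"xmin": 0, "ymin": 1, "xmax": 640, "ymax": 475}},
]

for det in filter_detections(sample, min_score=0.5):
    box = det["box"]
    width = box["xmax"] - box["xmin"]
    height = box["ymax"] - box["ymin"]
    print(f"{det['label']}: {det['score']:.2f} ({width}x{height} px)")
```

Filtering after the fact like this makes it easy to experiment with different confidence cut-offs without re-running the model.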
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained("AnnaZhang/lwdetr_small_60e_coco")
model = AutoModelForObjectDetection.from_pretrained("AnnaZhang/lwdetr_small_60e_coco", device_map="auto")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt").to(model.device)

# run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# target_sizes expects (height, width); PIL's image.size is (width, height), hence the reversal
results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")

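The boxes printed above are absolute pixel coordinates. DETR-family models typically predict boxes as normalized (center x, center y, width, height), and post-processing rescales them to corner coordinates using the target size. The sketch below illustrates that conversion in plain Python; `center_to_corners` is a hypothetical helper, and whether the LW-DETR image processor performs exactly these steps is an assumption here.

```python
def center_to_corners(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to absolute
    (x_min, y_min, x_max, y_max) pixel coordinates."""
    cx, cy, w, h = box
    x_min = (cx - w / 2) * image_width
    y_min = (cy - h / 2) * image_height
    x_max = (cx + w / 2) * image_width
    y_max = (cy + h / 2) * image_height
    return [round(v, 2) for v in (x_min, y_min, x_max, y_max)]


# A box centered in a 640x480 image, covering a quarter of its width
# and half of its height (illustrative values)
print(center_to_corners((0.5, 0.5, 0.25, 0.5), image_width=640, image_height=480))
# → [240.0, 120.0, 400.0, 360.0]
```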
Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LW-DETR.
- Scripts for finetuning [LwDetrForObjectDetection] with [Trainer] or Accelerate can be found here.
- See also: Object detection task guide.
LwDetrConfig
autodoc LwDetrConfig
LwDetrViTConfig
autodoc LwDetrViTConfig
LwDetrModel
autodoc LwDetrModel - forward
LwDetrForObjectDetection
autodoc LwDetrForObjectDetection - forward
LwDetrViTBackbone
autodoc LwDetrViTBackbone - forward