Deformable DETR

SenseTime/deformable-detr

published Mar 2022 · updated May 2024

Deformable DETR is an end-to-end object detection model built on a deformable transformer attention mechanism.

est. price

~$0.047

/ 1k images · estimated, set at launch

downloads / mo

8.1K

license

apache-2.0

specs

Task	Object Detection
Architecture	Encoder-decoder transformer with ResNet-50 backbone
Parameters	34M
License	Apache 2.0
Training Data	COCO 2017 (118k images)
Inference Speed	27 FPS (single image)

about this model

Deformable DETR (ResNet-50) is an object detection model that replaces DETR's dense attention with deformable attention, which speeds up convergence and improves small-object detection relative to the original DETR. It is trained end-to-end on COCO 2017 (118k annotated images) and combines a convolutional backbone with an encoder-decoder transformer, 300 learned object queries, and a bipartite matching loss that uses the Hungarian algorithm to find the optimal one-to-one assignment between predictions and ground truth.
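To make the one-to-one assignment concrete, here is a minimal sketch of bipartite matching on a toy cost matrix. It is illustrative only: the cost values are made up, the name `match_queries` is hypothetical, and it uses brute-force search over permutations rather than the Hungarian algorithm (which reaches the same optimum in polynomial time). In the real model the matching cost combines class and box terms.

```python
from itertools import permutations


def match_queries(cost):
    """Return (assignment, total_cost) with the lowest total matching cost.

    cost[q][g] is the cost of assigning query q to ground-truth object g.
    Assumes there are at least as many queries as ground-truth objects;
    queries left out of the assignment count as "no object".
    """
    num_queries = len(cost)
    num_gt = len(cost[0])
    best_total = float("inf")
    best_assignment = None
    # Brute force: try every ordered choice of num_gt distinct queries.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_assignment = {q: g for g, q in enumerate(queries)}
    return best_assignment, best_total


# Toy example: 3 queries, 2 ground-truth objects (made-up costs).
toy_cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
assignment, total = match_queries(toy_cost)
print(assignment, round(total, 2))
```

Here query 1 is matched to object 0 and query 0 to object 1, while query 2 stays unmatched. For realistic sizes a polynomial-time solver is the practical choice, since the brute-force search grows factorially.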

Key mechanism

Instead of attending to all spatial locations, each deformable attention module samples only a small set of key points around a reference point. This reduces attention complexity and makes multi-scale feature aggregation affordable, avoiding the cost DETR incurs on high-resolution feature maps. According to the paper, this design lets the model outperform DETR with 10× fewer training epochs (50 vs. 500). The single-scale variant in the benchmark table reaches a lower overall AP (39.4) than DETR-DC5 (43.3).
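The sampling idea can be sketched for a single query, a single head, and a single feature level. This is a simplified illustration, not the model's implementation: the function names are hypothetical, the offsets and attention logits are hard-coded (in the model they are predicted from the query), and the value projection, multiple heads, and multiple scales are left out.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate a 2D grid of scalars at (x, y); zero outside."""
    height, width = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < width and 0 <= yi < height:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feature_map[yi][xi]
    return value


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature_map, reference, offsets, logits):
    """Weighted sum of the K values sampled around one reference point."""
    weights = softmax(logits)
    ref_x, ref_y = reference
    sampled = [
        bilinear_sample(feature_map, ref_x + dx, ref_y + dy)
        for dx, dy in offsets
    ]
    return sum(w * s for w, s in zip(weights, sampled))


# Toy 4x4 single-channel feature map where value = 10 * row + col.
feature_map = [[10 * r + c for c in range(4)] for r in range(4)]
reference = (1.5, 1.5)                      # (x, y) in pixel coordinates
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
logits = [0.0, 0.0, 0.0, 0.0]               # equal attention weights

output = deformable_attention(feature_map, reference, offsets, logits)
print(output)
```

The query reads K = 4 sampled values instead of all 16 locations in the map. That gap widens as the feature map grows, because K stays fixed.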

Benchmark results (COCO val2017)

Method	Epochs	AP	AP (small)	Params (M)	FLOPs (G)	Train time (GPU hours)	Inference FPS
Deformable DETR (single scale)	50	39.4	20.6	34	78	160	27.0
DETR (DC5)	500	43.3	22.5	41	187	2000	11.4
Faster R-CNN + FPN	109	42.0	26.0*	42	180	–	–

Architecture diagram of Deformable DETR showing encoder-decoder transformer with deformable attention sampling points

Strengths

·Faster convergence: 50 epochs vs. 500 for DETR, cutting training cost from 2000 to 160 GPU hours.
·Small-object AP of 20.6, close to the 22.5 of DETR (DC5), at a fraction of the training cost.
·Efficient inference: 27 FPS (single image) with 34M parameters and 78 GFLOPs.
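The ratios behind these figures can be recomputed from the benchmark table. The short sketch below does that arithmetic; the variable names are illustrative, and the numbers are copied from this page as listed rather than measured independently.

```python
# Figures copied from the benchmark table on this page (COCO val2017).
deformable = {"epochs": 50, "gpu_hours": 160, "flops_g": 78, "fps": 27.0}
detr_dc5 = {"epochs": 500, "gpu_hours": 2000, "flops_g": 187, "fps": 11.4}

epoch_ratio = detr_dc5["epochs"] / deformable["epochs"]
gpu_hour_ratio = detr_dc5["gpu_hours"] / deformable["gpu_hours"]
flops_ratio = detr_dc5["flops_g"] / deformable["flops_g"]
fps_ratio = deformable["fps"] / detr_dc5["fps"]

print(f"training epochs:    {epoch_ratio:.1f}x fewer")
print(f"training GPU hours: {gpu_hour_ratio:.1f}x fewer")
print(f"FLOPs per image:    {flops_ratio:.1f}x fewer")
print(f"inference speed:    {fps_ratio:.1f}x faster")
```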

Deformable DETR is released under the Apache 2.0 license. The paper was presented as an ICLR 2021 Oral.

best for

·Real-time object detection in surveillance and autonomous driving
·Small object detection in aerial imagery or microscopy
·General-purpose object detection for photo analysis

FAQ

What is Deformable DETR best for?

Object detection, especially of small objects, with faster training and inference than the original DETR.

How many parameters does Deformable DETR have?

It has 34 million parameters.

What license is Deformable DETR released under?

Apache 2.0 license.

How can I use Deformable DETR via gigarouter?

Deformable DETR is not yet live on gigarouter. Once it is hosted, you will be able to call it through the gigarouter OpenAI-compatible endpoint with an API key.

not yet live

We're benchmarking and onboarding Deformable DETR as a hosted, OpenAI-compatible API.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo