Deformable DETR
SenseTime/deformable-detr
published Mar 2022 · updated May 2024
Deformable DETR is a detection model that uses a deformable transformer attention mechanism for end-to-end object detection.
specs
| Spec | Value |
|---|---|
| Task | Object Detection |
| Architecture | Encoder-decoder transformer with ResNet-50 backbone |
| Parameters | 34M |
| License | Apache 2.0 |
| Training Data | COCO 2017 (118k images) |
| Inference Speed | 27 FPS (single image) |
about this model
Deformable DETR (ResNet-50) is an object detection model that uses deformable attention to accelerate convergence and improve small-object detection relative to the original DETR. Trained end-to-end on COCO 2017 (118k annotated images), it combines a convolutional backbone with an encoder-decoder transformer and a set of learned object queries (300 in the paper's configuration, up from DETR's 100). A bipartite matching loss, solved with the Hungarian algorithm, assigns each ground-truth object to exactly one prediction.
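The one-to-one assignment can be illustrated with a toy sketch. This is not the model's training code: the cost function `match_cost` is a hypothetical simplification (L1 box distance minus class probability), and the search is brute force over permutations rather than the Hungarian algorithm, which finds the same optimum efficiently for realistic query counts.

```python
from itertools import permutations

def match_cost(pred, gt):
    """Hypothetical cost: L1 box distance minus the predicted probability of the true class."""
    l1 = sum(abs(p - g) for p, g in zip(pred["box"], gt["box"]))
    return l1 - pred["probs"][gt["label"]]

def bipartite_match(preds, gts):
    """Return (total_cost, pairs) for the cheapest one-to-one assignment.

    Brute force over permutations: suitable for a toy example only.
    """
    best_cost, best_pairs = float("inf"), []
    # Each permutation picks one distinct prediction per ground-truth box.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return best_cost, best_pairs

# Three predictions (boxes as cx, cy, w, h; two classes) and two ground-truth objects.
preds = [
    {"box": (0.50, 0.50, 0.20, 0.20), "probs": [0.1, 0.9]},
    {"box": (0.10, 0.10, 0.05, 0.05), "probs": [0.8, 0.2]},
    {"box": (0.90, 0.90, 0.10, 0.10), "probs": [0.5, 0.5]},
]
gts = [
    {"box": (0.12, 0.10, 0.05, 0.06), "label": 0},
    {"box": (0.48, 0.52, 0.20, 0.22), "label": 1},
]

total_cost, pairs = bipartite_match(preds, gts)
print(pairs)  # (prediction index, ground-truth index) pairs
```

Prediction 2 ends up unmatched; in DETR-style training such predictions are supervised toward the "no object" class.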
Key mechanism
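Instead of attending to all spatial locations, each deformable attention module samples only a small, fixed number of key points around a reference point. This removes the quadratic dependence of attention cost on feature-map size and makes multi-scale feature aggregation affordable, which DETR's dense attention is not at high resolution. As a result, training converges in 10× fewer epochs (50 vs. 500). The single-scale variant benchmarked below reaches 39.4 AP against 43.3 for DETR-DC5, at well under half the FLOPs.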
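A minimal sketch of the sampling idea, for a single query, a single head, a single feature scale, and scalar features. In the actual model the sampling offsets and attention logits are predicted from the query by learned linear layers; here they are passed in as plain arguments, and all function names are illustrative rather than taken from the released code.

```python
import math

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2-D grid (list of rows) at fractional (x, y).

    Grid points outside the map contribute zero.
    """
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feat[yi][xi]
    return value

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_attention(feat, ref, offsets, logits):
    """Weighted sum of features sampled at ref + offset, one weight per sample."""
    weights = softmax(logits)
    samples = [bilinear_sample(feat, ref[0] + dx, ref[1] + dy) for dx, dy in offsets]
    return sum(w * s for w, s in zip(weights, samples))

# 4x4 feature map with feat[y][x] = 4*y + x, so interpolated values are easy to check.
feat = [[4 * y + x for x in range(4)] for y in range(4)]
ref = (1.5, 1.5)                                   # reference point (x, y)
offsets = [(-0.5, -0.5), (0.5, 0.0), (0.0, 1.0), (1.0, 0.5)]
logits = [0.0, 0.0, 0.0, 0.0]                      # uniform attention weights

out = deformable_attention(feat, ref, offsets, logits)
print(out)  # 8.75
```

The query reads 4 sampled points instead of all 16 grid locations, and the number of samples stays fixed as the feature map grows.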
Benchmark results (COCO val2017)
| Method | Epochs | AP | AP_S (small objects) | Params (M) | FLOPs (G) | Train time (GPU hours) | Inference FPS |
|---|---|---|---|---|---|---|---|
| Deformable DETR (single scale) | 50 | 39.4 | 20.6 | 34 | 78 | 160 | 27.0 |
| DETR (DC5) | 500 | 43.3 | 22.5 | 41 | 187 | 2000 | 11.4 |
| Faster R-CNN + FPN | 109 | 42.0 | 26.0* | 42 | 180 | – | – |

Strengths
- Faster convergence: 50 epochs vs. 500 for DETR, reducing total training GPU hours from 2000 to 160.
- Small-object AP of 20.6, within about 2 points of DETR-DC5 (22.5), at a fraction of the training cost.
- Efficient inference: 27 FPS (single image) with 34M parameters and 78G FLOPs.
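The ratios behind these bullets follow directly from the benchmark table. A short sketch of the arithmetic, using only the figures listed there for the single-scale model and DETR (DC5):

```python
# Figures copied from the benchmark table (COCO val2017).
deformable = {"epochs": 50, "gpu_hours": 160, "fps": 27.0, "flops_g": 78}
detr_dc5 = {"epochs": 500, "gpu_hours": 2000, "fps": 11.4, "flops_g": 187}

epoch_reduction = detr_dc5["epochs"] / deformable["epochs"]
gpu_hour_reduction = detr_dc5["gpu_hours"] / deformable["gpu_hours"]
fps_speedup = deformable["fps"] / detr_dc5["fps"]
flops_reduction = detr_dc5["flops_g"] / deformable["flops_g"]

print(f"epochs:    {epoch_reduction:.1f}x fewer")
print(f"GPU hours: {gpu_hour_reduction:.1f}x fewer")
print(f"FPS:       {fps_speedup:.2f}x faster")
print(f"FLOPs:     {flops_reduction:.2f}x fewer")
```

The GPU-hour saving (12.5×) exceeds the epoch saving (10×) because each epoch is also cheaper to compute.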
Deformable DETR is released under the Apache 2.0 license. The paper was presented as an ICLR 2021 Oral.
best for
- Real-time object detection in surveillance and autonomous driving
- Small object detection in aerial imagery or microscopy
- General-purpose object detection for photo analysis
FAQ
Deformable DETR is best suited to object detection, especially of small objects, with faster training and inference than the original DETR.
It has 34 million parameters.
It is released under the Apache 2.0 license.
To use it, call the gigarouter OpenAI-compatible endpoint with an API key.
We're currently benchmarking and onboarding Deformable DETR as a hosted, OpenAI-compatible API, so the endpoint may not be live yet.