Deformable DETR

SenseTime/deformable-detr

published Mar 2022 · updated May 2024

Deformable DETR is a detection model that uses a deformable transformer attention mechanism for end-to-end object detection.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 8.1K
license: apache-2.0

specs

Task: Object Detection
Architecture: Encoder-decoder transformer with ResNet-50 backbone
Parameters: 34M
License: Apache 2.0
Training Data: COCO 2017 (118k images)
Inference Speed: 27 FPS (single image)

about this model

Deformable DETR (ResNet-50) is an object detection model whose deformable attention mechanism speeds up convergence and improves small-object detection relative to the original DETR. It is trained end-to-end on COCO 2017 (118k annotated images) and combines a convolutional backbone with an encoder-decoder transformer and 300 learned object queries (up from 100 in DETR). A bipartite matching loss, solved with the Hungarian algorithm, assigns each ground-truth object to exactly one prediction.
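The one-to-one assignment behind the bipartite matching loss can be illustrated with a minimal sketch. It makes two simplifying assumptions that are not taken from the model itself: the matching cost is only an L1 distance between boxes (the real cost also involves class scores and an overlap term), and the optimal assignment is found by brute force over permutations rather than by the Hungarian algorithm, which gives the same result on tiny inputs. The helper names `l1_box_cost` and `match_queries` are hypothetical.

```python
from itertools import permutations


def l1_box_cost(pred, gt):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def match_queries(pred_boxes, gt_boxes):
    """Return (total_cost, [(query_idx, gt_idx), ...]) with minimal total cost.

    Brute force over permutations stands in for the Hungarian algorithm here.
    Assumes there are at least as many predictions as ground-truth boxes.
    """
    best_cost, best_pairs = float("inf"), []
    for queries in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            l1_box_cost(pred_boxes[q], gt_boxes[g]) for g, q in enumerate(queries)
        )
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return best_cost, sorted(best_pairs)


# Three predicted boxes (object queries) and two ground-truth boxes.
preds = [
    (0.50, 0.50, 0.20, 0.20),
    (0.10, 0.10, 0.05, 0.05),
    (0.80, 0.30, 0.10, 0.40),
]
gts = [
    (0.82, 0.31, 0.10, 0.38),
    (0.48, 0.52, 0.22, 0.18),
]

cost, pairs = match_queries(preds, gts)
print(pairs)  # query 2 is matched to gt 0, query 0 to gt 1; query 1 stays unmatched
```

Unmatched queries (query 1 here) are the ones trained to predict "no object", which is why the model needs no separate duplicate-removal step.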

Key mechanism

Instead of attending to all spatial locations, each deformable attention module samples only a small set of key points around a reference point. This reduces attention complexity and makes multi-scale feature aggregation affordable, avoiding the cost DETR incurs on high-resolution feature maps. The training schedule shrinks tenfold (50 epochs vs. 500); in the benchmark below, the single-scale variant reaches 39.4 AP, against 43.3 AP for DETR (DC5) trained for 500 epochs.
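The sampling idea can be sketched for a single query, a single head, and a single feature scale. This is an illustrative simplification, not the model's implementation: the offsets and attention logits are passed in as fixed numbers, whereas in the model they are predicted from the query, and the feature map holds plain scalars instead of projected feature vectors. The function names are hypothetical.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feat[row][col] at continuous (x, y); outside is zero."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * feat[yi][xi]
    return value


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feat, reference, offsets, logits):
    """Weighted sum of K values sampled at reference + offset_k."""
    weights = softmax(logits)
    rx, ry = reference
    return sum(
        a * bilinear_sample(feat, rx + dx, ry + dy)
        for a, (dx, dy) in zip(weights, offsets)
    )


# 4x4 feature map with feat[r][c] = 4*r + c, so interpolation is exact.
feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]

out = deformable_attention(feat, reference=(1.5, 1.5), offsets=offsets, logits=[0.0] * 4)
print(out)  # 7.5
```

The query reads four sampled locations instead of all sixteen cells. With K sampling points per query, the cost per query depends on K and not on the size of the feature map, which is what keeps higher-resolution and multi-scale maps affordable.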

Benchmark results (COCO val2017)

Method | Epochs | AP | AP_S | Params (M) | FLOPs (G) | Train time (GPU hours) | Inference FPS
Deformable DETR (single scale) | 50 | 39.4 | 20.6 | 34 | 78 | 160 | 27.0
DETR (DC5) | 500 | 43.3 | 22.5 | 41 | 187 | 2000 | 11.4
Faster R-CNN + FPN | 109 | 42.0 | 26.0* | 42 | 180 | - | -

[Figure: architecture diagram of Deformable DETR, showing the encoder-decoder transformer with deformable attention sampling points]

Strengths

  • Faster convergence: 50 epochs vs. 500 for DETR, reducing total training GPU hours from 2000 to 160.
  • Small-object AP of 20.6, within about 2 points of DETR (DC5) at 22.5, at a fraction of the training cost.
  • Efficient inference: 27 FPS (single image) with 34M parameters and 78G FLOPs.
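The trade-offs listed above follow from simple arithmetic on the benchmark figures. The short sketch below only recomputes ratios from the numbers in the table; the dictionary layout and variable names are illustrative.

```python
# Figures copied from the COCO val2017 benchmark table.
deformable = {"epochs": 50, "gpu_hours": 160, "fps": 27.0, "ap": 39.4}
detr_dc5 = {"epochs": 500, "gpu_hours": 2000, "fps": 11.4, "ap": 43.3}

epoch_ratio = detr_dc5["epochs"] / deformable["epochs"]           # fewer epochs
gpu_hour_ratio = detr_dc5["gpu_hours"] / deformable["gpu_hours"]  # fewer GPU hours
fps_ratio = deformable["fps"] / detr_dc5["fps"]                   # faster inference
latency_ms = 1000.0 / deformable["fps"]                           # time per image
ap_gap = detr_dc5["ap"] - deformable["ap"]                        # accuracy given up

print(f"{epoch_ratio:.0f}x fewer epochs, {gpu_hour_ratio:.1f}x fewer GPU hours")
print(f"{fps_ratio:.2f}x faster inference, about {latency_ms:.1f} ms per image")
print(f"{ap_gap:.1f} AP lower than DETR (DC5)")
```

So the single-scale variant gives up 3.9 AP in exchange for a 12.5× cut in training GPU hours and roughly 2.4× higher inference throughput.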

Deformable DETR is released under the Apache 2.0 license. The paper was presented as an ICLR 2021 Oral.

FAQ

What is Deformable DETR best for?

Object detection, including small objects, with faster training and inference than the original DETR.

How many parameters does Deformable DETR have?

It has 34 million parameters.

What license is Deformable DETR released under?

Apache 2.0 license.

How can I use Deformable DETR via gigarouter?

Deformable DETR is not yet live on gigarouter. Once it launches, you will be able to call it through the gigarouter OpenAI-compatible endpoint with an API key.

not yet live

We're benchmarking and onboarding Deformable DETR as a hosted, OpenAI-compatible API.