Conditional DETR ResNet-50

microsoft/conditional-detr-resnet-50

published Sep 2022 · updated May 2024

Conditional DETR ResNet-50 is an object detection model trained on COCO 2017. It uses a conditional cross-attention mechanism to speed up training convergence.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 18.5K
license: apache-2.0

specs

Task: Object Detection
Architecture: Conditional DETR with ResNet-50 backbone (transformer encoder-decoder)
Parameters: 44M
License: Apache 2.0
Training Data: COCO 2017 (118k annotated images)

about this model

Conditional DETR with ResNet-50 backbone is an object detection transformer model that uses a conditional cross-attention mechanism to achieve fast training convergence while maintaining high accuracy. It is trained end-to-end on COCO 2017 (118k annotated images) and addresses the slow convergence of the original DETR by learning a conditional spatial query from the decoder embedding. This spatial query allows each cross-attention head to attend to a distinct region around the object, reducing dependence on content embeddings and easing optimization.
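As a rough illustration of the idea above, the toy sketch below assumes the attention score between a decoder query and an image key splits into a content dot product plus a spatial dot product, with the spatial query formed by rescaling a reference-point embedding element-wise. The function names, the 8-dimensional sinusoidal embedding, and the `scale` vector (standing in for what a learned network would predict from the decoder embedding) are illustrative assumptions, not the model's actual code.

```python
import math

def sinusoidal_embedding(x, y, dim=8, temperature=10000.0):
    """Embed a normalized 2D point as sin/cos features, dim/2 per axis."""
    half = dim // 2
    feats = []
    for coord in (y, x):
        for i in range(0, half, 2):
            freq = temperature ** (i / half)
            feats.append(math.sin(2 * math.pi * coord / freq))
            feats.append(math.cos(2 * math.pi * coord / freq))
    return feats

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def conditional_logit(content_q, content_k, scale, ref_point, key_point):
    """Toy attention logit: content term plus conditional spatial term."""
    p_s = sinusoidal_embedding(*ref_point)        # reference-point embedding
    p_q = [s * p for s, p in zip(scale, p_s)]     # conditional spatial query (assumed form)
    p_k = sinusoidal_embedding(*key_point)        # positional embedding of the key
    return dot(content_q, content_k) + dot(p_q, p_k)

# Hypothetical inputs, chosen only to exercise the function.
c_q, c_k = [0.5, -1.0], [2.0, 1.0]
scale = [1.0] * 8
near = conditional_logit(c_q, c_k, scale, (0.3, 0.7), (0.3, 0.7))
far = conditional_logit(c_q, c_k, scale, (0.3, 0.7), (0.8, 0.2))
print(near, far)
```

With a uniform `scale`, the spatial term peaks when the key sits at the reference point. A non-uniform `scale` would reweight individual spatial features, which is one way to picture different heads focusing on different regions around an object.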

Key Strengths

  • Converges about 6.7× faster than DETR-R50: at 50 epochs it reaches 41.0 AP, close to the 42.0 AP that DETR-R50 reaches after 500 epochs.
  • Built on the standard ResNet-50 backbone for broad compatibility and efficient inference.
  • Licensed under Apache 2.0.

Benchmark Results (COCO 2017 val)

The model achieves 41.0 AP after 50 epochs of training with 44M parameters and 90G FLOPs, versus DETR-R50 at 50 epochs (34.8 AP) and DETR-R50 at 500 epochs (42.0 AP).

Method | Epochs | Params (M) | FLOPs (G) | AP | AP_S | AP_M | AP_L
Conditional DETR-R50 | 50 | 44 | 90 | 41.0 | 20.6 | 44.3 | 59.3
DETR-R50 | 50 | 41 | 86 | 34.8 | 13.9 | 37.3 | 54.4
DETR-R50 | 500 | 41 | 86 | 42.0 | 20.5 | 45.8 | 61.1
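The comparison in the benchmark rows can be worked through with a few lines of arithmetic. This minimal sketch copies the method, epoch, and AP values from the table; the variable and function names are illustrative.

```python
# Rows copied from the benchmark table: (method, epochs, AP).
RESULTS = [
    ("Conditional DETR-R50", 50, 41.0),
    ("DETR-R50", 50, 34.8),
    ("DETR-R50", 500, 42.0),
]

def ap_gap(a, b):
    """AP difference between two rows, rounded to one decimal place."""
    return round(a[2] - b[2], 1)

cond, detr_50, detr_500 = RESULTS

gain_at_equal_epochs = ap_gap(cond, detr_50)    # same 50-epoch schedule
gap_to_full_schedule = ap_gap(cond, detr_500)   # versus the 500-epoch baseline
epoch_ratio = detr_500[1] / cond[1]             # schedule lengths only

print(f"AP gain at 50 epochs: {gain_at_equal_epochs:+.1f}")
print(f"AP gap to DETR-R50 at 500 epochs: {gap_to_full_schedule:+.1f}")
print(f"Epoch ratio (500 vs 50): {epoch_ratio:.0f}x")
```

The epoch ratio compares schedule lengths only; it is not by itself a measurement of convergence speed, since the two models do not end at identical AP.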

[Figure: architecture diagram of Conditional DETR, showing how the conditional spatial query is integrated]

gigarouter plans to host the model as a managed, OpenAI-compatible API; it is not yet live. Once available, developers will be able to use it for general-purpose object detection without handling model installation or inference infrastructure.

FAQ

What is Conditional DETR best for?

Object detection where training budget matters: it converges faster than the original DETR, reaching comparable accuracy in fewer epochs.

How many parameters does the model have?

44 million parameters.

What is the license for Conditional DETR ResNet-50?

Apache 2.0.

How do I call this model via the gigarouter API?

Once the model is live, call the OpenAI-compatible endpoint with your API key: send an image as input and receive bounding boxes and class labels as JSON.
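Because the API is not yet live, no endpoint URL or request schema has been published. The sketch below shows only what building such a request might look like in Python. The URL, the `model` and `image` field names, and the `build_detection_request` helper are hypothetical placeholders; only the model ID and the use of an API key come from this page.

```python
import base64
import json
import urllib.request

# Hypothetical values: this page does not publish an endpoint or a schema.
API_URL = "https://api.example.com/v1/object-detection"
MODEL_ID = "microsoft/conditional-detr-resnet-50"

def build_detection_request(image_bytes, api_key, url=API_URL):
    """Build (but do not send) a POST request carrying a base64 image."""
    payload = {
        "model": MODEL_ID,
        # Assumed field name; base64 is one of the input forms listed in the FAQ.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder bytes stand in for a real image file's contents.
req = build_detection_request(b"fake-image-bytes", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here since there is nothing to call yet.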

What input and output formats does the API support?

Input: image URL or base64-encoded image. Output: JSON with detected objects, confidence scores, and bounding box coordinates.
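To make the output description concrete, here is a minimal sketch of parsing such a response and filtering by confidence. The JSON layout (`detections`, `label`, `score`, `box` with `xmin`/`ymin`/`xmax`/`ymax`) and the sample values are assumptions made for illustration; the real response schema may differ.

```python
import json

# Hypothetical response body; field names and values are illustrative only.
SAMPLE_RESPONSE = """
{
  "detections": [
    {"label": "cat", "score": 0.98,
     "box": {"xmin": 12, "ymin": 30, "xmax": 210, "ymax": 280}},
    {"label": "remote", "score": 0.41,
     "box": {"xmin": 300, "ymin": 90, "xmax": 360, "ymax": 120}}
  ]
}
"""

def parse_detections(response_text, min_score=0.5):
    """Return (label, score, (xmin, ymin, xmax, ymax)) for confident detections."""
    data = json.loads(response_text)
    results = []
    for det in data.get("detections", []):
        if det["score"] >= min_score:
            box = det["box"]
            coords = (box["xmin"], box["ymin"], box["xmax"], box["ymax"])
            results.append((det["label"], det["score"], coords))
    return results

detections = parse_detections(SAMPLE_RESPONSE)
for label, score, coords in detections:
    print(label, score, coords)
```

The `min_score` threshold is a caller-side choice: lowering it keeps more candidate boxes at the cost of more false positives.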

not yet live

We're benchmarking and onboarding Conditional DETR ResNet-50 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
