Conditional DETR ResNet-50
microsoft/conditional-detr-resnet-50
published Sep 2022 · updated May 2024
Conditional DETR ResNet-50 is a detection model that uses a conditional cross-attention mechanism for fast training convergence, trained on COCO 2017 for object detection.
specs
| Spec | Value |
|---|---|
| Task | Object Detection |
| Architecture | Conditional DETR with ResNet-50 backbone (transformer encoder-decoder) |
| Parameters | 44M |
| License | Apache 2.0 |
| Training Data | COCO 2017 (118k annotated images) |
about this model
Conditional DETR with ResNet-50 backbone is an object detection transformer model that uses a conditional cross-attention mechanism to achieve fast training convergence while maintaining high accuracy. It is trained end-to-end on COCO 2017 (118k annotated images) and addresses the slow convergence of the original DETR by learning a conditional spatial query from the decoder embedding. This spatial query allows each cross-attention head to attend to a distinct region around the object, reducing dependence on content embeddings and easing optimization.
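The conditional spatial query described above can be sketched in a few lines. This is a simplified, dependency-free illustration based on our reading of the Conditional DETR formulation, not the model's actual implementation. The function names, the tiny embedding dimension, and the `transform` vector (standing in for the output of a learned network applied to the decoder embedding) are all assumptions made for illustration.

```python
import math


def sinusoidal_embedding(x, y, dim=8, temperature=10000.0):
    """Encode a normalized reference point (x, y in [0, 1]) as a vector of length dim.

    Simplified sinusoidal encoding for illustration; dim must be a multiple of 4.
    """
    per_axis = dim // 2  # half of the channels describe y, half describe x

    def encode(value):
        out = []
        for i in range(per_axis // 2):
            freq = temperature ** (2 * i / per_axis)
            angle = 2 * math.pi * value / freq
            out.extend([math.sin(angle), math.cos(angle)])
        return out

    return encode(y) + encode(x)


def conditional_spatial_query(transform, ref_embedding):
    """Scale the reference-point embedding element-wise.

    `transform` is a hypothetical stand-in for a vector predicted from the
    decoder embedding (a diagonal matrix stored as a vector).
    """
    return [t * p for t, p in zip(transform, ref_embedding)]


def attention_logit(content_q, content_k, spatial_q, spatial_k):
    """Content similarity plus spatial similarity.

    Summing the two dot products equals a single dot product over the
    concatenated [content, spatial] query and key vectors.
    """
    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))

    return dot(content_q, content_k) + dot(spatial_q, spatial_k)
```

Because the spatial term is computed separately from the content term, a head can localize a region from the spatial query alone, which is the intuition behind the reduced dependence on content embeddings.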
Key Strengths
- Converges about 6.7× faster than DETR-R50 in training: at 50 epochs it reaches 41.0 AP, within 1.0 AP of DETR-R50 trained for 500 epochs.
- Built on the standard ResNet-50 backbone for broad compatibility and efficient inference.
- Licensed under Apache 2.0.
Benchmark Results (COCO 2017 val)
The model achieves 41.0 AP after 50 epochs of training with 44M parameters and 90G FLOPs, versus DETR-R50 at 50 epochs (34.8 AP) and DETR-R50 at 500 epochs (42.0 AP).
| Method | Epochs | Params (M) | FLOPs (G) | AP | AP (small) | AP (medium) | AP (large) |
|---|---|---|---|---|---|---|---|
| Conditional DETR-R50 | 50 | 44 | 90 | 41.0 | 20.6 | 44.3 | 59.3 |
| DETR-R50 | 50 | 41 | 86 | 34.8 | 13.9 | 37.3 | 54.4 |
| DETR-R50 | 500 | 41 | 86 | 42.0 | 20.5 | 45.8 | 61.1 |
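As a quick check on the comparison in the table, the AP differences can be computed directly from its rows. The snippet below is only arithmetic over the numbers listed above; the variable and function names are illustrative.

```python
# Rows copied from the benchmark table: (method, epochs, AP)
rows = [
    ("Conditional DETR-R50", 50, 41.0),
    ("DETR-R50", 50, 34.8),
    ("DETR-R50", 500, 42.0),
]


def ap_gap(a, b):
    """AP of row a minus AP of row b, rounded to one decimal place."""
    return round(a[2] - b[2], 1)


cond, detr_50, detr_500 = rows

gain_same_budget = ap_gap(cond, detr_50)       # same 50-epoch schedule
gap_to_long_schedule = ap_gap(detr_500, cond)  # DETR trained for 500 epochs
epoch_ratio = detr_500[1] / cond[1]            # ratio of training schedules

print(gain_same_budget, gap_to_long_schedule, epoch_ratio)
```

At an equal 50-epoch budget the conditional variant is ahead by 6.2 AP, and it trails the 500-epoch DETR-R50 by 1.0 AP while using one tenth of the epochs.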

The model is hosted by gigarouter as a managed, OpenAI-compatible API. Developers can deploy it for general-purpose object detection without handling model installation or inference infrastructure.
best for
- Real-time object detection in images and video
- Detection of common objects (COCO categories) in photographs
- Transfer learning for custom object detection tasks
FAQ
The model performs object detection and converges faster in training than the original DETR, reaching comparable accuracy in fewer epochs.
The model has 44 million parameters.
It is released under the Apache 2.0 license.
Use the OpenAI-compatible endpoint with an API key; send an image as input and receive bounding boxes and class labels in JSON format.
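A request body for such a call might be assembled as follows. The exact request schema is not documented here, so every field name in this sketch (`model`, `image`, `type`, `url`, `data`) is a hypothetical placeholder, as is the helper `build_detection_request`. The snippet only builds the JSON payload and does not contact any endpoint.

```python
import base64
import json


def build_detection_request(model, image_url=None, image_bytes=None):
    """Return a JSON-serializable payload for one image.

    All field names are hypothetical; check the provider's API reference
    for the real schema.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")
    if image_url is not None:
        image = {"type": "url", "url": image_url}
    else:
        encoded = base64.b64encode(image_bytes).decode("ascii")
        image = {"type": "base64", "data": encoded}
    return {"model": model, "image": image}


payload = build_detection_request(
    "microsoft/conditional-detr-resnet-50",
    image_url="https://example.com/street.jpg",  # placeholder URL
)
body = json.dumps(payload)  # string to send as the HTTP request body
```

The API key would normally travel in a request header rather than in the body, which is why it does not appear in the payload.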
Input: image URL or base64-encoded image. Output: JSON with detected objects, confidence scores, and bounding box coordinates.
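On the client side, the returned JSON can be filtered by confidence before use. The response layout below (an `objects` list whose items carry `label`, `score`, and `box`) is an assumed shape for illustration only, and the sample values are made up rather than real model output.

```python
import json


def parse_detections(response_text, min_score=0.5):
    """Return (label, score, box) tuples at or above min_score, best first.

    The keys "objects", "label", "score", and "box" are hypothetical.
    """
    data = json.loads(response_text)
    kept = []
    for obj in data.get("objects", []):
        if obj["score"] >= min_score:
            kept.append((obj["label"], obj["score"], tuple(obj["box"])))
    kept.sort(key=lambda det: det[1], reverse=True)
    return kept


# Made-up sample response, used only to exercise the parser.
sample = json.dumps({
    "objects": [
        {"label": "cat", "score": 0.62, "box": [40, 60, 220, 300]},
        {"label": "dog", "score": 0.97, "box": [250, 80, 480, 310]},
        {"label": "chair", "score": 0.31, "box": [10, 10, 90, 120]},
    ]
})

detections = parse_detections(sample, min_score=0.5)
```

Raising `min_score` trades recall for precision; a suitable threshold depends on the application.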
Conditional DETR ResNet-50 is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.