DETR ResNet-101 DC5
facebook/detr-resnet-101-dc5
published Mar 2022 · updated Sep 2023
DETR ResNet-101 DC5 is an end-to-end object detection model that uses a Transformer encoder-decoder with a dilated ResNet-101 backbone to directly predict bounding boxes and class labels from images.
specs
| Task | Object Detection |
| Architecture | Transformer encoder-decoder with ResNet-101 backbone (dilated C5 stage) |
| Parameters | ~60 million (estimated) |
| License | Apache 2.0 |
about this model
facebook/detr-resnet-101-dc5 is an object detection model that uses a Transformer encoder-decoder architecture with a dilated ResNet-101 backbone, trained end-to-end on COCO 2017 (118k images). It formulates detection as a direct set prediction problem: a fixed set of 100 learned object queries is matched to ground-truth objects via bipartite matching (Hungarian algorithm), and the model outputs class labels and bounding boxes in parallel.
Key strengths
- Simplified pipeline: no hand-crafted post-processing (NMS) or region proposal network.
- Strong performance on COCO val5k: average precision (AP) of 44.9, with AP = 64.7, AP = 47.7, AP = 23.8, AP = 49.0, and AP = 64.2.
- Inference speed: 0.097 seconds per image (first 100 COCO val5k images with TorchScript transformer). Model checkpoint size is 232 MB.
- Trained for 500 epochs with learning rate drop at epoch 400 on 8 nodes (batch size 1 per GPU with dilation).
Comparison to other DETR variants (COCO val5k box AP)
| Model | Backbone | Schedule | Inference time (s) | AP |
|---|---|---|---|---|
| DETR | R50 | 500 | 0.036 | 42.0 |
| DETR-DC5 | R50 | 500 | 0.083 | 43.3 |
| DETR | R101 | 500 | 0.050 | 43.5 |
| DETR-DC5 | R101 | 500 | 0.097 | 44.9 |
Additional metrics (COCO val5k)
- Average recall: AR@1 = 0.354, AR@10 = 0.571, AR@100 = 0.614
- AR by size: small = 0.363, medium = 0.667, large = 0.834
best for
- ·Detecting common objects in natural images using COCO classes
- ·Applications requiring a simple, end-to-end detection pipeline without hand-crafted components
FAQ
The model expects an image resized so the shortest side is at least 800 pixels and the longest side at most 1333 pixels, normalized with ImageNet mean and std. Use the gigarouter OpenAI-compatible endpoint with an API key.
It outputs 100 object queries, each with a class label (from COCO) and a bounding box, plus a "no object" class for empty queries.
It achieves 44.9 AP on COCO val5k with an inference time of 0.097 seconds per image, outperforming DETR R50 (42.0 AP, 0.036s) and DETR R101 (43.5 AP, 0.050s).
The model is released under the Apache 2.0 license.
Use the gigarouter OpenAI-compatible endpoint with your API key, passing an image URL or base64-encoded image.
We're benchmarking and onboarding DETR ResNet-101 DC5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.