
DETR ResNet-101 DC5

facebook/detr-resnet-101-dc5

published Mar 2022 · updated Sep 2023

DETR ResNet-101 DC5 is an end-to-end object detection model that uses a Transformer encoder-decoder with a dilated ResNet-101 backbone to directly predict bounding boxes and class labels from images.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 7.1K
license: apache-2.0

specs

Task: Object Detection
Architecture: Transformer encoder-decoder with ResNet-101 backbone (dilated C5 stage)
Parameters: ~60 million (estimated)
License: Apache 2.0

about this model

facebook/detr-resnet-101-dc5 is an object detection model that uses a Transformer encoder-decoder architecture with a dilated ResNet-101 backbone, trained end-to-end on COCO 2017 (118k images). It formulates detection as a direct set prediction problem: the model decodes a fixed set of 100 learned object queries in parallel, each yielding a class label and a bounding box. During training, these predictions are matched one-to-one to ground-truth objects via bipartite matching (Hungarian algorithm).
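To make the matching step concrete, here is a minimal sketch with made-up costs for four queries and two ground-truth objects. It brute-forces the minimum-cost one-to-one assignment instead of running the Hungarian algorithm (which reaches the same optimum far more efficiently). The cost values and the `match_queries` helper are illustrative assumptions, not part of the model's code.

```python
from itertools import permutations


def match_queries(cost):
    """Return the minimum-cost one-to-one assignment of targets to queries.

    cost[q][t] is a hypothetical matching cost between query q and
    ground-truth object t. Needs at least as many queries as targets.
    Brute force: only practical for tiny examples like this one.
    """
    num_queries = len(cost)
    num_targets = len(cost[0])
    best_total, best_queries = float("inf"), None
    # Try every way of picking a distinct query for each target.
    for queries in permutations(range(num_queries), num_targets):
        total = sum(cost[q][t] for t, q in enumerate(queries))
        if total < best_total:
            best_total, best_queries = total, queries
    # Map query index -> target index.
    return {q: t for t, q in enumerate(best_queries)}, best_total


# Made-up costs: rows are queries, columns are ground-truth objects.
cost = [
    [0.9, 0.2],  # query 0
    [0.1, 0.8],  # query 1
    [0.7, 0.6],  # query 2
    [0.5, 0.4],  # query 3
]
assignment, total = match_queries(cost)
unmatched = [q for q in range(len(cost)) if q not in assignment]
print(assignment)  # query 1 takes object 0, query 0 takes object 1
print(unmatched)   # queries left over are trained toward "no object"
```

With 100 queries and a handful of objects per image, most queries end up unmatched, which is why the "no object" class matters.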

Key strengths

  • Simplified pipeline: no hand-crafted post-processing (NMS) or region proposal network.
  • Strong performance on COCO val5k: average precision (AP) of 44.9, with AP50 = 64.7, AP75 = 47.7, AP_S = 23.8, AP_M = 49.0, and AP_L = 64.2.
  • Inference time: 0.097 seconds per image, measured on the first 100 COCO val5k images with a TorchScript transformer. The model checkpoint is 232 MB.
  • Training: 500 epochs with a learning rate drop at epoch 400, on 8 nodes (batch size 1 per GPU with dilation).

Comparison to other DETR variants (COCO val5k box AP)

Model      Backbone  Schedule (epochs)  Inference time (s)  AP
DETR       R50       500                0.036               42.0
DETR-DC5   R50       500                0.083               43.3
DETR       R101      500                0.050               43.5
DETR-DC5   R101      500                0.097               44.9
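The speed/accuracy trade-off in this table can be worked out directly. The sketch below uses only the figures listed above; the `compare` helper and the choice of DETR R50 as the baseline are assumptions made for illustration.

```python
# (inference time in seconds per image, COCO val5k box AP), from the table above.
variants = {
    "DETR R50": (0.036, 42.0),
    "DETR-DC5 R50": (0.083, 43.3),
    "DETR R101": (0.050, 43.5),
    "DETR-DC5 R101": (0.097, 44.9),
}


def compare(name, baseline="DETR R50"):
    """Summarize AP gain and slowdown of one variant against a baseline."""
    time_s, ap = variants[name]
    base_time_s, base_ap = variants[baseline]
    return {
        "ap_gain": round(ap - base_ap, 1),
        "slowdown": round(time_s / base_time_s, 2),
        "images_per_s": round(1 / time_s, 1),
    }


summary = compare("DETR-DC5 R101")
print(summary)  # {'ap_gain': 2.9, 'slowdown': 2.69, 'images_per_s': 10.3}
```

Under these numbers, the DC5 R101 variant buys 2.9 AP over DETR R50 at roughly 2.7 times the per-image latency.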

Additional metrics (COCO val5k)

  • Average recall: AR@1 = 0.354, AR@10 = 0.571, AR@100 = 0.614
  • AR by size: small = 0.363, medium = 0.667, large = 0.834

FAQ

What is the input format for this model?

The model expects an RGB image resized so that the shortest side is at least 800 pixels and the longest side is at most 1333 pixels, then normalized per channel with the ImageNet mean and standard deviation. Once the hosted model is live, images are submitted through the gigarouter OpenAI-compatible endpoint with an API key.
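As a rough illustration of this preprocessing, the sketch below computes a target size and normalizes a single pixel. It assumes the common convention of scaling the shortest side to exactly 800 and letting the 1333 cap on the longest side take priority when the two constraints conflict; `target_size` and `normalize_pixel` are hypothetical helper names. The mean and standard deviation are the standard ImageNet values.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def target_size(width, height, short_side=800, long_side=1333):
    """Return (width, height) after an aspect-preserving resize.

    Assumption: scale the shortest side to `short_side`, unless that would
    push the longest side past `long_side`, in which case the cap wins.
    """
    scale = short_side / min(width, height)
    if max(width, height) * scale > long_side:
        scale = long_side / max(width, height)
    return round(width * scale), round(height * scale)


def normalize_pixel(rgb):
    """Map 0-255 RGB values to ImageNet-normalized floats."""
    return tuple(
        (channel / 255 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(target_size(640, 480))    # (1067, 800)
print(target_size(1920, 1080))  # (1333, 750): the long-side cap applies
print(normalize_pixel((255, 255, 255)))
```

For wide images such as 1920x1080, the resized shortest side falls below 800 under this convention because the longest side is capped first.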

What output does the model produce?

It outputs one prediction for each of the 100 object queries: a class label (a COCO category, or the special "no object" class that marks an empty query) and a bounding box.
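A minimal sketch of turning such per-query outputs into detections follows. It assumes raw class logits with "no object" as the last class and boxes in normalized (center x, center y, width, height) form, as in the reference DETR implementation. The 0.7 score threshold, the toy two-class logits, and the `decode` helper are illustrative assumptions, not the hosted API's response format.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def decode(logits_per_query, boxes_per_query, img_w, img_h, threshold=0.7):
    """Keep queries whose best real class scores at least `threshold`."""
    detections = []
    for logits, box in zip(logits_per_query, boxes_per_query):
        object_probs = softmax(logits)[:-1]  # drop the trailing "no object"
        score = max(object_probs)
        label = object_probs.index(score)
        if score >= threshold:
            detections.append((label, score, cxcywh_to_xyxy(box, img_w, img_h)))
    return detections


# Toy example: 3 queries, 2 real classes plus "no object".
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 3.0, 0.5]]
boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1), (0.25, 0.75, 0.5, 0.5)]
detections = decode(logits, boxes, img_w=1000, img_h=500)
for label, score, xyxy in detections:
    print(label, round(score, 3), xyxy)
```

The middle query is dominated by "no object" and is dropped, so two detections remain without any NMS step.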

How does this model compare to other DETR variants in speed and accuracy?

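It reaches 44.9 AP on COCO val5k, the highest of the four variants compared above, but it is also the slowest at 0.097 seconds per image. By comparison, DETR R50 scores 42.0 AP at 0.036 s, DETR R101 scores 43.5 AP at 0.050 s, and DETR-DC5 R50 scores 43.3 AP at 0.083 s.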

What is the license for this model?

The model is released under the Apache 2.0 license.

How can I call this model via the gigarouter API?

The hosted API is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key, passing an image URL or a base64-encoded image.
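Since the request schema is not published here, the sketch below stops at the one step that is certain: base64-encoding image bytes. Wrapping the result in a `data:` URL is an assumption borrowed from how OpenAI-compatible APIs commonly accept inline images, and `to_data_url` is a hypothetical helper.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Base64-encode raw image bytes and wrap them in a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for a real file read with open(path, "rb").
fake_image = b"\xff\xd8\xff\xe0 placeholder jpeg bytes"
data_url = to_data_url(fake_image)
print(data_url[:23])  # data:image/jpeg;base64,
```

Whether the endpoint expects a data URL, a bare base64 string, or a separate field is something to confirm against the API documentation once the model launches.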

not yet live

We're benchmarking and onboarding DETR ResNet-101 DC5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
