skip to content
gigarouter gigarouter
models / object detection · coming soon

RT-DETR R101

PekingU/rtdetr_r101vd_coco_o365

published Jun 2024 · updated Jul 2024

RT-DETR R101 is a real-time end-to-end object detection model using a ResNet-101-vd backbone and Transformer encoder-decoder architecture.

est. price
~$0.047
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
99.4K
license
apache-2.0

specs

TaskObject Detection
ArchitectureRT-DETR with ResNet-101-vd backbone and Transformer encoder-decoder
Parameters76 million
LicenseApache 2.0
Input Size640x640 pixels

about this model

RT-DETR (Real-Time Detection Transformer) is an object detection model that eliminates the need for non-maximum suppression (NMS) by using an end-to-end Transformer architecture, achieving real-time inference speeds competitive with YOLO while delivering higher accuracy.

Architecture and Key Strengths

The model uses an efficient hybrid encoder that decouples intra-scale interaction and cross-scale fusion to process multi-scale features quickly. An uncertainty-minimal query selection provides high-quality initial queries to the decoder, boosting accuracy. RT-DETR supports flexible speed tuning by adjusting the number of decoder layers without retraining, adapting to different latency requirements.

On a T4 GPU, RT-DETR-R101 reaches 74 FPS (batch size 1) and 54.3% AP on COCO val2017. After pre-training on Objects365, the same model achieves 56.2% AP. The work was accepted to CVPR 2024.

Benchmark Results (COCO val2017 with Objects365 pre-training)

Model#EpochsParams (M)GFLOPsFPS (bs=1)APAP50AP75AP-sAP-mAP-l
RT-DETR-R18 (O365)60206121749.266.653.533.252.364.8
RT-DETR-R50 (O365)244213610855.373.460.137.959.971.8
RT-DETR-R101 (O365)24762597456.274.661.338.360.573.5

Model Overview and Architecture

Comparison of RT-DETR with other real-time detectors showing speed-accuracy trade-off

Training hyperparameters table for RT-DETR

Architecture diagram of RT-DETR showing backbone, efficient hybrid encoder with AIFI and CCFF, uncertainty-minimal query selection, and decoder

best for

FAQ

What input format does the model expect?

It expects an image resized to 640x640, normalized with mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225), typically as a tensor.

What does the model output?

It outputs bounding boxes, class labels, and confidence scores for detected objects (COCO classes).

How does RT-DETR compare to YOLO in speed and accuracy?

RT-DETR eliminates NMS, achieving higher accuracy (56.2% AP vs YOLO) at competitive real-time speeds (74 FPS on T4 with R101).

What is the license for this model?

Apache 2.0, allowing free use, modification, and distribution.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your gigarouter API key, sending the image as a base64-encoded string in the request.

not yet live

We're benchmarking and onboarding RT-DETR R101 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

compare all →