RT-DETR R50
PekingU/rtdetr_r50vd
published May 2024 · updated Feb 2025
RT-DETR R50 is a real-time end-to-end object detection model that eliminates NMS, achieving high speed and accuracy with a ResNet-50 backbone.
specs
| Task | Object Detection |
| Architecture | RT-DETR with ResNet-50 backbone and efficient hybrid encoder |
| Parameters | 42 million |
| Training Data | COCO 2017 (118k training images) |
| License | Apache-2.0 |
about this model
RT-DETR R50vd is a real-time end-to-end object detection model that uses a Transformer-based architecture to eliminate the need for Non-Maximum Suppression (NMS), achieving high accuracy with low latency. It was accepted to CVPR 2024.
Architecture
The model employs an efficient hybrid encoder that decouples intra-scale interaction (via attention) and cross-scale fusion (via CNN) to process multi-scale features from the backbone. Uncertainty-minimal query selection provides high-quality initial queries to the decoder, and the decoder can be adjusted (number of layers) to flexibly trade off speed and accuracy without retraining.
Performance
On COCO val2017, RT-DETR-R50 achieves 53.1% AP at 108 FPS on a T4 GPU (batch size 1), outperforming previous YOLO detectors in both speed and accuracy. It surpasses DINO-R50 by 2.2% AP and approximately 21× in FPS. With Objects365 pre-training, the same model reaches 55.3% AP.
Benchmark Results (COCO val2017)
| Model | #Epochs | Params (M) | GFLOPs | FPS (bs=1) | AP | AP50 | AP75 | AP-s | AP-m | AP-l |
|---|---|---|---|---|---|---|---|---|---|---|
| RT-DETR-R18 | 72 | 20 | 60.7 | 217 | 46.5 | 63.8 | 50.4 | 28.4 | 49.8 | 63.0 |
| RT-DETR-R34 | 72 | 31 | 91.0 | 172 | 48.5 | 66.2 | 52.3 | 30.2 | 51.9 | 66.2 |
| RT-DETR-R50 | 72 | 42 | 136 | 108 | 53.1 | 71.3 | 57.7 | 34.8 | 58.0 | 70.0 |
| RT-DETR-R101 | 72 | 76 | 259 | 74 | 54.3 | 72.7 | 58.6 | 36.0 | 58.8 | 72.1 |
| RT-DETR-R18 (Obj365) | 60 | 20 | 61 | 217 | 49.2 | 66.6 | 53.5 | 33.2 | 52.3 | 64.8 |
| RT-DETR-R50 (Obj365) | 24 | 42 | 136 | 108 | 55.3 | 73.4 | 60.1 | 37.9 | 59.9 | 71.8 |
| RT-DETR-R101 (Obj365) | 24 | 76 | 259 | 74 | 56.2 | 74.6 | 61.3 | 38.3 | 60.5 | 73.5 |

best for
- ·Real-time object detection in video streams or live camera feeds
- ·High-accuracy detection on edge devices with NVIDIA T4 GPUs
- ·Applications that benefit from end-to-end detection without NMS post-processing
FAQ
It excels at real-time object detection (108 FPS on T4) with high accuracy (53.1 AP on COCO), making it ideal for latency-sensitive applications like autonomous driving or surveillance.
RT-DETR R50 outperforms YOLOv8 in both speed and accuracy on COCO, and eliminates the need for NMS post-processing, simplifying the pipeline.
It accepts images (typically resized to 640x640) and returns bounding boxes, class labels, and confidence scores.
Use the gigarouter OpenAI-compatible endpoint with your API key; send an image as base64 or URL and receive detection results in JSON.
Yes, the model is released under the Apache-2.0 license, allowing commercial and research use with attribution.
We're benchmarking and onboarding RT-DETR R50 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.