RT-DETR R50

PekingU/rtdetr_r50vd

published May 2024 · updated Feb 2025

RT-DETR R50 is a real-time end-to-end object detection model that eliminates NMS, achieving high speed and accuracy with a ResNet-50 backbone.

est. price

~$0.047

/ 1k images · estimated, set at launch

API providers

downloads / mo

63.7K

license

apache-2.0

specs

Task	Object Detection
Architecture	RT-DETR with ResNet-50 backbone and efficient hybrid encoder
Parameters	42 million
Training Data	COCO 2017 (118k training images)
License	Apache-2.0

about this model

RT-DETR R50vd is a real-time end-to-end object detection model that uses a Transformer-based architecture to eliminate the need for Non-Maximum Suppression (NMS), achieving high accuracy with low latency. It was accepted to CVPR 2024.

Architecture

The model employs an efficient hybrid encoder that processes the backbone's multi-scale features in two decoupled steps: intra-scale interaction (via attention) and cross-scale fusion (via CNN). Uncertainty-minimal query selection provides high-quality initial queries to the decoder. The number of decoder layers used at inference can be changed to trade speed against accuracy without retraining.
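The query selection step can be illustrated with a toy example. This is a simplified sketch, assuming selection reduces to ranking encoder features by their highest class score and keeping the top K. The uncertainty term that the paper adds during training is not modelled here, and the function name and toy scores are hypothetical.

```python
def select_top_k_queries(class_scores, k):
    """Return indices of the k encoder features with the highest max class score.

    class_scores: one inner list of class scores per encoder feature.
    Assumption: confidence of a feature = its best class score.
    """
    confidence = [max(scores) for scores in class_scores]
    # Rank feature indices from most to least confident.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    return ranked[:k]


# Toy input: 5 encoder features, 3 classes (made-up numbers).
scores = [
    [0.10, 0.20, 0.05],
    [0.90, 0.05, 0.01],
    [0.30, 0.40, 0.35],
    [0.02, 0.03, 0.01],
    [0.15, 0.70, 0.10],
]
selected = select_top_k_queries(scores, k=2)
print(selected)  # [1, 4]
```

Features 1 and 4 are kept because their best class scores (0.90 and 0.70) are the two highest; the selected features would then seed the decoder queries.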

Performance

On COCO val2017, RT-DETR-R50 achieves 53.1% AP at 108 FPS on a T4 GPU (batch size 1), outperforming previous YOLO detectors in both speed and accuracy. It surpasses DINO-R50 by 2.2% AP and approximately 21× in FPS. With Objects365 pre-training, the same model reaches 55.3% AP.
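To make these figures concrete, the arithmetic below converts FPS into per-image latency and derives the baseline implied by the quoted margins. The DINO-R50 values are back-calculated from the rounded "2.2% AP" and "21×" figures, so treat them as approximate rather than measured.

```python
def latency_ms(fps):
    """Per-image latency in milliseconds at batch size 1."""
    return 1000.0 / fps


RT_DETR_R50_AP = 53.1
RT_DETR_R50_FPS = 108

rt_detr_latency = latency_ms(RT_DETR_R50_FPS)

# Baseline implied by "2.2% AP higher and about 21x faster" (derived, not measured).
implied_dino_ap = RT_DETR_R50_AP - 2.2
implied_dino_fps = RT_DETR_R50_FPS / 21
implied_dino_latency = latency_ms(implied_dino_fps)

# Gain from Objects365 pre-training, in AP points.
o365_gain = 55.3 - RT_DETR_R50_AP

print(f"RT-DETR-R50: {rt_detr_latency:.2f} ms per image")
print(f"Implied DINO-R50: {implied_dino_ap:.1f} AP, {implied_dino_latency:.0f} ms per image")
print(f"Objects365 pre-training gain: {o365_gain:.1f} AP")
```

At 108 FPS the model spends roughly 9.3 ms per image, against roughly 194 ms for the implied baseline.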

Benchmark Results (COCO val2017)

Model	#Epochs	Params (M)	GFLOPs	FPS (bs=1)	AP	AP50	AP75	AP-s	AP-m	AP-l
RT-DETR-R18	72	20	60.7	217	46.5	63.8	50.4	28.4	49.8	63.0
RT-DETR-R34	72	31	91.0	172	48.5	66.2	52.3	30.2	51.9	66.2
RT-DETR-R50	72	42	136	108	53.1	71.3	57.7	34.8	58.0	70.0
RT-DETR-R101	72	76	259	74	54.3	72.7	58.6	36.0	58.8	72.1
RT-DETR-R18 (Obj365)	60	20	61	217	49.2	66.6	53.5	33.2	52.3	64.8
RT-DETR-R50 (Obj365)	24	42	136	108	55.3	73.4	60.1	37.9	59.9	71.8
RT-DETR-R101 (Obj365)	24	76	259	74	56.2	74.6	61.3	38.3	60.5	73.5
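One way to read the table is to pick the most accurate variant that fits a latency budget. The sketch below copies the four COCO-only rows from the table; the `Row` type and `best_under_latency` helper are hypothetical names introduced for illustration, and latency is assumed to be 1000 / FPS at batch size 1 on a T4.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    params_m: int
    gflops: float
    fps: int
    ap: float


# COCO-only rows, copied from the benchmark table above.
COCO_ROWS = [
    Row("RT-DETR-R18", 20, 60.7, 217, 46.5),
    Row("RT-DETR-R34", 31, 91.0, 172, 48.5),
    Row("RT-DETR-R50", 42, 136, 108, 53.1),
    Row("RT-DETR-R101", 76, 259, 74, 54.3),
]


def best_under_latency(rows, budget_ms):
    """Return the highest-AP row whose per-image latency fits the budget, or None."""
    eligible = [r for r in rows if 1000.0 / r.fps <= budget_ms]
    return max(eligible, key=lambda r: r.ap) if eligible else None


choice = best_under_latency(COCO_ROWS, budget_ms=10.0)
print(choice.name)  # RT-DETR-R50
```

With a 10 ms budget the R50 variant wins: R101 is more accurate but needs about 13.5 ms per image.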

[Figure: training hyperparameters table (batch size, learning rate schedule, data augmentation details)]

[Figure: speed versus accuracy comparison of RT-DETR with YOLO and DETR models]

best for

· Real-time object detection in video streams or live camera feeds
· High-accuracy detection on deployments with NVIDIA T4-class GPUs, the hardware behind the reported 108 FPS
· Applications that benefit from end-to-end detection without NMS post-processing

FAQ

What is RT-DETR R50 best used for?

It is suited to real-time object detection: it runs at 108 FPS on a T4 GPU (batch size 1) with 53.1% AP on COCO val2017. That makes it a good fit for latency-sensitive applications such as autonomous driving or surveillance.

How does RT-DETR R50 compare to YOLO detectors?

The RT-DETR authors report that RT-DETR R50 outperforms comparably sized YOLO detectors, YOLOv8 included, in both speed and accuracy on COCO. It also removes the NMS post-processing step, which simplifies the pipeline.

What are the input and output formats?

It accepts images (typically resized to 640x640) and returns bounding boxes, class labels, and confidence scores.
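The step from raw model outputs to the returned boxes, labels, and scores can be sketched without NMS. This is a minimal illustration, assuming the model emits per-query class logits and boxes as normalized (cx, cy, w, h), as is usual for DETR-style detectors. It keeps one label per query, which is a simplification of what production post-processing does, and all names and toy values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def postprocess(logits, boxes, image_width, image_height, threshold=0.5):
    """Turn per-query outputs into detections; no NMS step is involved.

    logits: per-query lists of raw class logits.
    boxes:  per-query (cx, cy, w, h), normalized to [0, 1] (assumed format).
    Returns detections sorted by descending score.
    """
    detections = []
    for query_logits, (cx, cy, w, h) in zip(logits, boxes):
        scores = [sigmoid(v) for v in query_logits]
        label = max(range(len(scores)), key=lambda i: scores[i])
        score = scores[label]
        if score < threshold:
            continue  # low-confidence query: treated as "no object"
        detections.append({
            "label": label,
            "score": score,
            # Convert center format to pixel-space (x_min, y_min, x_max, y_max).
            "box": (
                (cx - w / 2) * image_width,
                (cy - h / 2) * image_height,
                (cx + w / 2) * image_width,
                (cy + h / 2) * image_height,
            ),
        })
    return sorted(detections, key=lambda d: d["score"], reverse=True)


# Toy outputs for 3 queries and 2 classes on a 640x640 image.
toy_logits = [[2.0, -1.0], [-3.0, -2.0], [0.0, 1.5]]
toy_boxes = [(0.5, 0.5, 0.25, 0.5), (0.1, 0.1, 0.1, 0.1), (0.25, 0.75, 0.5, 0.5)]
results = postprocess(toy_logits, toy_boxes, 640, 640)
for det in results:
    print(det["label"], round(det["score"], 3), det["box"])
```

The second query is dropped because its best score falls below the threshold; the remaining two are returned as-is, since each query is expected to predict a distinct object.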

How can I call RT-DETR R50 via the gigarouter API?

Once the hosted endpoint is live (it is not yet available), use the gigarouter OpenAI-compatible endpoint with your API key: send an image as base64 or as a URL, and receive the detection results as JSON.
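Preparing a base64 image can be sketched as follows. The request schema is not documented on this page, so this only shows the encoding step. Wrapping the result in a data URL is a common convention with OpenAI-compatible APIs, but whether gigarouter expects that exact form is an assumption, and `to_data_url` is a hypothetical helper.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL (assumed input form)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in bytes (the JPEG magic number); in practice, read the bytes
# from an image file opened in "rb" mode.
url = to_data_url(b"\xff\xd8\xff")
print(url)  # data:image/jpeg;base64,/9j/
```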

Is RT-DETR R50 free to use?

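Yes. The model weights are released under the Apache-2.0 license, which permits commercial and research use provided the license and attribution notices are retained. Hosted API usage is priced separately (see the estimated price above).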

not yet live

We're benchmarking and onboarding RT-DETR R50 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo