RT-DETRv2 R101vd

PekingU/rtdetr_v2_r101vd

published Jan 2025 · updated Feb 2025

RT-DETRv2 R101vd is a real-time object detection model that refines the RT-DETR architecture with selective multi-scale feature extraction and improved training strategies for high accuracy and speed.

est. price

~$0.047

/ 1k images · estimated, set at launch

downloads / mo

6.9K

license

apache-2.0

specs

Task	Object Detection
Architecture	RT-DETRv2 with ResNet-101vd backbone
Parameters	76M (RT-DETRv2-X variant)
License	Apache 2.0

about this model

RT-DETRv2 (ResNet-101vd) is a real-time object detection model that refines the RT-DETR architecture with selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies including dynamic data augmentation and scale-adaptive hyperparameters. These changes enhance flexibility and practicality while maintaining real-time performance.
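To make the idea of a discrete sampling operator more concrete, the sketch below contrasts bilinear sampling (the interpolation used by grid-sample-style operators) with a discrete variant that snaps the sampling location to the nearest cell. This is a simplified, single-channel illustration in pure Python, not the model's implementation. The function names `bilinear_sample` and `discrete_sample`, the rounding rule, and the toy feature map are all assumptions made for this example.

```python
import math


def bilinear_sample(feature, x, y):
    """Sample a 2D feature map at a fractional (x, y) using bilinear interpolation."""
    h, w = len(feature), len(feature[0])
    # Clamp the location so all four neighbours stay inside the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def discrete_sample(feature, x, y):
    """Sample by rounding (x, y) to the nearest cell (assumed rule; no interpolation)."""
    h, w = len(feature), len(feature[0])
    xi = min(max(int(math.floor(x + 0.5)), 0), w - 1)
    yi = min(max(int(math.floor(y + 0.5)), 0), h - 1)
    return feature[yi][xi]


# Toy 3x3 feature map, indexed as feature_map[row][column].
feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
```

The discrete version needs only an index lookup, which is why it tends to be easier to export to runtimes that lack an interpolating sampling operator; the price is a small loss of precision at fractional locations.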

Performance on COCO val2017

The RT-DETRv2 family, trained on COCO train2017, achieves the following results. AP is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, AP50 is AP at an IoU threshold of 0.50, and FPS is measured on a single T4 GPU with fp16 and TensorRT ≥ 8.5.1 at batch size 1:

Variant	AP	AP50	Parameters	FPS
RT-DETRv2-S	48.1	65.1	20M	217
RT-DETRv2-M*	49.9	67.5	31M	161
RT-DETRv2-M	51.9	69.9	36M	145
RT-DETRv2-L	53.4	71.6	42M	108
RT-DETRv2-X	54.3	72.8	76M	74
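
The AP and AP50 columns both depend on intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below shows how IoU is computed and which of the ten COCO thresholds a single prediction would satisfy. It illustrates only the thresholding step, not the full precision-recall computation behind AP, and the helper names `iou` and `matched_thresholds` are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95 used when averaging AP.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Return the thresholds at which the prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

For example, a prediction with IoU 0.72 against its ground truth counts as a match at the five thresholds from 0.50 to 0.70 and as a miss at the stricter ones, which is why AP is always at most AP50.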

Compared to the original RT-DETR, the RT-DETRv2 variants improve AP by 0.3 to 1.6 points at the same real-time speeds.

Figure: bar chart comparing the speed and accuracy of RT-DETRv2 variants against other real-time detectors.

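This model uses the ResNet-101vd backbone and corresponds to the RT-DETRv2-X row in the table above (76M parameters). For deployments that require the discrete sampling operator, the family also includes RT-DETRv2-S_dsp, which achieves 47.4 AP and 64.8 AP50, compared with 48.1 AP and 65.1 AP50 for the standard RT-DETRv2-S.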

The model is licensed under Apache 2.0. It was developed by researchers from Baidu Inc. and Peking University Shenzhen Graduate School.

best for

·Autonomous driving
·Surveillance systems
·Robotics
·Retail analytics

FAQ

What is the input format for this model?

Input images are preprocessed with RTDetrImageProcessor, which returns the tensors the model consumes for object detection.

What is the output format?

The model outputs bounding boxes, labels, and confidence scores for detected objects.
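
As a rough illustration of the input and output formats described above, the sketch below runs the checkpoint locally with the Hugging Face `transformers` library and turns the post-processed result into readable rows. It assumes a `transformers` release that includes RT-DETRv2, plus `torch` and Pillow; the helper names `format_detections` and `detect`, the default threshold of 0.5, and the image path in the usage note are assumptions made for this example.

```python
def format_detections(scores, labels, boxes, id2label):
    """Turn parallel lists of scores, label ids, and boxes into readable rows."""
    rows = []
    for score, label_id, box in zip(scores, labels, boxes):
        rows.append({
            "label": id2label[label_id],
            "score": round(float(score), 2),
            # Boxes are (x_min, y_min, x_max, y_max) in pixels of the original image.
            "box": [round(float(v), 2) for v in box],
        })
    return rows


def detect(image_path, threshold=0.5, model_id="PekingU/rtdetr_v2_r101vd"):
    """Run detection on one image file and return formatted rows."""
    # Heavy imports stay local so format_detections works without them installed.
    import torch
    from PIL import Image
    from transformers import RTDetrImageProcessor, RTDetrV2ForObjectDetection

    image = Image.open(image_path).convert("RGB")
    processor = RTDetrImageProcessor.from_pretrained(model_id)
    model = RTDetrV2ForObjectDetection.from_pretrained(model_id)

    # The processor resizes and normalizes the image and returns PyTorch tensors.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes is (height, width); detections below the threshold are dropped.
    result = processor.post_process_object_detection(
        outputs,
        target_sizes=torch.tensor([(image.height, image.width)]),
        threshold=threshold,
    )[0]
    return format_detections(
        result["scores"].tolist(),
        result["labels"].tolist(),
        result["boxes"].tolist(),
        model.config.id2label,
    )
```

Calling `detect("street.jpg")` on a local image (a hypothetical path) would download the weights on first use and return one dictionary per detected object, each with a label, a confidence score, and a pixel-space box.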

What is the license for this model?

The model is released under the Apache 2.0 license.

How does RT-DETRv2 R101vd compare to other variants in speed?

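RT-DETRv2 R101vd corresponds to the X variant (76M parameters), which achieves 74 FPS on a single T4 GPU with fp16 and TensorRT at batch size 1. It is the most accurate and the slowest variant in the family; the smaller variants range from 108 FPS (L) to 217 FPS (S).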
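
To put the throughput figures in per-image terms, the sketch below converts FPS to latency and picks the most accurate variant that meets a given frame-rate budget. The AP and FPS values are copied from the COCO val2017 table on this page; the helper names `latency_ms` and `best_variant_for` are assumptions made for this example, and real latency will depend on hardware and runtime.

```python
# (variant, AP, FPS) copied from the COCO val2017 table on this page.
VARIANTS = [
    ("RT-DETRv2-S", 48.1, 217),
    ("RT-DETRv2-M*", 49.9, 161),
    ("RT-DETRv2-M", 51.9, 145),
    ("RT-DETRv2-L", 53.4, 108),
    ("RT-DETRv2-X", 54.3, 74),
]


def latency_ms(fps):
    """Average time per image in milliseconds at a given throughput."""
    return 1000.0 / fps


def best_variant_for(min_fps):
    """Return the highest-AP variant that reaches min_fps, or None if none does."""
    candidates = [v for v in VARIANTS if v[2] >= min_fps]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v[1])[0]
```

At 74 FPS the X variant spends roughly 13.5 ms per image, against roughly 4.6 ms for the S variant at 217 FPS, so a pipeline that needs at least 100 FPS under the benchmark conditions would have to step down to RT-DETRv2-L.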

How can I call this model via the gigarouter API?

The hosted API is not yet live. Once it is, you will call the model through the gigarouter OpenAI-compatible endpoint with an API key, sending image inputs and receiving detection results.

not yet live

We're benchmarking and onboarding RT-DETRv2 R101vd as a hosted, OpenAI-compatible API.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo