
RT-DETRv2 R101vd

PekingU/rtdetr_v2_r101vd

published Jan 2025 · updated Feb 2025

RT-DETRv2 R101vd is a real-time object detection model that refines the RT-DETR architecture with selective multi-scale feature extraction and improved training strategies for high accuracy and speed.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 6.9K
license: apache-2.0
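
The estimated price above is quoted per 1,000 images, so a rough budget for a batch job is a single multiplication. The sketch below assumes the ~$0.047 figure holds at launch, which the page itself marks as an estimate; the function name `estimated_cost_usd` is hypothetical and not part of any gigarouter API.

```python
# Estimated list price from this page; it is set at launch and may change.
EST_PRICE_PER_1K_IMAGES_USD = 0.047


def estimated_cost_usd(num_images, price_per_1k=EST_PRICE_PER_1K_IMAGES_USD):
    """Return the estimated cost in USD for processing num_images images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images / 1000 * price_per_1k


if __name__ == "__main__":
    for n in (1_000, 250_000):
        print(f"{n} images: about ${estimated_cost_usd(n):.2f}")
```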

specs

Task: Object Detection
Architecture: RT-DETRv2 with ResNet-101vd backbone
Parameters: 76M (RT-DETRv2-X variant)
License: Apache 2.0

about this model

RT-DETRv2 (ResNet-101vd) is a real-time object detection model that refines the RT-DETR architecture with selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies including dynamic data augmentation and scale-adaptive hyperparameters. These changes enhance flexibility and practicality while maintaining real-time performance.

Performance on COCO val2017

The RT-DETRv2 family, trained on COCO train2017, achieves the following results (AP averaged over IoU thresholds 0.50–0.95; AP50 at IoU 0.50; FPS measured on a single T4 GPU with fp16 and TensorRT≥8.5.1, batch size 1):

| Variant | AP | AP50 | Parameters | FPS |
|---|---|---|---|---|
| RT-DETRv2-S | 48.1 | 65.1 | 20M | 217 |
| RT-DETRv2-M* | 49.9 | 67.5 | 31M | 161 |
| RT-DETRv2-M | 51.9 | 69.9 | 36M | 145 |
| RT-DETRv2-L | 53.4 | 71.6 | 42M | 108 |
| RT-DETRv2-X | 54.3 | 72.8 | 76M | 74 |

Compared to the original RT-DETR, RT-DETRv2 variants show improvements of +0.3 to +1.6 AP at the same real-time speeds.
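
The AP column above averages precision over ten IoU thresholds (0.50 to 0.95 in steps of 0.05), while AP50 uses only the 0.50 threshold. As a minimal illustration of what those thresholds mean for a single predicted box, the sketch below computes IoU for two boxes in `(x_min, y_min, x_max, y_max)` format and lists the thresholds at which the prediction would count as a match. It is not a full COCO evaluator, and the helper names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten COCO thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the COCO IoU thresholds at which pred_box matches gt_box."""
    overlap = iou(pred_box, gt_box)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    pred, truth = (2, 0, 12, 10), (0, 0, 10, 10)
    print(f"IoU = {iou(pred, truth):.3f}")
    print("matches at thresholds:", thresholds_passed(pred, truth))
```

A prediction that overlaps its ground-truth box only loosely therefore contributes to AP50 but not to the stricter thresholds, which is why AP is always lower than AP50 in the table.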

[Figure: bar chart comparing speed and accuracy of RT-DETRv2 variants against other real-time detectors]

This model uses the ResNet-101vd backbone and corresponds to the RT-DETRv2-X row in the table above (76M parameters). For deployments requiring a discrete sampling operator, the smaller variant RT-DETRv2-S_dsp achieves 47.4 AP and 64.8 AP50.

The model is licensed under Apache 2.0. It was developed by researchers from Baidu Inc. and Peking University Shenzhen Graduate School.

FAQ

What is the input format for this model?

The model takes images preprocessed by RTDetrImageProcessor, which resizes and normalizes them and returns the tensors the detector expects.

What is the output format?

The model outputs bounding boxes, labels, and confidence scores for detected objects.
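
The input and output formats described in the two answers above can be sketched with the Hugging Face `transformers` library, run locally rather than through gigarouter. This assumes a `transformers` release that includes `RTDetrV2ForObjectDetection`, plus `torch` and `Pillow`. The 0.5 score threshold and the helper names `to_records` and `detect_objects` are illustrative choices and do not come from this page.

```python
def to_records(result, id2label):
    """Convert one post-processed result into plain Python dicts.

    `result` holds "scores", "labels" and "boxes"; boxes are
    (x_min, y_min, x_max, y_max) in pixels of the original image.
    """
    records = []
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        records.append({
            "label": id2label[int(label)],
            "score": round(float(score), 3),
            "box": [round(float(v), 1) for v in box],
        })
    return records


def detect_objects(image, checkpoint="PekingU/rtdetr_v2_r101vd", threshold=0.5):
    """Run detection on a PIL image and return a list of detection records."""
    # Imported lazily so to_records stays usable without the heavy dependencies.
    import torch
    from transformers import RTDetrImageProcessor, RTDetrV2ForObjectDetection

    processor = RTDetrImageProcessor.from_pretrained(checkpoint)
    model = RTDetrV2ForObjectDetection.from_pretrained(checkpoint)
    model.eval()

    # Input: the processor turns the image into model-ready tensors.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Output: rescale boxes to the original size; target_sizes is (height, width).
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=torch.tensor([(image.height, image.width)]),
        threshold=threshold,
    )
    return to_records(results[0], model.config.id2label)
```

Calling `detect_objects(Image.open("photo.jpg"))` with a local image (the file name is a placeholder) downloads the checkpoint on first use and returns one record per detection above the threshold.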

What is the license for this model?

The model is released under the Apache 2.0 license.

How does RT-DETRv2 R101vd compare to other variants in speed?

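This model corresponds to the X variant (76M parameters), which reaches 74 FPS on a single T4 GPU with fp16 and TensorRT. It is the most accurate and the slowest variant in the family; the smaller variants range from 108 FPS (L) to 217 FPS (S).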
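
Because the published FPS figures are measured at batch size 1, their reciprocal gives an approximate per-image latency. The sketch below converts the table values to milliseconds. These numbers apply only to the stated benchmark setup (single T4, fp16, TensorRT); other hardware or runtimes will differ.

```python
# FPS values from the COCO val2017 table on this page (T4, fp16, TensorRT, batch size 1).
FPS_ON_T4 = {
    "RT-DETRv2-S": 217,
    "RT-DETRv2-M*": 161,
    "RT-DETRv2-M": 145,
    "RT-DETRv2-L": 108,
    "RT-DETRv2-X": 74,
}


def latency_ms(fps):
    """Approximate per-image latency in milliseconds at batch size 1."""
    return 1000.0 / fps


if __name__ == "__main__":
    for name, fps in FPS_ON_T4.items():
        print(f"{name}: {latency_ms(fps):.1f} ms per image")
```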

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it launches, you will be able to call the gigarouter OpenAI-compatible endpoint with an API key to send image inputs and receive detection results.

not yet live

We're benchmarking and onboarding RT-DETRv2 R101vd as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
