RT-DETRv2 R101vd
PekingU/rtdetr_v2_r101vd
published Jan 2025 · updated Feb 2025
RT-DETRv2 R101vd is a real-time object detection model that refines the RT-DETR architecture with selective multi-scale feature extraction and improved training strategies for high accuracy and speed.
specs
| Field | Value |
|---|---|
| Task | Object Detection |
| Architecture | RT-DETRv2 with ResNet-101vd backbone |
| Parameters | 76M (RT-DETRv2-X variant) |
| License | Apache 2.0 |
about this model
RT-DETRv2 (ResNet-101vd) is a real-time object detection model that refines the RT-DETR architecture with selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies including dynamic data augmentation and scale-adaptive hyperparameters. These changes enhance flexibility and practicality while maintaining real-time performance.
Performance on COCO val2017
The RT-DETRv2 family, trained on COCO train2017, achieves the following results (AP averaged over IoU thresholds 0.50–0.95; AP50 at IoU 0.50; FPS measured on a single T4 GPU with fp16 and TensorRT≥8.5.1, batch size 1):
| Variant | AP | AP50 | Parameters | FPS |
|---|---|---|---|---|
| RT-DETRv2-S | 48.1 | 65.1 | 20M | 217 |
| RT-DETRv2-M* | 49.9 | 67.5 | 31M | 161 |
| RT-DETRv2-M | 51.9 | 69.9 | 36M | 145 |
| RT-DETRv2-L | 53.4 | 71.6 | 42M | 108 |
| RT-DETRv2-X | 54.3 | 72.8 | 76M | 74 |
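The AP and AP50 columns both rest on intersection-over-union (IoU) between a predicted box and a ground-truth box. The sketch below is a minimal illustration of that building block, not the full COCO evaluation (which also ranks predictions by score and integrates precision over recall). It assumes boxes are given as `(x_min, y_min, x_max, y_max)` and uses the standard COCO threshold step of 0.05; the function names and example boxes are illustrative only.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95 (AP averages over all ten,
# AP50 uses only the first).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred, gt):
    """Thresholds at which this prediction would count as a match for gt."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Illustrative boxes: a 100x100 prediction shifted by 10 px in x and y.
pred = (10, 10, 110, 110)
gt = (20, 20, 120, 120)
print(round(box_iou(pred, gt), 3))
print(thresholds_matched(pred, gt))
```

A prediction like this one counts as correct for AP50 but stops matching at the stricter thresholds, which is why AP is always lower than AP50 in the table.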
Compared to the original RT-DETR, the RT-DETRv2 variants improve AP by 0.3 to 1.6 points at the same real-time speeds.
This model uses the ResNet-101vd backbone and corresponds to the RT-DETRv2-X row in the table above (76M parameters). For deployments that require the discrete sampling operator, the RT-DETRv2-S_dsp variant achieves 47.4 AP and 64.8 AP50.
The model is licensed under Apache 2.0. It was developed by researchers from Baidu Inc. and Peking University Shenzhen Graduate School.
best for
- Autonomous driving
- Surveillance systems
- Robotics
- Retail analytics
FAQ
The model accepts images processed via RTDetrImageProcessor, returning tensors for object detection.
The model outputs bounding boxes, labels, and confidence scores for detected objects.
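The input and output described above can be sketched with the Hugging Face `transformers` library. This is a minimal sketch, not a reference implementation: it assumes `torch`, `Pillow`, and a `transformers` release that includes `RTDetrV2ForObjectDetection` are installed, that the checkpoint ID shown at the top of this page is reachable, and that the image path you pass exists. The helper names `to_records` and `detect` are illustrative, not part of any library.

```python
def to_records(result, id2label):
    """Turn one post-processed result (scores, labels, boxes) into plain dicts."""
    records = []
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        records.append({
            "label": id2label[int(label_id)],
            "score": round(float(score), 3),
            # Box is (x_min, y_min, x_max, y_max) in pixels of the original image.
            "box": [round(float(v), 1) for v in box],
        })
    return records


def detect(image_path, checkpoint="PekingU/rtdetr_v2_r101vd", threshold=0.5):
    """Run detection on one image file and return a list of record dicts."""
    # Imported lazily so to_records() stays usable without these packages.
    import torch
    from PIL import Image
    from transformers import RTDetrImageProcessor, RTDetrV2ForObjectDetection

    processor = RTDetrImageProcessor.from_pretrained(checkpoint)
    model = RTDetrV2ForObjectDetection.from_pretrained(checkpoint)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Rescale boxes to the original image size and drop low-confidence detections.
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=torch.tensor([(image.height, image.width)]),
        threshold=threshold,
    )
    return to_records(results[0], model.config.id2label)
```

Calling `detect("street.jpg")` (a hypothetical local file) would return one dict per detection with `label`, `score`, and `box` keys; raising `threshold` trades recall for fewer false positives.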
The model is released under the Apache 2.0 license.
The X variant (76M params) achieves 74 FPS on a single T4 GPU with TensorRT.
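Because the FPS figures on this page are measured at batch size 1, they can be read as approximate per-image latency. The sketch below does that conversion and picks the most accurate variant that fits a latency budget. It assumes strictly sequential, one-image-at-a-time inference on the same T4, fp16, TensorRT setup; the numbers are copied from the table on this page, and the function names are illustrative.

```python
# (AP, FPS) per variant, copied from the COCO val2017 table on this page.
VARIANTS = {
    "RT-DETRv2-S": (48.1, 217),
    "RT-DETRv2-M*": (49.9, 161),
    "RT-DETRv2-M": (51.9, 145),
    "RT-DETRv2-L": (53.4, 108),
    "RT-DETRv2-X": (54.3, 74),
}


def latency_ms(fps):
    """Approximate per-image latency in milliseconds at batch size 1."""
    return 1000.0 / fps


def most_accurate_within(budget_ms):
    """Name of the highest-AP variant whose latency fits the budget, or None."""
    fitting = [
        (ap, name)
        for name, (ap, fps) in VARIANTS.items()
        if latency_ms(fps) <= budget_ms
    ]
    return max(fitting)[1] if fitting else None


for name, (ap, fps) in VARIANTS.items():
    print(f"{name}: {latency_ms(fps):.1f} ms/image, AP {ap}")
```

Under these assumptions the X variant needs roughly 13.5 ms per image, so a pipeline with a 10 ms budget would fall back to RT-DETRv2-L.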
Use the gigarouter OpenAI-compatible endpoint with an API key to send image inputs and receive detection results.
RT-DETRv2 R101vd is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API and is not yet available through the endpoint.