RT-DETRv2 R18
PekingU/rtdetr_v2_r18vd
published Jan 2025 · updated Feb 2025
RT-DETRv2 R18 is a real-time object detection transformer model optimized for fast and accurate detection in dynamic environments.
specs
| Spec | Value |
|---|---|
| Task | Object Detection |
| Architecture | Detection Transformer (RT-DETRv2-S with ResNet-18 backbone) |
| Parameters | 20 million |
| License | Apache 2.0 |
| COCO AP | 48.1 |
| FPS (T4 TensorRT fp16) | 217 |
about this model
RT-DETRv2 R18 (checkpoint rtdetr_v2_r18vd) is a real-time object detection transformer that builds on the RT-DETR architecture with selective multi-scale feature extraction, an optional discrete sampling operator for broader deployment compatibility, and improved training strategies, including dynamic data augmentation and scale-adaptive hyperparameters. It targets both speed and accuracy, reaching 48.1 AP on COCO val2017 at 217 FPS on a single T4 GPU, and is intended for deployment across edge and cloud environments.
Key improvements
The decoder uses a distinct number of sampling points per feature scale in deformable attention, enabling selective multi-scale feature extraction. The optional discrete sampling operator replaces grid_sample, removing deployment constraints common to DETRs. Training enhancements (dynamic augmentation and scale-adaptive hyperparameters) improve performance without compromising inference speed. The model is pretrained on COCO train2017 and evaluated on COCO val2017.
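To make the difference between interpolated and discrete sampling concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the zero padding outside the map, and the border clamping are assumptions chosen for illustration. It only shows that bilinear sampling blends four neighbouring cells, while a discrete operator rounds the sampling location and reads a single cell.

```python
import math


def bilinear_sample(fmap, x, y):
    """Interpolate a 2D feature map (list of rows) at continuous pixel coords (x, y).

    Cells outside the map are treated as zero (an assumption for this sketch).
    """
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0

    def at(row, col):
        inside = 0 <= row < len(fmap) and 0 <= col < len(fmap[0])
        return fmap[row][col] if inside else 0.0

    top = (1 - dx) * at(y0, x0) + dx * at(y0, x0 + 1)
    bottom = (1 - dx) * at(y0 + 1, x0) + dx * at(y0 + 1, x0 + 1)
    return (1 - dy) * top + dy * bottom


def discrete_sample(fmap, x, y):
    """Round (x, y) to the nearest cell and read it directly, with no interpolation.

    Coordinates are clamped to the map border (an assumption for this sketch).
    """
    row = min(max(math.floor(y + 0.5), 0), len(fmap) - 1)
    col = min(max(math.floor(x + 0.5), 0), len(fmap[0]) - 1)
    return fmap[row][col]


fmap = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))  # 1.5, the average of all four cells
print(discrete_sample(fmap, 0.6, 0.2))  # 1.0, the value at row 0, column 1
```

The discrete version needs only rounding and indexing, which is why it tends to be easier to export to runtimes that lack a grid_sample equivalent; the small accuracy cost is reported in the Performance section.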
Performance
On COCO val2017 at 640×640 input resolution, the r18vd variant (RT-DETRv2-S) achieves 48.1 AP and 65.1 AP50 with 20 million parameters and 217 FPS on a single T4 GPU (TensorRT fp16, batch size 1). This is a 1.6 AP improvement over the previous RT-DETR-R18 baseline. A discrete sampling variant (RT-DETRv2-S_dsp) scores 47.4 AP and 64.8 AP50, trading 0.7 AP for wider deployment compatibility.
| Variant | AP | AP50 | Params | FPS (T4, fp16) |
|---|---|---|---|---|
| RT-DETRv2-S (r18vd) | 48.1 | 65.1 | 20M | 217 |
| RT-DETRv2-S_dsp | 47.4 | 64.8 | 20M | ~217 |
FPS measured on a single T4 GPU with TensorRT >= 8.5.1, fp16 precision, batch size 1. The model is released under the Apache 2.0 license.
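The figures above imply a few derived numbers that are handy when budgeting a pipeline. The short calculation below uses only values stated in this section; the variable names are illustrative.

```python
# Values quoted in the Performance section.
ap_v2_s = 48.1   # RT-DETRv2-S (r18vd) AP on COCO val2017
ap_gain = 1.6    # stated improvement over RT-DETR-R18
ap_dsp = 47.4    # RT-DETRv2-S_dsp AP
fps = 217        # T4, TensorRT fp16, batch size 1

baseline_ap = round(ap_v2_s - ap_gain, 1)  # AP implied for the RT-DETR-R18 baseline
dsp_cost = round(ap_v2_s - ap_dsp, 1)      # AP given up by discrete sampling
latency_ms = 1000 / fps                    # per-image latency at batch size 1

print(f"Implied RT-DETR-R18 baseline: {baseline_ap} AP")
print(f"Discrete sampling cost: {dsp_cost} AP")
print(f"Per-image latency: {latency_ms:.2f} ms")
```

At 217 FPS the per-image latency is about 4.6 ms, which applies only to the measured setup (T4, TensorRT fp16, batch size 1) and excludes image decoding and pre/post-processing.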
best for
- Autonomous driving
- Surveillance systems
- Robotics
- Retail analytics
FAQ
RT-DETRv2 R18 excels at real-time object detection in applications like autonomous driving, surveillance, robotics, and retail analytics.
The model has 20 million parameters.
It is released under the Apache 2.0 license.
The model expects images preprocessed with RTDetrImageProcessor (e.g., resized and normalized).
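As a sketch of local inference with Hugging Face Transformers, the snippet below loads the checkpoint named on this page, lets RTDetrImageProcessor prepare the image, and post-processes the outputs. It assumes a recent transformers release that includes RTDetrV2ForObjectDetection, plus torch and Pillow; the `detect` and `summarize` helpers, the image path argument, and the 0.5 score threshold are illustrative choices, not part of the model.

```python
def detect(image_path, checkpoint="PekingU/rtdetr_v2_r18vd", threshold=0.5):
    """Run the checkpoint on one image and return (detections, id2label)."""
    # Imported here so summarize() below can be used without these packages.
    import torch
    from PIL import Image
    from transformers import RTDetrImageProcessor, RTDetrV2ForObjectDetection

    processor = RTDetrImageProcessor.from_pretrained(checkpoint)
    model = RTDetrV2ForObjectDetection.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")

    # The processor resizes and scales the image and returns model-ready tensors.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Boxes are mapped back to the original image size, given as (height, width).
    target_sizes = torch.tensor([(image.height, image.width)])
    results = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=threshold
    )[0]
    return results, model.config.id2label


def summarize(results, id2label):
    """Format detections as 'label: score [x_min, y_min, x_max, y_max]' strings."""
    lines = []
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        coords = [round(float(v), 1) for v in box]
        lines.append(f"{id2label[int(label)]}: {float(score):.2f} {coords}")
    return lines
```

Calling `summarize(*detect("street.jpg"))` with a local image path of your choice would return one line per detection above the threshold.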
RT-DETRv2 R18 is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API. Once it is available, call the endpoint with your API key, sending an image URL or a base64-encoded image.
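For the base64 option, a minimal sketch of preparing an image for a request body follows. The request schema for the hosted endpoint is not specified on this page, so the `build_payload` function and its `model` and `image` field names are hypothetical placeholders; only the base64 encoding step is standard.

```python
import base64


def image_to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(image_ref, model="PekingU/rtdetr_v2_r18vd"):
    """Assemble a request body.

    HYPOTHETICAL schema: the real field names depend on the hosted API,
    which this page does not document.
    """
    return {"model": model, "image": image_ref}


# image_ref can be a plain image URL or a data URL built from local bytes.
payload = build_payload(image_to_data_url(b"abc"))
print(payload["image"])  # data:image/jpeg;base64,YWJj
```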