RT-DETR R18
PekingU/rtdetr_r18vd_coco_o365
published May 2024 · updated Jul 2024
RT-DETR R18 is a real-time object detection model that uses a Transformer-based architecture to eliminate NMS and achieve high speed and accuracy.
specs
| Spec | Value |
|---|---|
| Task | Object Detection |
| Architecture | RT-DETR (Real-Time Detection Transformer) with efficient hybrid encoder and decoder |
| Parameters | 20M |
| License | Apache-2.0 |
about this model
PekingU/rtdetr_r18vd_coco_o365 is an object detection model that performs real-time, end-to-end detection by eliminating the need for non-maximum suppression (NMS). It is built on the RT-DETR architecture, which uses an efficient hybrid encoder to decouple intra-scale interaction and cross-scale fusion, and an uncertainty-minimal query selection mechanism to provide high-quality initial queries to the decoder. The model supports flexible speed tuning by adjusting the number of decoder layers without retraining.
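To make the NMS-free claim concrete, the following is a minimal sketch of how per-query decoder outputs could be turned into detections with only scoring, sorting, and thresholding. It is not the model's actual post-processing code: the input layout (one list of class logits and one normalized center-format box per query), the function names, and the `threshold` and `top_k` defaults are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cxcywh_to_xyxy(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def decode_detections(logits, boxes, image_size, threshold=0.5, top_k=300):
    """Turn per-query outputs into detections without any NMS step.

    logits: per-query lists of class logits (hypothetical layout)
    boxes:  per-query normalized (cx, cy, w, h) tuples (hypothetical layout)
    """
    width, height = image_size
    # Score every (query, class) pair independently with a sigmoid.
    candidates = [
        (sigmoid(logit), query_idx, class_idx)
        for query_idx, query_logits in enumerate(logits)
        for class_idx, logit in enumerate(query_logits)
    ]
    # Keep the highest-scoring pairs; no overlap-based suppression is applied.
    candidates.sort(key=lambda c: c[0], reverse=True)
    detections = []
    for score, query_idx, class_idx in candidates[:top_k]:
        if score < threshold:
            break  # candidates are sorted, so the rest are below threshold too
        detections.append({
            "score": score,
            "label": class_idx,
            "box": cxcywh_to_xyxy(boxes[query_idx], width, height),
        })
    return detections


# Toy inputs: three queries, two classes (values invented for illustration).
example_logits = [[4.0, -3.0], [-2.0, 3.0], [-5.0, -4.0]]
example_boxes = [(0.5, 0.5, 0.2, 0.4), (0.25, 0.25, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1)]
detections = decode_detections(example_logits, example_boxes, image_size=(640, 640))
```

In this toy run the first two queries pass the threshold and the third is dropped on score alone. The point of the sketch is that no pairwise IoU comparison between boxes is needed, which is where NMS normally spends its time.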
Key Strengths
- End-to-end pipeline without NMS, reducing post-processing overhead.
- High inference speed: 217 FPS on a T4 GPU at batch size 1 (640x640 input).
- Competitive accuracy with lightweight design: 20 million parameters and 60.7 GFLOPs.
- Pre-trained on Objects365 and fine-tuned on COCO, improving generalization.
Benchmark Results
The table below reports results on the COCO 2017 validation set (640x640 input) for several RT-DETR variants. This checkpoint corresponds to the RT-DETR-R18 (Objects365 pretrained) row.
| Model variant | Epochs | Params (M) | GFLOPs | FPS (bs=1) | AP | AP50 | AP75 | APₛ | APₘ | APₗ |
|---|---|---|---|---|---|---|---|---|---|---|
| RT-DETR-R18 (COCO only) | 72 | 20 | 60.7 | 217 | 46.5 | 63.8 | 50.4 | 28.4 | 49.8 | 63.0 |
| RT-DETR-R18 (Objects365 pretrained) | 60 | 20 | 61 | 217 | 49.2 | 66.6 | 53.5 | 33.2 | 52.3 | 64.8 |
| RT-DETR-R50 (Objects365 pretrained) | 24 | 42 | 136 | 108 | 55.3 | 73.4 | 60.1 | 37.9 | 59.9 | 71.8 |
| RT-DETR-R101 (Objects365 pretrained) | 24 | 76 | 259 | 74 | 56.2 | 74.6 | 61.3 | 38.3 | 60.5 | 73.5 |
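The FPS column can be read as a per-image latency, which is often easier to compare against a frame budget. The sketch below does that conversion using the FPS and AP values from the table. The `latency_ms` helper and the `variants` dictionary are illustrative names, and the result covers model inference only, assuming the reported FPS excludes image loading and other pipeline overhead.

```python
# FPS (batch size 1, T4 GPU) and AP values copied from the benchmark table.
variants = {
    "RT-DETR-R18 (COCO only)": {"fps": 217, "ap": 46.5},
    "RT-DETR-R18 (Objects365 pretrained)": {"fps": 217, "ap": 49.2},
    "RT-DETR-R50 (Objects365 pretrained)": {"fps": 108, "ap": 55.3},
    "RT-DETR-R101 (Objects365 pretrained)": {"fps": 74, "ap": 56.2},
}


def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds for a given throughput."""
    return 1000.0 / fps


for name, row in variants.items():
    print(f"{name}: {latency_ms(row['fps']):.2f} ms/image at AP {row['ap']}")
```

At 217 FPS the R18 variants work out to roughly 4.6 ms per image, compared with about 9.3 ms for R50 and 13.5 ms for R101.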
Training Data and Procedure
The model was pre-trained on Objects365 and then fine-tuned on the COCO 2017 object detection dataset (118k training images). Input images are resized to 640x640 and normalized per channel (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]). Full training details are available in the original paper.
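As a rough illustration of the normalization step, the sketch below applies the stated per-channel mean and standard deviation to 8-bit RGB pixels. It assumes pixel values are first scaled to the [0, 1] range, which is the usual convention for these constants but is not stated in the text. The function names are hypothetical, and resizing to 640x640 is left out.

```python
# Per-channel statistics quoted in the text (R, G, B order).
IMAGE_MEAN = (0.485, 0.456, 0.406)
IMAGE_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to normalized floats.

    Assumption: values are scaled to [0, 1] before mean/std normalization.
    """
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


def normalize_image(pixels):
    """Normalize an image given as rows of (R, G, B) tuples.

    The image is expected to be resized to 640x640 already; resizing is
    not shown here.
    """
    return [[normalize_pixel(pixel) for pixel in row] for row in pixels]


# A 1x2 toy "image": one black pixel and one white pixel.
normalized = normalize_image([[(0, 0, 0), (255, 255, 255)]])
```

In practice an image library or the model's own preprocessing utilities would handle both resizing and normalization. The sketch only shows the arithmetic applied to each channel.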
best for
- Real-time object detection in video streams
- Edge deployment on resource-constrained devices
- High-speed inference for robotics or autonomous systems
FAQ
Input images are resized to 640x640 pixels before inference.
The model achieves 217 FPS with batch size 1 on a T4 GPU.
The model is released under the Apache-2.0 license.
To run the model, use the gigarouter OpenAI-compatible endpoint with your API key to send image inputs and receive detection results.
The model does not need NMS post-processing: RT-DETR is an end-to-end detector that eliminates NMS entirely.
We're benchmarking and onboarding RT-DETR R18 as a hosted, OpenAI-compatible API.