
RT-DETR (R18)

PekingU/rtdetr_r18vd

published May 2024 · updated Jul 2024

RT-DETR (R18) is a real-time end-to-end object detection model that eliminates the need for NMS post-processing.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 9K
license: apache-2.0

specs

Task: Object Detection
Architecture: RT-DETR with ResNet-18 backbone
Parameters: 20M
License: Apache-2.0

about this model

RT-DETR-R18vd is a real-time, end-to-end object detection model that eliminates the need for non-maximum suppression (NMS), addressing a key limitation of YOLO-based detectors. Its authors describe RT-DETR as the first real-time end-to-end Transformer-based detector. The architecture pairs an efficient hybrid encoder, which decouples intra-scale interaction from cross-scale fusion for speed, with uncertainty-minimal query selection for accuracy.
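To make "no NMS" concrete, here is a minimal sketch of set-prediction post-processing: each decoder query yields one box plus per-class scores, and the final detections are simply the highest-scoring (query, class) pairs above a threshold, with no pairwise overlap suppression. The function name `select_detections`, the default threshold, the `top_k` value, and the toy inputs are illustrative assumptions, not the model's actual code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def select_detections(
    boxes: List[Box],
    class_scores: List[List[float]],
    score_threshold: float = 0.5,  # assumed value, tune per application
    top_k: int = 300,              # assumed cap on returned detections
) -> List[Tuple[Box, int, float]]:
    """Pick detections from per-query outputs without any NMS step.

    boxes[i] is the box predicted by query i; class_scores[i][c] is the
    score (already in [0, 1]) of class c for that query.
    """
    candidates = []
    for query_idx, scores in enumerate(class_scores):
        for class_idx, score in enumerate(scores):
            if score >= score_threshold:
                candidates.append((score, query_idx, class_idx))
    # Highest score first; ties broken by query then class index.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(boxes[q], c, s) for s, q, c in candidates[:top_k]]


# Toy example: three queries, two classes.
toy_boxes = [(10, 10, 50, 50), (12, 11, 52, 49), (100, 100, 150, 160)]
toy_scores = [[0.9, 0.05], [0.2, 0.1], [0.1, 0.7]]
detections = select_detections(toy_boxes, toy_scores)
```

In the toy example the second query overlaps the first but scores low, so it is dropped by the threshold alone; no box-overlap comparison is ever computed.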

Key Strengths

The model achieves a strong balance of speed and accuracy. On the COCO 2017 validation set, RT-DETR-R18vd reaches 46.5 AP (63.8 AP50, 50.4 AP75) at 217 FPS on a T4 GPU with batch size 1, using only 20M parameters and 60.7 GFLOPs. When pretrained on Objects365, performance improves to 49.2 AP at the same 217 FPS. Inference speed can also be tuned by changing the number of decoder layers used, without retraining.

Benchmark Results (COCO val2017)

Model          Params (M)   GFLOPs   FPS (bs=1)   AP     AP50   AP75
RT-DETR-R18    20           60.7     217          46.5   63.8   50.4
RT-DETR-R34    31           91.0     172          48.5   66.2   52.3
RT-DETR-R50    42           136      108          53.1   71.3   57.7
RT-DETR-R101   76           259      74           54.3   72.7   58.6

With Objects365 pretraining, RT-DETR-R18 reaches 49.2 AP, and larger variants achieve up to 56.2 AP (R101). The original paper was accepted to CVPR 2024.
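Throughput at batch size 1 converts directly to per-image latency (1000 / FPS, in milliseconds). The sketch below applies that conversion to the table's figures and picks the fastest variant that meets a target AP; the names `BENCHMARKS`, `latency_ms`, and `fastest_meeting_ap` are illustrative.

```python
from typing import Optional

# FPS (batch size 1, T4 GPU) and COCO val2017 AP, copied from the table above.
BENCHMARKS = {
    "RT-DETR-R18": {"fps": 217, "ap": 46.5},
    "RT-DETR-R34": {"fps": 172, "ap": 48.5},
    "RT-DETR-R50": {"fps": 108, "ap": 53.1},
    "RT-DETR-R101": {"fps": 74, "ap": 54.3},
}


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds for a given batch-size-1 FPS."""
    return 1000.0 / fps


def fastest_meeting_ap(min_ap: float) -> Optional[str]:
    """Return the highest-FPS variant whose AP is at least min_ap, or None."""
    eligible = [name for name, row in BENCHMARKS.items() if row["ap"] >= min_ap]
    if not eligible:
        return None
    return max(eligible, key=lambda name: BENCHMARKS[name]["fps"])


for name, row in BENCHMARKS.items():
    print(f"{name}: {latency_ms(row['fps']):.2f} ms/image at {row['ap']} AP")
```

For RT-DETR-R18 this works out to roughly 4.6 ms per image on the benchmarked T4 setup; real latency on other hardware or with pre- and post-processing included will differ.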

Architecture Overview

[Figure: RT-DETR architecture diagram showing backbone, efficient hybrid encoder with AIFI and CCFF modules, uncertainty-minimal query selection, and decoder]

The model processes multi-scale features from the last three backbone stages through an Attention-based Intra-scale Feature Interaction (AIFI) and CNN-based Cross-scale Feature Fusion (CCFF) encoder, then selects high-quality initial queries for the decoder.

[Figure: Speed-accuracy comparison chart showing RT-DETR outperforming YOLO variants on T4 GPU]

RT-DETR-R18vd is trained on COCO 2017 (118k training images) and is available under the Apache-2.0 license.


FAQ

What is the input and output format for this model?

The model accepts images resized to 640x640 pixels and outputs bounding boxes, class labels, and confidence scores for detected objects.
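Because the model sees a 640x640 image, predicted boxes generally need to be mapped back to the original resolution. The sketch below assumes a plain resize (aspect ratio not preserved, no letterbox padding) and boxes given as absolute pixel coordinates in the 640x640 frame; neither assumption is confirmed by this page, and the `Detection` type and `rescale_detection` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

MODEL_INPUT_SIZE = 640  # side length of the resized model input, per the FAQ


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def rescale_detection(det: Detection, orig_width: int, orig_height: int) -> Detection:
    """Map a box from the 640x640 model frame back to original image pixels.

    Assumes the image was stretched to 640x640 without padding, so x and y
    are scaled independently.
    """
    scale_x = orig_width / MODEL_INPUT_SIZE
    scale_y = orig_height / MODEL_INPUT_SIZE
    x0, y0, x1, y1 = det.box
    return Detection(
        label=det.label,
        score=det.score,
        box=(x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y),
    )


# A box covering the bottom-right quadrant of the model input,
# mapped back onto a 1280x720 source image.
example = rescale_detection(Detection("person", 0.91, (320, 320, 640, 640)), 1280, 720)
```

If the serving stack returns normalized or already-rescaled coordinates, this step is unnecessary; check the actual response format before applying it.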

How do I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it launches, you will call the OpenAI-compatible endpoint with your API key: send an image URL or a base64-encoded image and receive detection results in JSON format.
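The request schema is not documented on this page, so the sketch below only covers the part that does not depend on it: encoding a local image as a base64 data URL and serialising a JSON body. The field names `model` and `image`, and the helper names, are placeholders for illustration and not gigarouter's actual schema.

```python
import base64
import json


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_request_body(image_bytes: bytes, model: str = "PekingU/rtdetr_r18vd") -> str:
    """Serialise a JSON request body.

    NOTE: "model" and "image" are hypothetical field names; replace them
    with whatever the published API reference specifies.
    """
    return json.dumps({"model": model, "image": to_data_url(image_bytes)})


# Usage with placeholder bytes; in practice, read them from an image file.
body = build_request_body(b"abc")
```

The API key would normally travel in an HTTP header rather than in the body, but the exact header and endpoint path should be taken from the provider's documentation once available.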

What is the license for RT-DETR (R18)?

The model is released under the Apache-2.0 license.

How does RT-DETR (R18) compare to YOLO in speed and accuracy?

RT-DETR (R18) achieves 46.5 AP on COCO val2017 at 217 FPS on a T4 GPU (batch size 1). Its authors report that RT-DETR outperforms comparable YOLO detectors in both speed and accuracy while eliminating NMS.

What dataset was RT-DETR (R18) trained on?

The model was trained on the COCO 2017 object detection dataset (118k training images).

not yet live

We're benchmarking and onboarding RT-DETR (R18) as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
