RT-DETR R18

PekingU/rtdetr_r18vd_coco_o365

published May 2024 · updated Jul 2024

RT-DETR R18 is a real-time object detection model that uses a Transformer-based architecture to eliminate NMS and achieve high speed and accuracy.

est. price

~$0.047

/ 1k images · estimated, set at launch

downloads / mo

17.3K

license

apache-2.0

specs

Task	Object Detection
Architecture	RT-DETR (Real-Time Detection Transformer) with a ResNet-18-vd backbone, efficient hybrid encoder, and Transformer decoder
Parameters	20M
License	Apache-2.0

about this model

PekingU/rtdetr_r18vd_coco_o365 is an object detection model that performs real-time, end-to-end detection by eliminating the need for non-maximum suppression (NMS). It is built on the RT-DETR architecture, which uses an efficient hybrid encoder to decouple intra-scale interaction and cross-scale fusion, and an uncertainty-minimal query selection mechanism to provide high-quality initial queries to the decoder. The model supports flexible speed tuning by adjusting the number of decoder layers without retraining.
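The query-selection step can be illustrated with a toy sketch. It assumes that every flattened encoder token receives class logits, that tokens are ranked by their best class score, and that the top K become the decoder's initial queries. The "uncertainty-minimal" refinement, which as we understand the paper acts through the training loss, is not modelled here. The function name `select_initial_queries`, the token count, feature width, and class count are hypothetical values chosen for illustration.

```python
import numpy as np

def select_initial_queries(encoder_feats, class_logits, k):
    """Pick the k encoder tokens with the highest best-class score.

    encoder_feats: (num_tokens, dim) flattened multi-scale encoder output
    class_logits:  (num_tokens, num_classes) per-token classification logits
    Returns the indices of the selected tokens and their features.
    """
    # Sigmoid class probabilities, then each token's best class score
    scores = 1.0 / (1.0 + np.exp(-class_logits))
    best = scores.max(axis=1)
    # Indices of the k highest-scoring tokens, best first
    top = np.argsort(-best)[:k]
    return top, encoder_feats[top]

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 256))   # hypothetical: 1000 tokens, 256-d features
logits = rng.normal(size=(1000, 80))   # hypothetical: 80 COCO classes
idx, queries = select_initial_queries(feats, logits, k=300)
print(queries.shape)  # (300, 256)
```

Selecting queries from encoder features, instead of starting from learned but content-free query embeddings, is what gives the decoder a better starting point.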

Key Strengths

End-to-end pipeline without NMS, reducing post-processing overhead.
High inference speed: 217 FPS on a T4 GPU at batch size 1 (640x640 input, TensorRT FP16, as reported in the original paper).
Competitive accuracy from a lightweight design: 20 million parameters and 60.7 GFLOPs.
Pre-trained on Objects365 and fine-tuned on COCO, which raises COCO val AP from 46.5 (COCO-only training) to 49.2 at the same cost.

Benchmark Results

RT-DETR variants achieve the following performance on the COCO 2017 validation set (640x640 input). This checkpoint corresponds to the Objects365-pretrained R18 row.

Model variant	Epochs	Params (M)	GFLOPs	FPS (T4, bs=1)	AP	AP50	AP75	APₛ	APₘ	APₗ
RT-DETR-R18 (COCO only)	72	20	60.7	217	46.5	63.8	50.4	28.4	49.8	63.0
RT-DETR-R18 (Objects365 pretrained)	60	20	61	217	49.2	66.6	53.5	33.2	52.3	64.8
RT-DETR-R50 (Objects365 pretrained)	24	42	136	108	55.3	73.4	60.1	37.9	59.9	71.8
RT-DETR-R101 (Objects365 pretrained)	24	76	259	74	56.2	74.6	61.3	38.3	60.5	73.5
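The trade-offs in the table can be read off with simple arithmetic. The short sketch below copies the FPS and AP values from the table, converts FPS to per-image latency as 1000 / FPS milliseconds (assuming the reported batch-size-1 FPS is simply the reciprocal of per-image latency), and computes the AP gained from Objects365 pretraining. The helper name `latency_ms` is hypothetical.

```python
# (variant, FPS at batch size 1, COCO val AP) copied from the table above
variants = [
    ("RT-DETR-R18 (COCO only)", 217, 46.5),
    ("RT-DETR-R18 (Objects365 pretrained)", 217, 49.2),
    ("RT-DETR-R50 (Objects365 pretrained)", 108, 55.3),
    ("RT-DETR-R101 (Objects365 pretrained)", 74, 56.2),
]

def latency_ms(fps):
    """Per-image latency implied by a batch-size-1 throughput figure."""
    return 1000.0 / fps

for name, fps, ap in variants:
    print(f"{name}: {latency_ms(fps):.2f} ms/image, AP {ap}")

# AP gained by Objects365 pretraining at identical R18 cost
gain = round(variants[1][2] - variants[0][2], 1)
print(f"Objects365 pretraining gain for R18: +{gain} AP")  # +2.7 AP
```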

Architecture Overview

Diagram of the RT-DETR architecture showing the backbone, the efficient hybrid encoder with AIFI (Attention-based Intra-scale Feature Interaction) and CCFF (CNN-based Cross-scale Feature Fusion), uncertainty-minimal query selection, and the decoder.

Comparison plot showing RT-DETR outperforming YOLO variants in the speed-accuracy trade-off.

Table of training hyperparameters.

Training Data and Procedure

The model was pre-trained on Objects365 and then fine-tuned on the COCO 2017 object detection training set (118k images). Input images are resized to 640×640 and normalized with mean=[0.485,0.456,0.406] and std=[0.229,0.224,0.225]. Full training details are available in the original paper, "DETRs Beat YOLOs on Real-time Object Detection".
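The preprocessing described above can be sketched in NumPy. This is an illustration under stated assumptions, not the checkpoint's actual image processor: nearest-neighbour resizing stands in for the interpolation a real pipeline would use, pixels are assumed to arrive as 8-bit RGB in height-width-channel layout, and the helper name `preprocess` is hypothetical. The mean and std values are the ones listed above.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image, size=640):
    """Resize an HxWx3 uint8 RGB image to size x size and normalize it.

    Returns a float32 array of shape (1, 3, size, size): batch, channels, H, W.
    """
    h, w, _ = image.shape
    # Nearest-neighbour resize: map each output row/column to a source index
    rows = (np.arange(size) * h // size).clip(0, h - 1)
    cols = (np.arange(size) * w // size).clip(0, w - 1)
    resized = image[rows[:, None], cols[None, :]]  # (size, size, 3)
    # Scale to [0, 1], then apply per-channel mean/std normalization
    x = resized.astype(np.float32) / 255.0
    x = (x - MEAN) / STD
    # HWC -> CHW, then add a batch dimension
    return x.transpose(2, 0, 1)[None]

img = np.full((480, 640, 3), 128, dtype=np.uint8)  # hypothetical grey frame
batch = preprocess(img)
print(batch.shape)  # (1, 3, 640, 640)
```

Note that a plain resize to a square input changes the aspect ratio of non-square images; predicted boxes therefore have to be rescaled per axis back to the original width and height.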

best for

·Real-time object detection in video streams
·Edge deployment on resource-constrained devices
·High-speed inference for robotics or autonomous systems

FAQ

What is the input size for this model?

Images are resized to 640x640 pixels before inference.

How fast is RT-DETR R18 on a T4 GPU?

It achieves 217 FPS at batch size 1 on a T4 GPU (640x640 input, TensorRT FP16, as reported in the original paper), or roughly 4.6 ms per image.

What license is this model released under?

It is released under the Apache-2.0 license.

How can I call this model via the API?

Hosted API access is not live yet. Once it launches, you will be able to call the model through the gigarouter OpenAI-compatible endpoint with your API key, sending image inputs and receiving detection results.

Does this model require Non-Maximum Suppression (NMS)?

No, RT-DETR is an end-to-end detector that eliminates NMS entirely.
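Because DETR-style training matches each ground-truth object to exactly one query, post-processing can be reduced to scoring and thresholding instead of NMS. The sketch below shows that idea: sigmoid class scores, a top-k over all (query, class) pairs, a confidence threshold, and conversion of normalized (cx, cy, w, h) boxes to pixel corner coordinates. The function name `postprocess`, the top-k of 100, and the 0.5 threshold are assumptions for illustration, modelled on common DETR-style post-processing rather than taken from this model's code.

```python
import numpy as np

def postprocess(logits, boxes, image_w, image_h, top_k=100, threshold=0.5):
    """NMS-free post-processing for DETR-style outputs.

    logits: (num_queries, num_classes) raw class logits
    boxes:  (num_queries, 4) normalized (cx, cy, w, h) in [0, 1]
    Returns a list of (score, class_id, [x0, y0, x1, y1]) in pixels.
    """
    num_classes = logits.shape[1]
    scores = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per class
    flat = scores.ravel()
    k = min(top_k, flat.size)
    order = np.argsort(-flat)[:k]  # best (query, class) pairs first
    results = []
    for i in order:
        if flat[i] < threshold:
            break  # scores are sorted, so every remaining pair is lower
        q, c = divmod(int(i), num_classes)
        cx, cy, w, h = boxes[q]
        x0, y0 = (cx - w / 2) * image_w, (cy - h / 2) * image_h
        x1, y1 = (cx + w / 2) * image_w, (cy + h / 2) * image_h
        results.append((float(flat[i]), c, [float(x0), float(y0), float(x1), float(y1)]))
    return results

# Two hypothetical queries over 3 classes; only query 0 / class 2 is confident
logits = np.array([[-4.0, -3.0, 3.0],
                   [-5.0, -2.0, -1.0]])
boxes = np.array([[0.5, 0.5, 0.2, 0.4],
                  [0.1, 0.1, 0.05, 0.05]])
dets = postprocess(logits, boxes, image_w=640, image_h=480)
print(len(dets), dets[0][1])  # 1 2
```

No pairwise box-overlap comparison appears anywhere in this path, which is why its cost stays fixed regardless of how many objects are in the scene.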

not yet live

We're benchmarking and onboarding RT-DETR R18 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

table-transformer-structure-recognition · 1.8M dl/mo
table-transformer-detection · 1.5M dl/mo
yolos-small · 713.6K dl/mo
PP-DocLayoutV3_safetensors · 341.1K dl/mo
rtdetr_v2_r50vd · 309.8K dl/mo
rtdetr_r50vd_coco_o365 · 254.5K dl/mo