
RT-DETR R18

PekingU/rtdetr_r18vd_coco_o365

published May 2024 · updated Jul 2024

RT-DETR R18 is a real-time object detection model that uses a Transformer-based architecture to eliminate NMS and achieve high speed and accuracy.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 17.3K
license: apache-2.0
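
As a rough illustration of the estimated price above, the sketch below converts the per-1k-image figure into a cost for a given number of images. The price is the page's pre-launch estimate and may change; the function name `estimated_cost_usd` is made up for this example.

```python
# Pre-launch estimate from this page: ~$0.047 per 1,000 images (may change at launch).
PRICE_PER_1K_IMAGES_USD = 0.047


def estimated_cost_usd(num_images: int) -> float:
    """Scale the per-1k-image estimate linearly to `num_images`."""
    return num_images / 1000 * PRICE_PER_1K_IMAGES_USD


# Estimated cost of processing a quarter of a million images.
cost_250k = estimated_cost_usd(250_000)
print(f"250,000 images: ~${cost_250k:.2f}")
```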

specs

Task: Object Detection
Architecture: RT-DETR (Real-Time Detection Transformer) with efficient hybrid encoder and decoder
Parameters: 20M
License: Apache-2.0

about this model

PekingU/rtdetr_r18vd_coco_o365 is an object detection model that performs real-time, end-to-end detection by eliminating the need for non-maximum suppression (NMS). It is built on the RT-DETR architecture, which uses an efficient hybrid encoder to decouple intra-scale interaction and cross-scale fusion, and an uncertainty-minimal query selection mechanism to provide high-quality initial queries to the decoder. The model supports flexible speed tuning by adjusting the number of decoder layers without retraining.
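
To make "end-to-end without NMS" concrete, here is a minimal sketch of what post-processing can look like for a DETR-style detector: keep the queries whose score clears a threshold and convert their boxes to pixel coordinates. It assumes that each query already has a score, a class label, and a box in normalized (cx, cy, w, h) form; the function names and the example values are illustrative, not the model's actual API or outputs.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def postprocess(
    scores: List[float],
    labels: List[int],
    boxes: List[Box],
    image_size: Tuple[int, int],
    threshold: float = 0.5,
) -> List[Tuple[int, float, Box]]:
    """Keep queries scoring at or above `threshold`; no overlap-based suppression."""
    width, height = image_size
    detections = []
    for score, label, box in zip(scores, labels, boxes):
        if score >= threshold:
            detections.append((label, score, cxcywh_to_xyxy(box, width, height)))
    return detections


# Made-up per-query outputs for a 640x480 image (assumed values, not real model output).
example = postprocess(
    scores=[0.92, 0.31, 0.77],
    labels=[17, 17, 0],
    boxes=[(0.5, 0.5, 0.2, 0.4), (0.51, 0.5, 0.2, 0.4), (0.25, 0.25, 0.1, 0.1)],
    image_size=(640, 480),
)
for label, score, box in example:
    print(label, score, box)
```

The low-scoring second query is dropped by the threshold alone. No pairwise overlap check is involved, which is the post-processing step an end-to-end detector avoids.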

Key Strengths

  • End-to-end pipeline without NMS, reducing post-processing overhead.
  • High inference speed: 217 FPS on a T4 GPU at batch size 1 (640x640 input).
  • Competitive accuracy with lightweight design: 20 million parameters and 60.7 GFLOPs.
  • Pre-trained on Objects365 and fine-tuned on COCO, improving generalization.

Benchmark Results

The model achieves the following performance on the COCO 2017 validation set (640x640 input):

Model variant                        | Epochs | Params (M) | GFLOPs | FPS (bs=1) | AP   | AP50 | AP75 | APₛ  | APₘ  | APₗ
RT-DETR-R18 (COCO only)              | 72     | 20         | 60.7   | 217        | 46.5 | 63.8 | 50.4 | 28.4 | 49.8 | 63.0
RT-DETR-R18 (Objects365 pretrained)  | 60     | 20         | 61     | 217        | 49.2 | 66.6 | 53.5 | 33.2 | 52.3 | 64.8
RT-DETR-R50 (Objects365 pretrained)  | 24     | 42         | 136    | 108        | 55.3 | 73.4 | 60.1 | 37.9 | 59.9 | 71.8
RT-DETR-R101 (Objects365 pretrained) | 24     | 76         | 259    | 74         | 56.2 | 74.6 | 61.3 | 38.3 | 60.5 | 73.5
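
The FPS figures in the table can be read as per-image latency. The short sketch below does that conversion and computes the AP gain from Objects365 pre-training for the R18 variant, using only numbers from the table. It assumes latency is simply the reciprocal of throughput at batch size 1.

```python
# FPS at batch size 1 on a T4 GPU, taken from the benchmark table.
fps_by_variant = {
    "RT-DETR-R18": 217,
    "RT-DETR-R50": 108,
    "RT-DETR-R101": 74,
}


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming latency = 1 / throughput."""
    return 1000.0 / fps


latencies = {name: round(latency_ms(fps), 2) for name, fps in fps_by_variant.items()}

# AP of R18 with Objects365 pre-training minus AP of R18 trained on COCO only.
ap_gain_r18 = round(49.2 - 46.5, 1)

for name, ms in latencies.items():
    print(f"{name}: {ms} ms per image")
print(f"R18 AP gain from Objects365 pre-training: +{ap_gain_r18}")
```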

Architecture Overview

[Figure: RT-DETR architecture, showing the backbone, the efficient hybrid encoder with AIFI and CCFF, uncertainty-minimal query selection, and the decoder.]

[Figure: Comparison plot showing RT-DETR outperforming YOLO variants in the speed vs. accuracy trade-off.]

[Figure: Table of training hyperparameters.]

Training Data and Procedure

The model was pre-trained on Objects365 and then fine-tuned on the COCO 2017 object detection dataset (118k training images). Input images are resized to 640×640 and normalized per channel with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. Full training details are available in the original paper.
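
The normalization step can be sketched in plain Python as follows. This assumes 8-bit RGB pixel values are first rescaled to [0, 1] before the mean and standard deviation are applied (the listed statistics are on that scale); the resize to 640×640 is omitted, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Per-channel statistics quoted in the text (RGB order, [0, 1] scale).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Rescale an 8-bit RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))


def normalize_image(image: List[List[Sequence[int]]]) -> List[List[Tuple[float, ...]]]:
    """Normalize an image given as rows of (r, g, b) pixels."""
    return [[normalize_pixel(px) for px in row] for row in image]


# A tiny 1x2 "image": one white pixel and one black pixel.
tiny = normalize_image([[(255, 255, 255), (0, 0, 0)]])
print(tiny[0][0])  # white pixel after normalization
print(tiny[0][1])  # black pixel after normalization
```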

FAQ

What is the input size for this model?

Images are resized to 640x640 pixels before inference.

How fast is RT-DETR R18 on a T4 GPU?

It achieves 217 FPS with batch size 1 on a T4 GPU.

What license is this model released under?

It is released under the Apache-2.0 license.

How can I call this model via the API?

The hosted API is not live yet. Once it launches, you will be able to use the gigarouter OpenAI-compatible endpoint with your API key to send image inputs and receive detection results.

Does this model require Non-Maximum Suppression (NMS)?

No, RT-DETR is an end-to-end detector that eliminates NMS entirely.
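
For context, this is the kind of step that NMS-based detectors run after inference and that RT-DETR does not need. It is a generic, minimal sketch of greedy NMS with boxes in (x1, y1, x2, y2) pixel form, written for illustration only; it is not part of this model's pipeline, and the example boxes are made up.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes and one separate box (made-up values).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

The pairwise overlap checks, and the IoU threshold that has to be tuned for them, are the overhead that an end-to-end detector removes.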

not yet live

We're benchmarking and onboarding RT-DETR R18 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
