RT-DETRv2 R18

PekingU/rtdetr_v2_r18vd

published Jan 2025 · updated Feb 2025

RT-DETRv2 R18 is a real-time object detection transformer model optimized for fast and accurate detection in dynamic environments.

est. price

~$0.047

/ 1k images · estimated, set at launch

API providers

downloads / mo

97.1K

license

apache-2.0

specs

Task	Object Detection
Architecture	Detection Transformer (RT-DETRv2-S with ResNet-18 backbone)
Parameters	20 million
License	Apache 2.0
COCO AP	48.1
FPS (T4 TensorRT fp16)	217

about this model

RT-DETRv2 r18vd is a real-time object detection transformer model that builds on the RT-DETR architecture with selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies including dynamic data augmentation and scale-adaptive hyperparameters. Designed for speed and accuracy, it achieves state-of-the-art real-time detection while maintaining flexibility across edge and cloud environments.

Key improvements

The decoder uses a distinct number of sampling points per feature scale in deformable attention, enabling selective multi-scale feature extraction. The optional discrete sampling operator replaces grid_sample, removing deployment constraints common to DETRs. Training enhancements (dynamic augmentation and scale-adaptive hyperparameters) improve performance without compromising inference speed. The model is pretrained on COCO train2017 and evaluated on COCO val2017.

Performance

On COCO val2017 at 640 input resolution, the r18vd variant (RT-DETRv2-S) achieves 48.1 AP and 65.1 AP50 with 20 million parameters and 217 FPS on a single T4 GPU (TensorRT fp16, batch size 1). This represents a +1.6 mAP improvement over the previous RT-DETR-R18 baseline. A discrete sampling variant (RT-DETRv2-S_dsp) scores 47.4 AP and 64.8 AP50, trading slight accuracy for wider deployment compatibility.

Variant	AP	AP50	Params	FPS (T4, fp16)
RT-DETRv2-S (r18vd)	48.1	65.1	20M	217
RT-DETRv2-S_dsp	47.4	64.8	20M	~217

FPS measured on a single T4 GPU with TensorRT >= 8.5.1, fp16 precision, batch size 1. The model is released under the Apache 2.0 license.

best for

·Autonomous driving
·Surveillance systems
·Robotics
·Retail analytics

FAQ

What is RT-DETRv2 R18 best used for?

It excels at real-time object detection in applications like autonomous driving, surveillance, robotics, and retail analytics.

How many parameters does this model have?

It has 20 million parameters.

What license is the model released under?

Apache 2.0 license.

What input format does the model require?

The model expects images preprocessed with RTDetrImageProcessor (e.g., resized and normalized).

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending an image URL or base64-encoded image.

not yet live

We're benchmarking and onboarding RT-DETRv2 R18 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

compare all →

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo