D-FINE Small (COCO)

ustc-community/dfine-small-coco

published Feb 2025 · updated May 2025

D-FINE Small (COCO) is a detection model that redefines bounding box regression as fine-grained distribution refinement for real-time object detection.

est. price

~$0.047

/ 1k images · estimated, set at launch

downloads / mo

4.5K

license

apache-2.0

specs

Task	Object Detection
Architecture	D-FINE (Transformer-based DETR)
Training Dataset	COCO 2017
Input	Images
Output	Bounding boxes, scores, and class labels
Framework	Hugging Face Transformers (PyTorch)

about this model

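ustc-community/dfine-small-coco is a real-time object detection model from the D-FINE family. D-FINE redefines bounding box regression through two components: Fine-grained Distribution Refinement (FDR), which iteratively refines probability distributions over box edge offsets instead of predicting fixed coordinates, and Global Optimal Localization Self-Distillation (GO-LSD), which transfers localization knowledge from the refined distributions of the final layer to shallower layers. Together they improve localization precision with few added parameters, which suits latency-sensitive applications.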

Key Strengths

Enhanced bounding box regression via fine-grained distribution refinement, improving localization accuracy over traditional DETR approaches.
Global optimal localization self-distillation boosts performance without extra inference cost.
The method improves a range of DETR models by up to 5.3% AP with negligible additional parameters and training overhead.
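The idea of regressing a box edge through a distribution rather than a single number can be sketched in a few lines. This is an illustrative toy, not the D-FINE implementation: the uniform `bin_values`, the function names, and the hand-written logit updates are all assumptions made for the example, and the real model scales offsets by box size and learns the updates in its decoder layers.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_offset(logits, bin_values):
    """Decode an edge offset as the expectation of the bin values."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))

def refine_edge(initial_edge, layer_logit_updates, bin_values):
    """Add residual logit updates layer by layer, decoding the edge after each layer."""
    logits = [0.0] * len(bin_values)
    edges = []
    for update in layer_logit_updates:
        logits = [l + u for l, u in zip(logits, update)]
        edges.append(initial_edge + expected_offset(logits, bin_values))
    return edges

# Hypothetical setup: five uniform bins (in pixels) and two decoder layers
# that both push probability mass toward the +1.0 bin.
bin_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
updates = [
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
edges = refine_edge(100.0, updates, bin_values)
print(edges)
```

Each layer sharpens the distribution a little further, so the decoded edge moves gradually away from the initial estimate instead of being re-predicted from scratch.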

Benchmark Results

Evaluated on COCO val2017, the D-FINE family delivers competitive speed and accuracy on an NVIDIA T4 GPU (TensorRT FP16):

Model Variant	AP	FPS
D-FINE-L	54.0%	124
D-FINE-X	55.8%	78
D-FINE-L (Objects365 pretraining)	57.1%	124
D-FINE-X (Objects365 pretraining)	59.3%	78

The dfine-small-coco variant is trained on COCO train2017. The table above covers only the L and X variants; it does not report AP or FPS for the small variant, which sits at the lighter, faster end of the family.
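For latency budgeting it can help to convert the reported FPS figures into per-image time. The short sketch below does that for the table rows, assuming one image is processed at a time so that latency is simply the reciprocal of throughput; the dictionary layout and the `latency_ms` helper are illustrative names, not part of any library.

```python
# AP and FPS values copied from the benchmark table (NVIDIA T4, TensorRT FP16).
benchmarks = {
    "D-FINE-L": {"ap": 54.0, "fps": 124},
    "D-FINE-X": {"ap": 55.8, "fps": 78},
    "D-FINE-L (Objects365 pretraining)": {"ap": 57.1, "fps": 124},
    "D-FINE-X (Objects365 pretraining)": {"ap": 59.3, "fps": 78},
}

def latency_ms(fps):
    """Approximate per-image latency in milliseconds, assuming batch size 1."""
    return 1000.0 / fps

for name, row in benchmarks.items():
    print(f"{name}: {row['ap']:.1f}% AP, {latency_ms(row['fps']):.2f} ms/image")
```

Under that assumption, the L variants take roughly 8 ms per image and the X variants roughly 13 ms on the stated hardware.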

COCO benchmark comparison of D-FINE variants against other real-time detectors

This model is being onboarded as a managed, OpenAI-compatible API on gigarouter; once live, it is intended to allow direct integration without infrastructure overhead.

best for

· Autonomous driving
· Surveillance systems
· Robotics
· Retail analytics

FAQ

What is D-FINE Small (COCO) best for?

Real-time object detection in dynamic environments such as autonomous driving, surveillance, and robotics.

How does the small variant compare to larger D-FINE models?

The small variant is faster and lighter, suitable for edge devices. Larger variants like D-FINE-L achieve 54.0% AP at 124 FPS on an NVIDIA T4 GPU.

What are the input and output formats?

Input: images (PIL or tensor). Output: bounding boxes, confidence scores, and class labels.
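A minimal local-inference sketch with Hugging Face Transformers might look like the following. It assumes a Transformers release that includes the D-FINE classes (`DFineForObjectDetection`), plus `torch` and `Pillow` installed; the `detect` and `format_detection` helpers and the 0.5 threshold are illustrative choices, not part of the model card.

```python
def format_detection(label, score, box):
    """Package one detection as a plain dict with rounded values."""
    return {
        "label": label,
        "score": round(score, 3),
        "box": [round(v, 2) for v in box],  # (x_min, y_min, x_max, y_max) in pixels
    }

def detect(image, checkpoint="ustc-community/dfine-small-coco", threshold=0.5):
    """Run the detector on a PIL image and return a list of detection dicts."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    import torch
    from transformers import AutoImageProcessor, DFineForObjectDetection

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = DFineForObjectDetection.from_pretrained(checkpoint)
    model.eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Rescale boxes to the original image size, given as (height, width).
    target_sizes = torch.tensor([(image.height, image.width)])
    result = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=threshold
    )[0]

    return [
        format_detection(model.config.id2label[label.item()], score.item(), box.tolist())
        for score, label, box in zip(result["scores"], result["labels"], result["boxes"])
    ]
```

With these helpers, `detect(Image.open("street.jpg"))` (where `street.jpg` is a placeholder path and `Image` comes from `PIL`) would return one dict per detected object above the threshold.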

How can I call this model via the API?

The hosted endpoint is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with an API key to send image inputs and receive detection results.

What license is this model under?

The model is listed under the Apache-2.0 license; check the Hugging Face repository to confirm the current terms.

not yet live

We're benchmarking and onboarding D-FINE Small (COCO) as a hosted, OpenAI-compatible API.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo