D-FINE Small (COCO)
ustc-community/dfine-small-coco
published Feb 2025 · updated May 2025
D-FINE Small (COCO) is a real-time object detector that reformulates bounding box regression as fine-grained distribution refinement.
specs
| Property | Value |
|---|---|
| Task | Object Detection |
| Architecture | D-FINE (Transformer-based DETR) |
| Training Dataset | COCO 2017 |
| Input | Images |
| Output | Bounding boxes, scores, and class labels |
| Framework | Hugging Face Transformers (PyTorch) |
about this model
ustc-community/dfine-small-coco is a real-time object detection model that redefines bounding box regression using Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). As a member of the D-FINE family, it gains localization precision from these techniques while adding very few parameters, which makes it suitable for latency-sensitive applications.
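To make "distribution refinement" concrete, here is a minimal, simplified sketch in plain Python (standard library only). It assumes each of the four box edges is predicted as logits over a small set of bins, decoded as the expected offset, and refined layer by layer by adding residual logits. The helper names (`bin_weights`, `refine_logits`, `decode_edges`) and the evenly spaced bin offsets are illustrative assumptions; D-FINE itself uses a non-uniform weighting function, so this shows the idea rather than the model's exact decoding.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one edge's bin logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_weights(num_bins, max_offset=0.5):
    # ASSUMPTION (hypothetical): evenly spaced bin offsets in [-max_offset, +max_offset],
    # expressed as fractions of box size. D-FINE itself uses a non-uniform W(n).
    step = 2 * max_offset / (num_bins - 1)
    return [-max_offset + i * step for i in range(num_bins)]


def refine_logits(prev_logits, residual_logits):
    # Each decoder layer adds residual logits to the previous layer's logits,
    # so the distributions are refined step by step rather than re-predicted.
    return [[p + r for p, r in zip(prev, res)]
            for prev, res in zip(prev_logits, residual_logits)]


def decode_edges(initial_edges, edge_logits, box_w, box_h):
    """Refine (top, bottom, left, right) distances using per-edge distributions.

    initial_edges: initial distances from the box's reference point to each edge.
    edge_logits: four lists of bin logits, one list per edge.
    """
    # Top/bottom distances scale with box height, left/right with box width.
    scales = (box_h, box_h, box_w, box_w)
    refined = []
    for d0, logits, scale in zip(initial_edges, edge_logits, scales):
        probs = softmax(logits)
        weights = bin_weights(len(logits))
        # Expected offset under the predicted distribution.
        offset = sum(w * p for w, p in zip(weights, probs))
        refined.append(d0 + scale * offset)
    return refined
```

Predicting a distribution rather than a single coordinate lets each layer hedge between nearby offsets and shift probability mass gradually as the box is refined.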
Key Strengths
- Enhanced bounding box regression via fine-grained distribution refinement, improving localization accuracy over traditional DETR approaches.
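- Global Optimal Localization Self-Distillation (GO-LSD) transfers localization knowledge from the refined final-layer distributions to shallower decoder layers during training, so it adds no inference cost.
- Applied to other DETR models, FDR and GO-LSD improve performance by up to 5.3% AP with negligible additional parameters and training overhead.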
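GO-LSD's lack of inference cost follows from distillation being a training-time loss only. The sketch below, a simplified illustration rather than D-FINE's actual loss, treats the final decoder layer's per-edge distributions as the teacher and measures a temperature-scaled KL divergence to a shallower layer's distributions. The function names and the default temperature of 5.0 are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled, numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher, student):
    # KL(teacher || student); bins where the teacher has zero mass contribute nothing.
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


def self_distillation_loss(final_layer_logits, shallow_layer_logits, temperature=5.0):
    """Average KL over the four box edges, using the final decoder layer as teacher.

    HYPOTHETICAL helper: in a real training loop the teacher distribution
    would be detached so no gradient flows into it.
    """
    losses = []
    for t_logits, s_logits in zip(final_layer_logits, shallow_layer_logits):
        teacher = softmax(t_logits, temperature)
        student = softmax(s_logits, temperature)
        losses.append(kl_divergence(teacher, student))
    return sum(losses) / len(losses)
```

At inference time the loss is simply not computed, so the deployed model runs the same layers either way.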
Benchmark Results
Reported results for the larger D-FINE variants on COCO val2017, with FPS measured on an NVIDIA T4 GPU (TensorRT FP16):
| Model Variant | AP | FPS |
|---|---|---|
| D-FINE-L | 54.0% | 124 |
| D-FINE-X | 55.8% | 78 |
| D-FINE-L (Objects365 pretraining) | 57.1% | 124 |
| D-FINE-X (Objects365 pretraining) | 59.3% | 78 |
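Converting the table's throughput into per-image latency, and computing the gain from Objects365 pretraining, makes the trade-offs easier to compare. This sketch only re-derives numbers from the table; it assumes each FPS figure counts one image per frame, and the helper names are illustrative.

```python
# Figures copied from the benchmark table (COCO val2017, NVIDIA T4, TensorRT FP16).
benchmarks = {
    "D-FINE-L": {"ap": 54.0, "fps": 124},
    "D-FINE-X": {"ap": 55.8, "fps": 78},
    "D-FINE-L (Objects365 pretraining)": {"ap": 57.1, "fps": 124},
    "D-FINE-X (Objects365 pretraining)": {"ap": 59.3, "fps": 78},
}


def latency_ms(fps):
    # Per-image latency implied by throughput, assuming one image per frame.
    return 1000.0 / fps


def ap_gain(base, pretrained):
    # AP difference in percentage points, rounded to the table's precision.
    return round(benchmarks[pretrained]["ap"] - benchmarks[base]["ap"], 1)


for name, row in benchmarks.items():
    print(f"{name}: {row['ap']:.1f} AP, ~{latency_ms(row['fps']):.1f} ms/image")
```

By this arithmetic, D-FINE-L takes roughly 8.1 ms per image and D-FINE-X roughly 12.8 ms, and Objects365 pretraining adds 3.1 AP (L) and 3.5 AP (X) at unchanged speed.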
The dfine-small-coco variant is trained on COCO train2017 and is not listed in the table above. As a smaller model, it trades some accuracy for higher speed and a lighter footprint than the L and X variants.
gigarouter is onboarding this model as a managed, OpenAI-compatible API, which will allow direct integration without running your own infrastructure.
best for
- Autonomous driving
- Surveillance systems
- Robotics
- Retail analytics
FAQ
Real-time object detection in dynamic environments such as autonomous driving, surveillance, and robotics.
The small variant is faster and lighter, suitable for edge devices. Larger variants like D-FINE-L achieve 54.0% AP at 124 FPS on an NVIDIA T4 GPU.
Input: images (PIL or tensor). Output: bounding boxes, confidence scores, and class labels.
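DETR-family detectors typically emit a fixed set of query predictions, each with per-class logits and a normalized box, which post-processing turns into the boxes, scores, and labels listed above. The sketch below is a simplified, standard-library illustration of that step. It assumes sigmoid class scores, boxes in normalized (cx, cy, w, h) format, and a per-query argmax with a score threshold, which may differ from the exact post-processing used for this model (for example, top-k selection over all query-class pairs). In Hugging Face Transformers, DETR-family image processors expose a `post_process_object_detection` method for this step; the `postprocess` function here is hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def postprocess(class_logits, boxes_cxcywh, image_w, image_h, threshold=0.5):
    """Turn raw per-query outputs into score/label/box detections (hypothetical helper).

    class_logits: one list of per-class logits per query.
    boxes_cxcywh: one (cx, cy, w, h) tuple per query, normalized to [0, 1].
    Returns boxes as absolute (x_min, y_min, x_max, y_max) pixel coordinates.
    """
    detections = []
    for logits, (cx, cy, w, h) in zip(class_logits, boxes_cxcywh):
        scores = [sigmoid(z) for z in logits]
        # Keep only the best-scoring class for each query.
        label = max(range(len(scores)), key=scores.__getitem__)
        score = scores[label]
        if score < threshold:
            continue
        box = (
            (cx - w / 2) * image_w,
            (cy - h / 2) * image_h,
            (cx + w / 2) * image_w,
            (cy + h / 2) * image_h,
        )
        detections.append({"score": score, "label": label, "box": box})
    # Highest-confidence detections first.
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections
```

Label indices map to COCO category names through the model's configuration; this sketch leaves them as integers.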
Use the gigarouter OpenAI-compatible endpoint with an API key to send image inputs and receive detection results.
The model card does not specify a license; check the Hugging Face repository for details.
We're benchmarking and onboarding D-FINE Small (COCO) as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.