YOLOS Small
hustvl/yolos-small
published Apr 2022 · updated May 2024
YOLOS Small is an object detection model that uses a plain Vision Transformer, trained with a DETR-style bipartite matching loss, to predict bounding boxes and class labels for objects in images.
specs
| Task | Object Detection |
| Architecture | Vision Transformer (ViT) |
| Parameters | 30.7M |
| AP on COCO val | 36.1 |
| License | Apache-2.0 |
about this model
Architecture and training
The model uses a vanilla ViT architecture with minimal spatial priors. It is pre-trained on ImageNet-1k for 200 epochs and then fine-tuned on COCO 2017 object detection (118k training images) for 150 epochs. For detection, 100 learnable detection tokens are added to the input sequence, and the model predicts a class and a bounding box for each token. During training, the Hungarian algorithm matches these predictions one-to-one to ground-truth objects, and the loss combines cross-entropy for classes with L1 and generalized IoU terms for boxes.
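The matching step described above can be sketched in a few lines of plain Python. This is an illustration only, not the authors' code: it finds the minimum-cost assignment by brute force over permutations rather than with the Hungarian algorithm (both return an optimal one-to-one matching, but brute force is only practical for a handful of boxes), and the cost weights 1, 5 and 2 are assumed here to follow the usual DETR defaults. The function names are hypothetical.

```python
from itertools import permutations


def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def area(b):
    """Area of an (x0, y0, x1, y1) box; zero if the box is empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def generalized_iou(a, b):
    """GIoU of two non-degenerate (x0, y0, x1, y1) boxes, in [-1, 1]."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    # Smallest box enclosing both inputs.
    hull = area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull


def match_cost(pred_probs, pred_box, gt_label, gt_box,
               w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Cost of pairing one prediction with one ground-truth object.

    The weights are assumptions (DETR-style defaults), not values from the source.
    """
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    giou = generalized_iou(cxcywh_to_xyxy(pred_box), cxcywh_to_xyxy(gt_box))
    return -w_class * pred_probs[gt_label] + w_l1 * l1 - w_giou * giou


def match(preds, targets):
    """Return, for each target, the index of the prediction assigned to it.

    Brute-force stand-in for the Hungarian algorithm: tries every one-to-one
    assignment and keeps the cheapest one.
    """
    best = min(
        permutations(range(len(preds)), len(targets)),
        key=lambda perm: sum(
            match_cost(*preds[p], *targets[t]) for t, p in enumerate(perm)
        ),
    )
    return list(best)


# Three detection tokens: (class probabilities, normalized cxcywh box).
preds = [
    ([0.1, 0.9], (0.70, 0.70, 0.20, 0.20)),
    ([0.8, 0.2], (0.25, 0.25, 0.30, 0.30)),
    ([0.5, 0.5], (0.50, 0.50, 0.90, 0.90)),
]
# Two ground-truth objects: (class index, normalized cxcywh box).
targets = [
    (0, (0.25, 0.25, 0.30, 0.30)),
    (1, (0.70, 0.70, 0.20, 0.20)),
]
assignment = match(preds, targets)
print(assignment)
```

In this toy example the first target is paired with token 1 and the second with token 0; token 2 stays unmatched and would be trained toward the "no object" class. With 100 detection tokens, a real implementation would use a polynomial-time solver instead of enumerating permutations.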
Performance
On COCO 2017 validation, YOLOS-small achieves an average precision (AP) of 36.1. For comparison, the larger YOLOS-base variant reaches 42.0 AP, which is comparable to DETR and to more complex frameworks such as Faster R-CNN. The model contains 30.7 million parameters and is released under the Apache-2.0 license.
References
- Paper: You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection (Fang et al., NeurIPS 2021)
- Code and pre-trained models: github.com/hustvl/YOLOS
- Hugging Face model page: hustvl/yolos-small
best for
- Real-time object detection in images with low compute requirements
- Detecting common objects across 80 COCO categories
- Embedding object detection into lightweight applications
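When embedding the model in an application, the raw per-token outputs usually need to be decoded into labelled pixel boxes. The sketch below shows one way to do that in plain Python. It assumes a DETR-style output layout: one logit vector per detection token whose last entry is a "no object" class, and boxes given as normalized (center x, center y, width, height). The `decode` function, the two-entry label list and the 0.9 score threshold are illustrative choices, not part of the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, boxes, image_size, labels, threshold=0.9):
    """Turn per-token logits and normalized boxes into pixel-space detections.

    Assumes the last logit of each token is the "no object" class and that
    boxes are normalized (cx, cy, w, h); both are assumptions of this sketch.
    """
    width, height = image_size
    detections = []
    for token_logits, (cx, cy, w, h) in zip(logits, boxes):
        probs = softmax(token_logits)[:-1]  # drop the trailing "no object" class
        score = max(probs)
        if score < threshold:
            continue  # token is treated as background
        box = (
            (cx - w / 2) * width,
            (cy - h / 2) * height,
            (cx + w / 2) * width,
            (cy + h / 2) * height,
        )
        detections.append(
            {"label": labels[probs.index(score)], "score": score, "box": box}
        )
    return detections


# Toy example with two tokens and a hypothetical two-class label list.
labels = ["cat", "dog"]
logits = [
    [6.0, 0.0, 0.0],  # confident "cat"
    [0.0, 0.0, 6.0],  # confident "no object"
]
boxes = [(0.5, 0.5, 0.5, 0.5), (0.2, 0.2, 0.1, 0.1)]
dets = decode(logits, boxes, (640, 480), labels)
for det in dets:
    print(det["label"], round(det["score"], 3), det["box"])
```

Only the first token survives the threshold here; its box covers the central quarter of a 640x480 image. The threshold trades recall for precision and is worth tuning per application.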
FAQ
What is YOLOS Small?
It is a small Vision Transformer model fine-tuned on COCO for object detection, using a bipartite matching loss like DETR.
How does YOLOS Small compare with YOLOS-Base?
YOLOS Small has 30.7M parameters and achieves 36.1 AP on COCO val, while YOLOS-Base achieves 42.0 AP with more parameters.
What license is YOLOS Small released under?
It is released under the Apache-2.0 license.
What are the inputs and outputs?
Input: an image. Output: predicted bounding boxes and corresponding COCO class labels.
How do I use YOLOS Small through the API?
YOLOS Small is still being benchmarked and onboarded as a hosted, OpenAI-compatible API. Once it is available, call the OpenAI-compatible endpoint with your API key, sending an image URL or a base64-encoded image.