DETR ResNet-50
Xenova/detr-resnet-50
published May 2023 · updated Jun 2025
DETR ResNet-50 is an object detection model that uses a transformer encoder-decoder architecture with a ResNet-50 backbone to predict bounding boxes and class labels directly from images.
specs
| Task | Object Detection |
| Architecture | Transformer encoder-decoder with ResNet-50 backbone |
| Parameters | 41.3M |
| License | Apache-2.0 |
about this model
Xenova/detr-resnet-50 performs end-to-end detection by treating it as a direct set prediction problem, which removes the need for hand-crafted components such as anchor boxes or non-maximum suppression. The model uses 100 learned object queries, each of which predicts either one object or "no object". During training, bipartite (Hungarian) matching assigns each ground-truth box to exactly one prediction, and the loss on the matched pairs combines cross-entropy, L1, and generalized IoU terms.
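The matching step can be illustrated with a small sketch. This is not the reference implementation: it brute-forces the assignment with `itertools.permutations` instead of running the Hungarian algorithm, so it only suits a handful of boxes. The weights (1, 5, 2), the use of the negative class probability as the classification cost, and all function and field names are assumptions chosen for illustration.

```python
from itertools import permutations


def box_area(b):
    """Area of an (xmin, ymin, xmax, ymax) box."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def generalized_iou(a, b):
    """Generalized IoU of two (xmin, ymin, xmax, ymax) boxes, in [-1, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    if enclose <= 0:
        return iou
    return iou - (enclose - union) / enclose


def match_cost(pred, target, w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Pairwise cost; the weights are assumed values, not taken from this page."""
    class_cost = -pred["probs"][target["label"]]
    l1_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    giou_cost = -generalized_iou(pred["box"], target["box"])
    return w_class * class_cost + w_l1 * l1_cost + w_giou * giou_cost


def match(preds, targets):
    """Return (pred_index, target_index) pairs with the lowest total cost."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, t) for t, p in enumerate(best)]


# Three query predictions, two ground-truth boxes (normalized coordinates).
preds = [
    {"probs": [0.1, 0.8], "box": (0.5, 0.5, 0.9, 0.9)},
    {"probs": [0.9, 0.05], "box": (0.1, 0.1, 0.4, 0.4)},
    {"probs": [0.2, 0.2], "box": (0.0, 0.0, 1.0, 1.0)},
]
targets = [
    {"label": 0, "box": (0.1, 0.1, 0.4, 0.5)},
    {"label": 1, "box": (0.5, 0.5, 0.9, 0.9)},
]
pairs = match(preds, targets)
print(pairs)
```

The unmatched query (index 2 here) would be trained toward the "no object" class. With 100 queries, a polynomial-time solver replaces the brute-force loop.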
This model was trained on the COCO 2017 object detection dataset, whose training split contains about 118k annotated images. It is released under the Apache-2.0 license. The original research is described in the paper "End-to-End Object Detection with Transformers" (Carion et al., arXiv:2005.12872, 2020).
Key strengths of the DETR architecture include a simplified pipeline (no region proposal network, no anchor generation, and no non-maximum suppression stage) that still reaches competitive accuracy on standard benchmarks. The model is packaged with ONNX weights for inference in JavaScript environments, making it suitable for web-based applications.
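Because each object query yields at most one prediction, turning raw outputs into detections can reduce to a per-query confidence check rather than an overlap-based suppression pass. The sketch below shows that idea in plain Python. It assumes the last logit of each query is the "no object" class and uses a 0.9 threshold; both the threshold and the function names are illustrative, not values taken from this page.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_queries(query_logits, threshold=0.9):
    """Keep (query_index, class_index, score) for each confident query.

    Assumption: the final logit of every query is the 'no object' class,
    so it is dropped before picking the best class.
    """
    kept = []
    for i, logits in enumerate(query_logits):
        probs = softmax(logits)[:-1]  # drop the 'no object' probability
        score = max(probs)
        if score >= threshold:
            kept.append((i, probs.index(score), score))
    return kept


# Three queries over two object classes plus 'no object'.
query_logits = [
    [8.0, 0.0, 0.0],  # confident in class 0
    [0.0, 0.0, 6.0],  # confident in 'no object'
    [1.0, 1.2, 1.0],  # uncertain
]
kept = decode_queries(query_logits)
print(kept)
```

Only the first query survives in this example; the other two are discarded without comparing boxes against each other.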
On gigarouter, this model is available as a hosted, OpenAI-compatible API. Developers can send image inputs and receive detection results without managing infrastructure or conversion steps.
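For local files, one common way to pass an image inline is to base64-encode it as a data URL. The helper below is a minimal sketch using only the standard library. Whether the gigarouter endpoint expects a data URL, a bare base64 string, or some other field layout is not stated here, so treat the format as an assumption and check the API reference before relying on it.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (format is an assumption)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first three bytes of a JPEG file, used here as stand-in image data.
example = to_data_url(b"\xff\xd8\xff")
print(example)
```

In practice the bytes would come from reading the image file in binary mode, for example `open(path, "rb").read()`.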
best for
- Detecting common objects in images (COCO 80 classes)
- End-to-end object detection without hand-crafted components like NMS or anchor boxes
FAQ
The model accepts images as input, typically as raw pixel data or a URL to an image.
The model outputs a list of detected objects, each with a score, a label, and a bounding box (xmin, ymin, xmax, ymax).
To run the model, call the gigarouter OpenAI-compatible endpoint with your API key and send an image URL or a base64-encoded image in the request.
The model was trained on COCO 2017, whose training split contains about 118k annotated images across 80 object categories.
The model is released under the Apache-2.0 license.
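The output described above (a score, a label, and an xmin/ymin/xmax/ymax box per object) is easy to post-process on the client. The sketch below filters by score and label and computes box sizes. The exact field names and the nested `box` dictionary are assumptions about the response shape, and the sample values are made up for illustration.

```python
def filter_detections(detections, min_score=0.5, labels=None):
    """Keep detections above min_score (and in labels, if given), best first."""
    kept = [
        d for d in detections
        if d["score"] >= min_score and (labels is None or d["label"] in labels)
    ]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def box_size(box):
    """Return (width, height) of an xmin/ymin/xmax/ymax box."""
    return box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]


# Hypothetical response data; the shape is assumed, not documented here.
detections = [
    {"score": 0.98, "label": "cat", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.42, "label": "dog", "box": {"xmin": 0, "ymin": 0, "xmax": 50, "ymax": 40}},
    {"score": 0.99, "label": "remote", "box": {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 118}},
]

confident = filter_detections(detections, min_score=0.5)
for d in confident:
    width, height = box_size(d["box"])
    print(d["label"], round(d["score"], 2), width, height)
```

A higher `min_score` trades recall for fewer false positives; the right value depends on the application.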
We're benchmarking and onboarding DETR ResNet-50 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.