DETR ResNet-50

Xenova/detr-resnet-50

published May 2023 · updated Jun 2025

DETR ResNet-50 is an object detection model that uses a transformer encoder-decoder architecture with a ResNet-50 backbone to predict bounding boxes and class labels directly from images.

status

coming soon

API providers

downloads / mo

6.5K

specs

Task	Object Detection
Architecture	Transformer encoder-decoder with ResNet-50 backbone
Parameters	41.3M
License	Apache-2.0

about this model

Xenova/detr-resnet-50 performs end-to-end object detection with a transformer encoder-decoder on top of a ResNet-50 backbone. It treats detection as a direct set prediction problem, which removes the need for hand-crafted components such as anchor boxes or non-maximum suppression. The decoder uses 100 learned object queries, so it emits a fixed set of 100 predictions per image. During training, bipartite (Hungarian) matching assigns each ground-truth box to exactly one prediction, and the loss on the matched pairs combines cross-entropy for the class with L1 and generalized IoU terms for the box.
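The matching step can be illustrated with a small sketch. This is not the model's training code: it brute-forces the assignment with `itertools.permutations` instead of running the Hungarian algorithm, it compares boxes in corner format for readability, and the cost weights (1, 5, 2) as well as all sample values are illustrative assumptions.

```python
from itertools import permutations


def giou(a, b):
    """Generalized IoU of two (xmin, ymin, xmax, ymax) boxes with positive area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    return inter / union - (hull - union) / hull


def match_cost(pred, target, w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Cost of pairing one prediction with one ground-truth box (lower is better)."""
    class_cost = -pred["probs"][target["label"]]
    l1_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    giou_cost = -giou(pred["box"], target["box"])
    return w_class * class_cost + w_l1 * l1_cost + w_giou * giou_cost


def match(preds, targets):
    """Return a tuple whose entry j is the index of the prediction assigned to target j.

    Brute force over all one-to-one assignments; needs len(preds) >= len(targets)
    and is only practical for tiny inputs.
    """
    def total_cost(perm):
        return sum(match_cost(preds[i], targets[j]) for j, i in enumerate(perm))

    return min(permutations(range(len(preds)), len(targets)), key=total_cost)


# Hypothetical example: class index 0 = cat, 1 = remote.
preds = [
    {"box": (0.5, 0.5, 0.9, 0.9), "probs": [0.10, 0.80]},
    {"box": (0.1, 0.1, 0.4, 0.4), "probs": [0.90, 0.05]},
    {"box": (0.0, 0.0, 0.05, 0.05), "probs": [0.20, 0.20]},
]
targets = [
    {"box": (0.1, 0.1, 0.4, 0.45), "label": 0},
    {"box": (0.5, 0.5, 0.9, 0.85), "label": 1},
]
assignment = match(preds, targets)  # (1, 0): prediction 2 stays unmatched
```

Predictions left unmatched, such as index 2 here, are the ones trained toward the "no object" class.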

This model was trained on the COCO 2017 object detection dataset, whose training split contains 118k annotated images. It is released under the Apache-2.0 license. The original research is described in the paper "End-to-End Object Detection with Transformers" (Carion et al., arXiv:2005.12872, 2020).

The main strength of the DETR architecture is a simplified pipeline: no region proposal network, no anchor generation, and no non-maximum suppression. The original paper reports 42.0 AP on COCO 2017 validation for the ResNet-50 variant, on par with a tuned Faster R-CNN baseline. The model is packaged with ONNX weights for inference in JavaScript environments, making it suitable for web-based applications.
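Even without non-maximum suppression, raw DETR outputs are usually turned into final detections by dropping low-confidence queries and rescaling boxes from normalized (center x, center y, width, height) to pixel corners. The sketch below shows that step in plain Python. The function names, the input layout (per-query probability lists whose last entry is the "no object" class) and the 0.9 threshold are assumptions for illustration, not the API of any particular library.

```python
def to_pixel_box(cxcywh, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = cxcywh
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def postprocess(class_probs, boxes, labels, image_size, threshold=0.9):
    """Keep queries whose best real-class probability reaches the threshold.

    class_probs: one probability list per query; the last entry is "no object".
    boxes:       one normalized (cx, cy, w, h) tuple per query.
    labels:      class names for every entry except the trailing "no object".
    """
    width, height = image_size
    detections = []
    for probs, box in zip(class_probs, boxes):
        object_probs = probs[:-1]  # ignore the "no object" slot
        score = max(object_probs)
        if score < threshold:
            continue
        xmin, ymin, xmax, ymax = to_pixel_box(box, width, height)
        detections.append({
            "score": score,
            "label": labels[object_probs.index(score)],
            "box": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax},
        })
    return detections


# Hypothetical two-query example on a 640x480 image.
results = postprocess(
    class_probs=[[0.95, 0.03, 0.02], [0.10, 0.10, 0.80]],
    boxes=[(0.5, 0.5, 0.5, 0.5), (0.2, 0.2, 0.1, 0.1)],
    labels=["cat", "remote"],
    image_size=(640, 480),
)
```

Only the first query survives here; the second is dominated by the "no object" probability and is discarded.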

Example detection output showing bounding boxes and labels for a cat and a remote control

On gigarouter, this model is planned as a hosted, OpenAI-compatible API and is not yet live. Once it is available, developers will be able to send image inputs and receive detection results without managing infrastructure or conversion steps.

best for

· Detecting common objects in images (COCO 80 classes)
· End-to-end object detection without hand-crafted components like NMS or anchor boxes

FAQ

What is the input format for this model?

The model accepts images as input, typically as raw pixel data or a URL to an image.

What does the model output?

It outputs a list of detected objects, each with a score, a label, and a bounding box (xmin, ymin, xmax, ymax).
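A minimal sketch of consuming such a list follows. It assumes each detection is a dictionary with `score`, `label` and a nested `box` holding `xmin`, `ymin`, `xmax`, `ymax`; that exact layout, the helper names and the sample values are assumptions for illustration.

```python
def filter_detections(detections, min_score=0.5, labels=None):
    """Keep detections above min_score (optionally restricted to some labels),
    sorted from most to least confident."""
    kept = [
        d for d in detections
        if d["score"] >= min_score and (labels is None or d["label"] in labels)
    ]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def box_area(box):
    """Area in square pixels of an xmin/ymin/xmax/ymax box; 0 if degenerate."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height


# Hypothetical detections for an image containing a cat and a remote control.
detections = [
    {"score": 0.62, "label": "remote",
     "box": {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 118}},
    {"score": 0.98, "label": "cat",
     "box": {"xmin": 345, "ymin": 23, "xmax": 640, "ymax": 368}},
    {"score": 0.12, "label": "couch",
     "box": {"xmin": 0, "ymin": 1, "xmax": 639, "ymax": 473}},
]

confident = filter_detections(detections)
cats_only = filter_detections(detections, labels={"cat"})
```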

How do I call this model via the gigarouter API?

The hosted endpoint is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key, sending an image URL or a base64-encoded image in the request.
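For the base64 option, a common convention in OpenAI-style APIs is to wrap the encoded bytes in a data URL. The helper below sketches only that encoding step. Whether gigarouter accepts data URLs, and the shape of the surrounding request, are not specified on this page, so treat both as assumptions; the function name is hypothetical.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# In practice image_bytes would come from reading an image file in binary mode.
example = to_data_url(b"abc", "image/png")
```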

What dataset was this model trained on?

It was trained on COCO 2017, whose training split contains 118k annotated images covering 80 object categories.

What is the license for this model?

The model is released under the Apache-2.0 license.

not yet live

We're benchmarking and onboarding DETR ResNet-50 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo