DETR ResNet-50

Xenova/detr-resnet-50

published May 2023 · updated Jun 2025

DETR ResNet-50 is an object detection model that uses a transformer encoder-decoder architecture with a ResNet-50 backbone to predict bounding boxes and class labels directly from images.

status: coming soon
API providers: 0
downloads / mo: 6.5K

specs

Task: Object Detection
Architecture: Transformer encoder-decoder with ResNet-50 backbone
Parameters: 41.3M
License: Apache-2.0

about this model

Xenova/detr-resnet-50 is an object detection model that performs end-to-end detection using a transformer encoder-decoder architecture with a ResNet-50 backbone. It treats detection as a direct set prediction problem, eliminating the need for hand-crafted components such as anchor boxes or non-maximum suppression. The model uses 100 learned object queries, so it emits a fixed set of 100 candidate detections per image. During training, a bipartite matching step pairs each ground-truth box with exactly one prediction, and the loss on the matched pairs combines cross-entropy, L1, and generalized IoU terms.
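The one-to-one matching described above can be illustrated with a small sketch. It is not the model's training code: it uses a brute-force search over permutations instead of the Hungarian algorithm, so it only suits tiny inputs. The dictionary layout, the function names, and the cost weights are assumptions chosen for illustration.

```python
from itertools import permutations

def box_l1(a, b):
    """L1 distance between two boxes given as (xmin, ymin, xmax, ymax)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def generalized_iou(a, b):
    """Generalized IoU of two boxes with positive area; result lies in (-1, 1]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull

def match_cost(pred, target, w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Cost of pairing one prediction with one target (weights are illustrative)."""
    class_cost = -pred["probs"][target["label"]]
    l1_cost = box_l1(pred["box"], target["box"])
    giou_cost = -generalized_iou(pred["box"], target["box"])
    return w_class * class_cost + w_l1 * l1_cost + w_giou * giou_cost

def match(preds, targets):
    """Return (pred_index, target_index) pairs minimizing the total cost.

    Brute force: tries every way of giving each target a distinct prediction.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, t) for t, p in enumerate(best)]

# Made-up example: three predictions (stand-ins for object queries), two targets.
preds = [
    {"probs": {"cat": 0.10, "remote": 0.80}, "box": (0.1, 0.1, 0.3, 0.3)},
    {"probs": {"cat": 0.90, "remote": 0.05}, "box": (0.5, 0.5, 0.9, 0.9)},
    {"probs": {"cat": 0.20, "remote": 0.20}, "box": (0.0, 0.0, 1.0, 1.0)},
]
targets = [
    {"label": "cat", "box": (0.5, 0.5, 0.9, 0.9)},
    {"label": "remote", "box": (0.1, 0.1, 0.3, 0.3)},
]
matches = match(preds, targets)
```

Here prediction 1 is paired with the cat and prediction 0 with the remote. The unmatched prediction would be trained toward a "no object" class, which is how a fixed number of queries can cover images with fewer objects.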

This model was trained on the COCO 2017 object detection dataset, whose training split contains roughly 118k annotated images. It is released under the Apache-2.0 license. The original research is described in the paper End-to-End Object Detection with Transformers (Carion et al., arXiv:2005.12872, 2020).

Key strengths of the DETR architecture include a simplified pipeline (no region proposal network, no anchor generation, and no non-maximum suppression step) while maintaining competitive accuracy on standard benchmarks. This version of the model ships ONNX weights for inference in JavaScript environments, making it suitable for web-based applications.

Figure: Example detection output showing bounding boxes and labels for a cat and a remote control.

On gigarouter, this model is planned as a hosted, OpenAI-compatible API and is not yet live. Once it is available, developers will be able to send image inputs and receive detection results without managing infrastructure or conversion steps.

FAQ

What is the input format for this model?

The model accepts images as input, typically as raw pixel data or a URL to an image.

What does the model output?

It outputs a list of detected objects, each with a score, a label, and a bounding box (xmin, ymin, xmax, ymax).
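As a rough sketch of working with that output, the snippet below keeps only confident detections and computes a box area. The field names follow the answer above (score, label, and a box with xmin, ymin, xmax, ymax); the sample values, the helper names, and the 0.9 threshold are made up for illustration and are not real model output.

```python
def filter_detections(detections, min_score=0.9):
    """Keep detections at or above min_score, highest score first."""
    kept = [d for d in detections if d["score"] >= min_score]
    return sorted(kept, key=lambda d: d["score"], reverse=True)

def box_area(box):
    """Area in square pixels of a box with xmin, ymin, xmax, ymax keys."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height

# Hypothetical detections for an image of a cat and a remote control.
detections = [
    {"score": 0.998, "label": "cat",
     "box": {"xmin": 12, "ymin": 54, "xmax": 317, "ymax": 470}},
    {"score": 0.41, "label": "couch",
     "box": {"xmin": 0, "ymin": 1, "xmax": 640, "ymax": 475}},
    {"score": 0.995, "label": "remote",
     "box": {"xmin": 40, "ymin": 72, "xmax": 176, "ymax": 118}},
]

confident = filter_detections(detections)
labels = [d["label"] for d in confident]
remote_area = box_area(confident[1]["box"])
```

Because the model always emits a fixed number of candidates, a score threshold like this is typically how low-confidence candidates are dropped before drawing boxes.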

How do I call this model via the gigarouter API?

The hosted API is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key, sending an image URL or base64-encoded image in the request.
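The request schema is not documented on this page, so the sketch below covers only the encoding step: turning raw image bytes into a base64 string wrapped in a data URL. The function name and the choice of a data URL wrapper are assumptions; the endpoint may expect a different field or format.

```python
import base64

def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Tiny stand-in payload; in practice, read the bytes from an image file.
example = to_data_url(b"abc", mime_type="image/png")
```

In practice the bytes would come from something like `open("photo.jpg", "rb").read()`, with the MIME type matching the file format.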

What dataset was this model trained on?

It was trained on COCO 2017, whose training split contains roughly 118k annotated images covering 80 object categories.

What is the license for this model?

The model is released under the Apache-2.0 license.

not yet live

We're benchmarking and onboarding DETR ResNet-50 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
