skip to content
gigarouter gigarouter
models / depth estimation · coming soon

DPT-Hybrid MiDaS

Intel/dpt-hybrid-midas

published Dec 2022 · updated Feb 2024

DPT-Hybrid MiDaS is a monocular depth estimation model that uses a Vision Transformer hybrid backbone to predict depth from a single image.

status
coming soon
API providers
0
downloads / mo
225.1K
license
apache-2.0

specs

TaskMonocular Depth Estimation
ArchitectureDense Prediction Transformer (DPT) with ViT-hybrid backbone
ParametersNot specified in card
LicenseMIT (per GitHub repo)

about this model

DPT-Hybrid (Intel/dpt-hybrid-midas) is a monocular depth estimation model that uses a Vision Transformer (ViT) backbone with a hybrid architecture to predict depth from a single image. Introduced in Ranftl et al. (2021), the model was trained on the MIX 6 dataset (1.4 million images).

Architecture diagram of DPT-Hybrid showing transformer backbone and convolutional decoder

The model demonstrates strong zero-shot cross-dataset transfer performance. The following table compares DPT-Hybrid against prior methods on multiple datasets, with lower values being better for all metrics.

Model Training set DIW WHDR ETH3D AbsRel Sintel AbsRel KITTI δ>1.25 NYU δ>1.25 TUM δ>1.25
DPT – Large MIX 6 10.82 0.089 0.270 8.46 8.32 9.97
DPT – Hybrid MIX 6 11.06 0.093 0.274 11.56 8.69 10.89
MiDaS MIX 6 12.95 0.116 0.329 16.08 8.71 12.51

Fine-tuned variants of DPT-Hybrid have achieved additional results: on the KITTI Eigen split, the model fine-tuned on KITTI reports d1=0.959, AbsRel=0.062, RMSE=2.575. On NYUv2, the fine-tuned model achieves d1=0.904, AbsRel=0.109, RMSE=0.357 (SILog=9.521).

Note that the original project is no longer actively maintained by Intel, as stated in the official repository. Gigarouter hosts this model as a managed, OpenAI-compatible API, removing infrastructure concerns for developers.

best for

FAQ

What is the DPT-Hybrid MiDaS model best used for?

It is best for zero-shot monocular depth estimation from a single image, and can be fine-tuned for specific datasets like KITTI or NYUv2.

What is the license for this model?

The model is licensed under MIT, according to the official GitHub repository.

What input format does the model expect?

The model expects a single RGB image, typically resized so the longer side is 384 pixels, processed via the DPTImageProcessor.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key to send an image URL or base64-encoded image and receive depth map output.

Is this model still actively maintained?

No, Intel has ceased development and the project is no longer actively maintained.

not yet live

We're benchmarking and onboarding DPT-Hybrid MiDaS as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

compare all →