DPT-Hybrid MiDaS

Intel/dpt-hybrid-midas

published Dec 2022 · updated Feb 2024

DPT-Hybrid MiDaS is a monocular depth estimation model that uses a Vision Transformer hybrid backbone to predict depth from a single image.

status

coming soon

API providers

downloads / mo

225.1K

license

apache-2.0

specs

Task	Monocular Depth Estimation
Architecture	Dense Prediction Transformer (DPT) with ViT-hybrid backbone
Parameters	Not specified in card
License	MIT (per GitHub repo)

about this model

DPT-Hybrid (Intel/dpt-hybrid-midas) is a monocular depth estimation model that uses a Vision Transformer (ViT) backbone with a hybrid architecture to predict depth from a single image. Introduced in Ranftl et al. (2021), the model was trained on the MIX 6 dataset (1.4 million images).

Architecture diagram of DPT-Hybrid showing transformer backbone and convolutional decoder

The model demonstrates strong zero-shot cross-dataset transfer performance. The following table compares DPT-Hybrid against prior methods on multiple datasets, with lower values being better for all metrics.

Model	Training set	DIW WHDR	ETH3D AbsRel	Sintel AbsRel	KITTI δ>1.25	NYU δ>1.25	TUM δ>1.25
DPT – Large	MIX 6	10.82	0.089	0.270	8.46	8.32	9.97
DPT – Hybrid	MIX 6	11.06	0.093	0.274	11.56	8.69	10.89
MiDaS	MIX 6	12.95	0.116	0.329	16.08	8.71	12.51

Fine-tuned variants of DPT-Hybrid have achieved additional results: on the KITTI Eigen split, the model fine-tuned on KITTI reports d1=0.959, AbsRel=0.062, RMSE=2.575. On NYUv2, the fine-tuned model achieves d1=0.904, AbsRel=0.109, RMSE=0.357 (SILog=9.521).

Note that the original project is no longer actively maintained by Intel, as stated in the official repository. Gigarouter hosts this model as a managed, OpenAI-compatible API, removing infrastructure concerns for developers.

best for

·Zero-shot monocular depth estimation from a single image
·Fine-tuning for domain-specific depth prediction (e.g., KITTI, NYUv2)
·Robust depth prediction with global context via Vision Transformer

FAQ

What is the DPT-Hybrid MiDaS model best used for?

It is best for zero-shot monocular depth estimation from a single image, and can be fine-tuned for specific datasets like KITTI or NYUv2.

What is the license for this model?

The model is licensed under MIT, according to the official GitHub repository.

What input format does the model expect?

The model expects a single RGB image, typically resized so the longer side is 384 pixels, processed via the DPTImageProcessor.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key to send an image URL or base64-encoded image and receive depth map output.

Is this model still actively maintained?

No, Intel has ceased development and the project is no longer actively maintained.

not yet live

We're benchmarking and onboarding DPT-Hybrid MiDaS as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

compare all →

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo

Distill-Any-Depth-Large-hf

189.6K dl/mo