Question 1

What is the DPT Large model best used for?

Accepted Answer

DPT Large is designed for zero-shot monocular depth estimation — predicting depth from a single image without fine-tuning.

Question 2

How does DPT Large differ from MiDaS?

Accepted Answer

DPT Large uses a Vision Transformer backbone instead of a convolutional one, achieving up to 28% relative improvement over MiDaS on depth estimation benchmarks.

Question 3

What input format does the model require?

Accepted Answer

The model expects a single RGB image. The image is resized so that the longer side is 384 pixels and then a 384x384 crop is used during training; at inference, the DPTImageProcessor handles preprocessing.

Question 4

What license is the model released under?

Accepted Answer

The model is released under the Apache 2.0 license according to the Hugging Face model card.

Question 5

How can I use this model via the gigarouter API?

Accepted Answer

You can call the model through gigarouter's OpenAI-compatible endpoint using your API key, sending an image and receiving a depth map in response.

Task	Monocular Depth Estimation
Architecture	Dense Prediction Transformer (Vision Transformer backbone with convolutional decoder)
License	Apache 2.0

Model	Training set	DIW WHDR	ETH3D AbsRel	Sintel AbsRel	KITTI δ>1.25	NYU δ>1.25	TUM δ>1.25
DPT-Large	MIX 6	10.82	0.089	0.270	8.46	8.32	9.97
DPT-Hybrid	MIX 6	11.06	0.093	0.274	11.56	8.69	10.89
MiDaS	MIX 6	12.95	0.116	0.329	16.08	8.

DPT Large

specs

about this model

Key strengths

Zero-shot cross-dataset benchmark results

best for

FAQ

related depth estimation models