Depth Anything Large

LiheYoung/depth-anything-large-hf

published Jan 2024 · updated Jan 2024

Depth Anything Large is a monocular depth estimation model that uses a DPT architecture with a DINOv2 backbone, trained on approximately 62 million images for robust zero-shot depth prediction.

est. price

~$0.094

/ 1k images · estimated, set at launch

API providers

downloads / mo

388.9K

license

apache-2.0

specs

Task	Depth Estimation
Architecture	DPT with DINOv2 backbone
Training Data	~62 million images (1.5M labeled + 62M+ unlabeled)
Paper	Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data (CVPR 2024)

about this model

LiheYoung/depth-anything-large-hf is a monocular depth estimation model that produces a relative depth map from a single image. It uses a DPT architecture with a DINOv2 backbone and was trained on 1.5M labeled and more than 62M unlabeled images. The paper reports state-of-the-art generalization for relative depth and, after fine-tuning, for metric (absolute) depth. The model is described in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, accepted at CVPR 2024.

Key capabilities

Zero-shot relative depth estimation on any image; strong generalization across diverse domains (indoor, outdoor, synthetic).
Can be fine-tuned for metric depth; pre-trained metric depth models are available for indoor (NYUv2) and outdoor (KITTI) settings.
Rich semantic priors from the encoder enable strong downstream performance on tasks like semantic segmentation.

Benchmark results

Fine-tuned metric depth (ViT-L backbone) outperforms prior methods. Lower is better for AbsRel and RMSE; higher is better for δ₁, δ₂, and δ₃:

Dataset	AbsRel	RMSE	δ₁	δ₂	δ₃
NYUv2	0.056	0.206	0.984	0.998	1.000
KITTI	0.046	1.896	0.982	0.998	1.000
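The column headers follow the standard depth-evaluation definitions, which can be sketched in a few lines. This is an illustrative sketch only, not the paper's evaluation code: the function names are hypothetical, inputs are flat sequences of per-pixel depths, and every value is assumed to be valid and strictly positive (real benchmarks also mask invalid pixels and cap the depth range).

```python
import math
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """AbsRel: mean of |pred - gt| / gt over all pixels."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """RMSE: square root of the mean squared error."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def delta_accuracy(pred: Sequence[float], gt: Sequence[float], k: int = 1) -> float:
    """δ_k: fraction of pixels where max(pred/gt, gt/pred) < 1.25 ** k."""
    threshold = 1.25 ** k
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)
```

Under these definitions, a δ₁ of 0.984 means that 98.4% of pixels have a predicted-to-true depth ratio within a factor of 1.25.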

Zero-shot metric depth transfer (models fine-tuned on NYUv2 or KITTI):

Source → Target	AbsRel	δ₁
NYUv2 → SUN RGB-D	0.500	0.660
NYUv2 → iBims-1	0.150	0.714
KITTI → Virtual KITTI 2	0.085	0.913

Downstream semantic segmentation (fine-tuning the encoder): 86.2 mIoU on Cityscapes, 59.4 mIoU on ADE20K.

[Figure: Depth Anything overview diagram showing the training pipeline with labeled and unlabeled images]

The model is integrated into production tools such as ControlNet, InstantID, and InvokeAI. A hosted API on gigarouter is planned but not yet live.

best for

·Zero-shot relative depth estimation on any image
·Fine-tuning for metric depth on indoor (NYUv2) or outdoor (KITTI) datasets
·Powering depth-conditioned ControlNet pipelines

FAQ

What is the primary use case for Depth Anything Large?

It is designed for robust zero-shot monocular depth estimation on arbitrary images without fine-tuning.

How does Depth Anything Large compare in size to the small variant?

The model card does not list a parameter count for the large variant. The small variant has 24.8M parameters; the large variant, built on a ViT-L backbone, is substantially larger.

What input and output format does the model expect?

Input: RGB image (e.g., PIL or tensor). Output: a depth map as a single-channel grayscale image or tensor with relative depth values.
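A minimal local-inference sketch, assuming the checkpoint is loaded through the Hugging Face `transformers` depth-estimation pipeline with `Pillow` installed. The function names `estimate_depth` and `normalize_to_uint8` are hypothetical helpers introduced here for illustration. The second helper shows one common way to turn raw relative depth values into a grayscale image (min-max scaling), which is a display choice and not something this page prescribes.

```python
from typing import List, Sequence

MODEL_ID = "LiheYoung/depth-anything-large-hf"


def estimate_depth(image_path: str):
    """Run the model on one image; returns (depth_image, raw_depth)."""
    # Heavy dependencies are imported lazily so the helper below
    # remains usable without them installed.
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(task="depth-estimation", model=MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    result = pipe(image)
    # "depth" is a PIL image for viewing; "predicted_depth" is the raw tensor.
    return result["depth"], result["predicted_depth"]


def normalize_to_uint8(depth: Sequence[Sequence[float]]) -> List[List[int]]:
    """Min-max scale relative depth values to 0..255 for grayscale display."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no relative depth information; render it black.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth]
```

Calling `estimate_depth("example.jpg")` (a placeholder path) downloads the checkpoint on first use. Because the values are relative, only their ordering and ratios within one image are meaningful; they are not distances in meters.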

Is the model available as a hosted API on gigarouter?

Not yet. Depth Anything Large is being benchmarked and onboarded as a hosted, OpenAI-compatible API on gigarouter, so it cannot be called through the API until it launches.

What license is the model released under?

The model is released under the Apache 2.0 license (apache-2.0).

not yet live

We're benchmarking and onboarding Depth Anything Large as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo

Distill-Any-Depth-Large-hf

189.6K dl/mo