Depth Anything Large
LiheYoung/depth-anything-large-hf
published Jan 2024 · updated Jan 2024
Depth Anything Large is a monocular depth estimation model that uses a DPT architecture with a DINOv2 backbone, trained on approximately 62 million images for robust zero-shot depth prediction.
specs
| Task | Depth Estimation |
| Architecture | DPT with DINOv2 backbone |
| Training Data | ~62 million images (1.5M labeled + 62M+ unlabeled) |
| Paper | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data (CVPR 2024) |
about this model
LiheYoung/depth-anything-large-hf is a monocular depth estimation model that produces relative depth maps from a single image. It uses a DPT architecture with a DINOv2 backbone and was trained on approximately 62 million images (1.5M labeled + 62M+ unlabeled), achieving state-of-the-art generalization for both relative and absolute depth estimation. The model is described in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, accepted at CVPR 2024.
Key capabilities
- Zero-shot relative depth estimation on any image; strong generalization across diverse domains (indoor, outdoor, synthetic).
- Can be fine-tuned for metric depth; pre-trained metric depth models are available for indoor (NYUv2) and outdoor (KITTI) settings.
- Rich semantic priors from the encoder enable strong downstream performance on tasks like semantic segmentation.
Benchmark results
Fine-tuned metric depth (ViT-L backbone) outperforms prior methods:
| Dataset | AbsRel | RMSE | δ₁ | δ₂ | δ₃ |
|---|---|---|---|---|---|
| NYUv2 | 0.056 | 0.206 | 0.984 | 0.998 | 1.000 |
| KITTI | 0.046 | 1.896 | 0.982 | 0.998 | 1.000 |
Zero-shot metric depth transfer (models fine-tuned on NYUv2 or KITTI):
| Source → Target | AbsRel | δ₁ |
|---|---|---|
| NYUv2 → SUN RGB-D | 0.500 | 0.660 |
| NYUv2 → iBims-1 | 0.150 | 0.714 |
| KITTI → Virtual KITTI 2 | 0.085 | 0.913 |
Downstream semantic segmentation (fine-tuning the encoder): 86.2 mIoU on Cityscapes, 59.4 mIoU on ADE20K.
The model is integrated into production tools such as ControlNet, InstantID, and InvokeAI, and is available as a hosted API on gigarouter.
best for
- ·Zero-shot relative depth estimation on any image
- ·Fine-tuning for metric depth on indoor (NYUv2) or outdoor (KITTI) datasets
- ·Powering depth-conditioned ControlNet pipelines
FAQ
It is designed for robust zero-shot monocular depth estimation on arbitrary images without fine-tuning.
The model card does not specify parameter counts, but the small variant has 24.8M parameters; the large variant is larger but exact count is not listed.
Input: RGB image (e.g., PIL or tensor). Output: a depth map as a single-channel grayscale image or tensor with relative depth values.
Yes, Depth Anything Large is hosted on gigarouter as an OpenAI-compatible API. Use your API key and standard endpoints to call it.
The model card does not specify a license; please check the original repository for licensing details.
We're benchmarking and onboarding Depth Anything Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.