Depth Anything Small
LiheYoung/depth-anything-small-hf
published Jan 2024 · updated Jan 2024
Depth Anything Small is a monocular depth estimation model that uses a DPT architecture with a DINOv2 backbone to predict relative depth from a single image.
specs
| Task | Monocular Depth Estimation |
| Architecture | DPT with DINOv2 backbone |
| Parameters | 24.8M |
| License | Apache 2.0 |
about this model
LiheYoung/depth-anything-small-hf is a monocular depth estimation model that delivers state-of-the-art zero-shot relative depth predictions. It pairs a lightweight DPT architecture with a DINOv2 backbone and was trained on over 62 million images (1.5M labeled and 62M+ unlabeled).
Key strengths
With 24.8 million parameters, the small variant has about 14× fewer parameters than MiDaS v3.1 BEiT_L-512 (345 million parameters), yet it achieves competitive or superior zero-shot performance. Results are reported as absolute relative error (AbsRel, lower is better) and threshold accuracy (δ1, higher is better). On KITTI the model reaches an AbsRel of 0.080 and a δ1 of 0.936; on NYUv2, an AbsRel of 0.053 and a δ1 of 0.972. Zero-shot results for all six benchmarks, including Sintel, DDAD, ETH3D, and DIODE, are listed below.
| Dataset | AbsRel | δ1 |
|---|---|---|
| KITTI | 0.080 | 0.936 |
| NYUv2 | 0.053 | 0.972 |
| Sintel | 0.464 | 0.739 |
| DDAD | 0.247 | 0.768 |
| ETH3D | 0.127 | 0.885 |
| DIODE | 0.076 | 0.939 |
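For readers unfamiliar with the two columns, the following is a minimal sketch of how AbsRel and δ1 are commonly defined, written in plain Python over flattened depth maps. The function names and the toy values are illustrative assumptions, not taken from the paper's evaluation code. Relative-depth benchmarks also typically align the prediction to the ground truth (scale and shift) before scoring; that step is omitted here.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt). Lower is better."""
    # Skip pixels without valid ground truth (gt <= 0).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold. Higher is better."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    # A non-positive prediction counts as a miss rather than being dropped.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Toy flattened depth maps (made-up values, not benchmark data).
gt = [1.0, 2.0, 4.0, 8.0]
pred = [1.1, 1.8, 4.0, 12.0]

print(f"AbsRel = {abs_rel(pred, gt):.3f}")
print(f"delta1 = {delta1(pred, gt):.3f}")
```

In this toy case three of the four predictions fall within the 1.25 ratio threshold, and the single large error dominates AbsRel.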
The paper reports that Depth Anything models fine-tuned for metric depth on NYUv2 and KITTI set new state-of-the-art results at the time of publication. It also reports that the pretrained encoder transfers well to semantic segmentation, reaching 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K.
The model was introduced in the CVPR 2024 paper "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data". It is being onboarded to gigarouter as a managed, OpenAI-compatible API, which removes the need for local installation or hardware configuration.
best for
- Zero-shot relative depth estimation for any image
- Providing depth conditioning for ControlNet pipelines
- Fine-tuning for metric depth estimation on NYUv2 or KITTI
FAQ
The model accepts a single RGB image (e.g., JPEG or PNG) and returns a depth map as a grayscale image or a tensor.
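As an illustration of the grayscale output, here is a minimal sketch of min-max scaling a raw depth map to 8-bit values. The helper name `depth_to_grayscale` and the sample values are assumptions for this example. The hosted service may use a different normalization, and whether bright pixels mean near or far depends on the output convention.

```python
def depth_to_grayscale(depth):
    """Min-max scale a 2D depth map (list of rows) to integers in 0..255."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Constant map: avoid division by zero and return an all-black image.
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in depth]


# Toy 2x2 relative depth map (made-up values).
depth = [[0.0, 1.0], [3.0, 5.0]]
gray = depth_to_grayscale(depth)
print(gray)
```

Because the model predicts relative depth, only the ordering and ratios of the values carry meaning, which is why a per-image rescale like this is a common way to visualize the result.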
Depth Anything Small has 24.8M parameters, about 14× fewer than MiDaS v3.1 BEiT_L-512 (345M), yet it outperforms MiDaS on most zero-shot benchmarks.
The model is released under the Apache 2.0 license.
Use the OpenAI-compatible endpoint with your API key; send an image URL or base64-encoded image and receive the depth map in response.
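The request described above might be assembled along the following lines. This is a sketch under stated assumptions: the URL, the route, and the `image` field name are hypothetical placeholders, since this page does not document the actual request schema. Only the Bearer-token header follows the usual OpenAI-style convention. The code builds the request without sending it.

```python
import base64
import json

# Hypothetical placeholders: the real base URL and route are not given on this page.
API_URL = "https://api.example.com/v1/depth"
MODEL_ID = "LiheYoung/depth-anything-small-hf"


def build_request(image_bytes, api_key, mime="image/png"):
    """Return (headers, JSON body) for a base64-encoded image request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    headers = {
        "Authorization": f"Bearer {api_key}",  # OpenAI-style bearer auth
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL_ID,
        # Assumed field name; the image is passed as a data URL.
        "image": f"data:{mime};base64,{encoded}",
    })
    return headers, body


# Stand-in bytes; in practice, read them from an image file.
headers, body = build_request(b"\x89PNG\r\n", "YOUR_API_KEY")
print(json.loads(body)["model"])
```

The resulting headers and body can be sent with any HTTP client once the real endpoint and schema are known.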
The model was trained on 1.5M labeled images and 62M+ unlabeled images, significantly expanding data coverage.
Depth Anything Small is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.