Depth Anything V2 Small
depth-anything/Depth-Anything-V2-Small-hf
published Jun 2024 · updated Jul 2024
Depth Anything V2 Small is a monocular depth estimation model that produces fine-grained, robust depth predictions from a single image.
specs
| Spec | Value |
|---|---|
| Task | Monocular Depth Estimation |
| Architecture | DPT with DINOv2 backbone |
| Parameters | 25M |
| Training Data | 595K synthetic labeled images + 62M+ real unlabeled images |
about this model
Depth-Anything-V2-Small-hf is a monocular depth estimation model that produces fine-grained and robust depth predictions from a single image. It is built on the DPT architecture with a DINOv2 backbone and trained on 595K synthetic labeled images and over 62 million real unlabeled images.
Key Strengths
- Produces more fine-grained details than Depth Anything V1.
- More robust than Depth Anything V1 and Stable Diffusion-based models such as Marigold and Geowizard.
- Over 10x faster and more lightweight than SD-based alternatives.
- Highly effective when fine-tuned for metric depth estimation.
Performance and Comparison
As shown in the Depth Anything V2 paper, Depth Anything V2 achieves all six preferable properties (fine detail, transparent objects, reflections, complex scenes, efficiency, and transferability), whereas Marigold and Depth Anything V1 each lack several of them. The authors also built the DA-2K evaluation benchmark, with precise annotations and diverse scenes, to address the limitations of existing test sets.
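As we understand the paper, DA-2K uses sparse annotations: pairs of pixels labeled with which one is closer to the camera, scored by how often a model orders each pair correctly. The sketch below computes that pairwise accuracy under two stated assumptions: the prediction is relative inverse depth (larger value = closer), and the annotation layout (`Pair`) and the function name `pairwise_accuracy` are hypothetical, not the benchmark's actual file format or evaluation code.

```python
from typing import List, Sequence, Tuple

# Hypothetical annotation layout: ((row_a, col_a), (row_b, col_b), closer),
# where closer is 0 if point a is nearer to the camera and 1 if point b is.
Pair = Tuple[Tuple[int, int], Tuple[int, int], int]


def pairwise_accuracy(disparity: Sequence[Sequence[float]], pairs: List[Pair]) -> float:
    """Fraction of annotated pairs whose near/far order the prediction gets right.

    Assumes `disparity` is relative inverse depth, so a larger value means closer.
    Ties are counted as predicting point b as closer.
    """
    if not pairs:
        raise ValueError("need at least one annotated pair")
    correct = 0
    for (ra, ca), (rb, cb), closer in pairs:
        predicted_closer = 0 if disparity[ra][ca] > disparity[rb][cb] else 1
        correct += int(predicted_closer == closer)
    return correct / len(pairs)
```

Only the ordering of values matters here, which is why this kind of score suits an affine-invariant (relative) depth output: any positive rescaling or shift of the prediction leaves the accuracy unchanged.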
Scalability and Publication
Depth Anything V2 offers model scales from 25M to 1.3B parameters. The work was accepted at NeurIPS 2024. For the full methodology, see the paper: Depth Anything V2.
best for
- Zero-shot depth estimation on any image
- Fine-tuning for metric depth estimation on custom datasets
- Efficient depth-conditioned image generation (e.g., ControlNet)
FAQ
It is optimized for monocular depth estimation, offering fine-grained and robust depth maps from a single image, suitable for zero-shot inference and fine-tuning.
It is more than 10x faster and more lightweight than SD-based models like Marigold or Geowizard.
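Input: a single RGB image. Output: a relative depth map (as a tensor or a grayscale image) with one value per pixel. Following the Depth Anything training setup, these values are affine-invariant inverse depth, so larger values indicate points closer to the camera and the map carries no absolute (metric) scale.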
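To turn the raw per-pixel output into a viewable grayscale image, one common approach is min-max scaling to the 0..255 range. The sketch below does this with plain Python lists; the function name `to_grayscale` is hypothetical, and it assumes the map is relative inverse depth (larger = closer), so near objects come out bright. It is not necessarily the exact normalization any particular library applies.

```python
from typing import List, Sequence


def to_grayscale(depth: Sequence[Sequence[float]]) -> List[List[int]]:
    """Min-max scale a relative depth map to integers in 0..255.

    With inverse-depth input, the nearest pixel maps to 255 (white)
    and the farthest to 0 (black).
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no relative depth information.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth]
```

Because the output is only defined up to scale and shift, per-image min-max scaling is a reasonable default for visualization, but the resulting gray levels should not be compared across images as distances.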
Once the hosted endpoint is available, call it through the OpenAI-compatible API with your API key, setting the model to "depth-anything/Depth-Anything-V2-Small-hf".
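The exact route and request schema for depth estimation are not documented on this page, so the following is only a sketch of building an authenticated request with the standard library. The base URL, the `/depth` route, the `image` field, and the function name `build_depth_request` are all hypothetical placeholders; only the model ID and Bearer-token authentication follow from the text and the usual OpenAI-style convention.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: replace with the provider's documented values.
BASE_URL = "https://api.example.com/v1"
ROUTE = "/depth"  # hypothetical; not part of the OpenAI API specification
MODEL_ID = "depth-anything/Depth-Anything-V2-Small-hf"


def build_depth_request(image_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one RGB image."""
    payload = {
        "model": MODEL_ID,
        # Hypothetical field name: the image sent as base64 text.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        BASE_URL + ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the real route is known, the request could be sent with `urllib.request.urlopen(req)`; keeping construction separate from sending makes it easy to inspect the payload before any network call.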
Source materials do not specify a license; please refer to the model repository for updated information.
We're benchmarking and onboarding Depth Anything V2 Small as a hosted, OpenAI-compatible API.