Depth Anything V2 Small

depth-anything/Depth-Anything-V2-Small-hf

published Jun 2024 · updated Jul 2024

Depth Anything V2 Small is a monocular depth estimation model that produces fine-grained, robust depth predictions from a single image.

est. price

~$0.047 / 1k images · estimated, set at launch

API providers

downloads / mo

1.7M

license

apache-2.0

specs

Task	Monocular Depth Estimation
Architecture	DPT with DINOv2 backbone
Parameters	25M
Training Data	595K synthetic labeled images + 62M+ real unlabeled images

about this model

Depth-Anything-V2-Small-hf is a monocular depth estimation model that produces fine-grained and robust depth predictions from a single image. It is built on the DPT architecture with a DINOv2 backbone and trained on 595K synthetic labeled images and over 62 million real unlabeled images.

Key Strengths

Produces more fine-grained details than Depth Anything V1.
More robust than Depth Anything V1 and Stable Diffusion-based models such as Marigold and GeoWizard.
Over 10x faster and more lightweight than SD-based alternatives.
Highly effective when fine-tuned for metric depth estimation.

Performance and Comparison

According to the Depth Anything V2 paper, the model has all six desirable properties (fine detail, transparent objects, reflections, complex scenes, efficiency, and transferability), whereas Marigold and Depth Anything V1 each lack several of them. The authors also built the DA-2K evaluation benchmark, with precise annotations and diverse scenes, to address limitations of existing test sets.

Scalability and Publication

Depth Anything V2 offers model scales from 25M to 1.3B parameters. The work was accepted at NeurIPS 2024. For the full methodology, see the paper: Depth Anything V2.

best for

·Zero-shot depth estimation on any image
·Fine-tuning for metric depth estimation on custom datasets
·Efficient depth-conditioned image generation (e.g., ControlNet)

FAQ

What is the main use case for Depth Anything V2 Small?

It is optimized for monocular depth estimation, offering fine-grained and robust depth maps from a single image, suitable for zero-shot inference and fine-tuning.
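
As a minimal sketch of zero-shot inference, the snippet below loads the checkpoint through the Hugging Face `transformers` depth-estimation pipeline. It assumes `transformers`, a PyTorch backend, and `Pillow` are installed and that the weights can be downloaded. The helper name `estimate_depth` and any image path passed to it are illustrative and not part of the source.

```python
MODEL_ID = "depth-anything/Depth-Anything-V2-Small-hf"


def estimate_depth(image_path):
    """Run zero-shot depth estimation on one image file.

    Returns the pipeline's output dict, which holds a PIL image under
    "depth" and the raw tensor under "predicted_depth".
    """
    # Heavy dependencies are imported here so that importing this module
    # does not require them; this is a convenience choice for the sketch.
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(task="depth-estimation", model=MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    return pipe(image)
```

Calling `estimate_depth("room.jpg")["depth"].save("depth.png")` would then write the predicted depth map as a grayscale image, where `room.jpg` is a placeholder for a local file.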

How does this model compare in speed to Stable Diffusion-based depth models?

It is more than 10x faster and more lightweight than Stable Diffusion-based models such as Marigold and GeoWizard.

What is the input and output format?

Input: a single RGB image. Output: a depth map (grayscale image or tensor) representing relative depth per pixel.
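
Because the output is relative rather than metric depth, raw values are usually rescaled before being viewed as a grayscale image. The sketch below illustrates one common choice, min-max scaling to the 0..255 range. It uses plain Python lists to stay dependency-free; in practice the same arithmetic would typically be applied to a NumPy array or tensor. The function name `depth_to_grayscale` is hypothetical, and this is not necessarily the scaling any particular library applies.

```python
def depth_to_grayscale(depth):
    """Min-max scale a non-empty 2D grid of relative depth values to 0..255."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant depth map carries no contrast; map every pixel to 0.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (value - lo) / span) for value in row] for row in depth]


# Toy 2x2 "depth map" with arbitrary relative values.
relative = [[0.0, 1.0], [3.0, 5.0]]
gray = depth_to_grayscale(relative)
print(gray)  # [[0, 51], [153, 255]]
```

Only the ordering and relative spacing of the values survive this scaling, which matches the model's relative-depth output; recovering distances in meters would need metric fine-tuning or an external reference.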

How can I call this model via the gigarouter API?

The hosted endpoint is not yet live. Once it is, use the OpenAI-compatible endpoint with your API key and specify the model as "depth-anything/Depth-Anything-V2-Small-hf".

What is the license for this model?

The model is listed under the apache-2.0 license; check the model repository for the current terms.

not yet live

We're benchmarking and onboarding Depth Anything V2 Small as a hosted, OpenAI-compatible API.

related depth estimation models

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo

Distill-Any-Depth-Large-hf

189.6K dl/mo