Depth Anything V2 Small

depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf

published Jul 2024 · updated Aug 2024

Depth Anything V2 Small is a monocular depth estimation model fine-tuned for indoor metric depth using synthetic Hypersim data.

est. price

~$0.047 / 1k images · estimated, set at launch
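
As a rough illustration of what the estimated price means at volume, the sketch below scales the listed figure linearly. Linear scaling is an assumption (the page only gives a per-1k estimate), and `estimate_cost_usd` is a hypothetical helper, not part of any gigarouter SDK.

```python
# Estimated launch price from this page; may change once the model is live.
EST_PRICE_PER_1K_IMAGES_USD = 0.047


def estimate_cost_usd(num_images: int,
                      price_per_1k: float = EST_PRICE_PER_1K_IMAGES_USD) -> float:
    """Return the estimated cost in USD for processing num_images images.

    Assumes cost scales linearly with image count (no tiers or minimums).
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images / 1000 * price_per_1k


if __name__ == "__main__":
    for n in (1_000, 50_000, 1_000_000):
        print(f"{n:>9,} images -> ~${estimate_cost_usd(n):,.2f}")
```

Under that assumption, one million images would come to roughly $47.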

downloads / mo

18.7K

specs

Task	Metric Depth Estimation (Indoor)
Architecture	DPT with DINOv2 backbone
Parameters	24.8M

about this model

Depth-Anything-V2-Metric-Indoor-Small-hf is a monocular depth estimation model fine-tuned for indoor metric depth prediction. It follows the Depth Anything V2 design, a DPT architecture on a DINOv2 backbone, and this Small variant has 24.8M parameters.

This checkpoint is fine-tuned on the synthetic Hypersim dataset to produce metric depth values for indoor scenes. The underlying Depth Anything V2 models were trained on approximately 600K synthetic labeled images and 62 million real unlabeled images, and the authors report state-of-the-art results for both relative and absolute (metric) depth estimation. Compared to Depth Anything V1, V2 makes three changes: it replaces all labeled real images with synthetic images, scales up the capacity of the teacher model, and trains the student models on large-scale pseudo-labeled real images.

Beyond producing finer and more robust depth predictions than V1, the V2 family is reported to run more than 10x faster than depth models built on Stable Diffusion while achieving higher accuracy. This model is one of six metric depth models, covering three scales (Small, Base, Large) for each of indoor and outdoor scenes.

Figure: Depth Anything overview diagram from the original paper.

gigarouter is onboarding the model as a managed, OpenAI-compatible API. Once it is live, no local installation or dependency management will be required.

best for

· Indoor 3D reconstruction and scanning
· Augmented reality placement and occlusion handling
· Robotics indoor navigation and obstacle avoidance
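
Metric depth is what makes these use cases practical, because a depth value in meters can be lifted directly into 3D. The sketch below shows the standard pinhole back-projection on a tiny depth map. The intrinsics `fx`, `fy`, `cx`, `cy` are assumptions that must come from your camera calibration; the model itself does not supply them, and `backproject` is a hypothetical helper name.

```python
def backproject(depth_m, fx, fy, cx, cy):
    """Convert a metric depth map to 3D points in the camera frame.

    depth_m: 2D list of depths in meters, indexed as depth_m[v][u].
    fx, fy:  focal lengths in pixels (assumed known from calibration).
    cx, cy:  principal point in pixels (assumed known from calibration).
    Returns a list of (X, Y, Z) tuples in meters.
    """
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid or missing depth values
            x = (u - cx) * z / fx  # pinhole model: X = (u - cx) * Z / fx
            y = (v - cy) * z / fy  # pinhole model: Y = (v - cy) * Z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A 2x2 depth map in meters with made-up intrinsics, for illustration only.
    tiny_depth = [[2.0, 2.0], [2.5, 0.0]]
    for p in backproject(tiny_depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5):
        print(p)
```

With a relative (non-metric) depth model, the same formula would only recover the scene up to an unknown scale.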

FAQ

What input does this model expect?

It expects a single RGB image (e.g., a PIL Image) and outputs a per-pixel depth map whose values are metric depth in meters. The map is commonly rendered as a grayscale image for viewing.
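
To make the relationship between depth in meters and a grayscale rendering concrete, here is a minimal sketch that maps a small depth map to 0-255 gray levels. The near-is-bright convention and the `depth_to_grayscale` helper are assumptions for illustration; the rendering is for display only, and the metric values themselves should be kept for any measurement.

```python
def depth_to_grayscale(depth_m, max_depth_m=None):
    """Map a 2D list of depths (meters) to 0-255 gray levels, near = bright.

    max_depth_m: depth mapped to black; defaults to the largest value present.
    """
    flat = [d for row in depth_m for d in row]
    far = max_depth_m if max_depth_m is not None else max(flat)
    if far <= 0:
        raise ValueError("maximum depth must be positive")
    gray = []
    for row in depth_m:
        # Clamp each depth to [0, far], then invert so closer pixels are brighter.
        gray.append([int(255 * (1.0 - min(max(d, 0.0), far) / far)) for d in row])
    return gray


if __name__ == "__main__":
    depth_m = [[0.0, 1.0], [3.0, 4.0]]  # meters, made-up values
    print(depth_to_grayscale(depth_m))
```

Passing a fixed `max_depth_m` (for example, the largest room dimension you expect) keeps brightness comparable across images, whereas the default rescales each image independently.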

How does this model compare to Depth Anything V2 Base or Large?

The Small variant has 24.8M parameters, offering faster inference and lower resource usage while still providing accurate indoor depth; Base and Large are larger and more accurate but slower.

What license is this model released under?

The model card does not specify a license; please check the original repository or contact the authors for licensing details.

How do I call this model via the gigarouter API?

Once the model is live, authenticate with an API key and send a request to the OpenAI-compatible depth estimation endpoint containing either an image URL or a base64-encoded image.
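
Because the endpoint is not documented on this page, the following is only a sketch of how a request body carrying an image URL or a base64-encoded image might be assembled. The field names (`model`, `image_url`, `image_base64`), the use of the repository ID as the model identifier, and the `build_depth_request` helper are all hypothetical; check the gigarouter API reference for the real schema before relying on any of them.

```python
import base64
import json
from typing import Optional

# Assumption: the API identifies the model by its repository ID.
MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf"


def build_depth_request(image_bytes: Optional[bytes] = None,
                        image_url: Optional[str] = None) -> str:
    """Build a JSON body with exactly one of an image URL or a base64 image.

    All field names here are hypothetical placeholders, not a documented schema.
    """
    if (image_bytes is None) == (image_url is None):
        raise ValueError("provide exactly one of image_bytes or image_url")
    body = {"model": MODEL_ID}
    if image_url is not None:
        body["image_url"] = image_url
    else:
        # Base64 turns raw image bytes into ASCII text that is safe inside JSON.
        body["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(body)


if __name__ == "__main__":
    print(build_depth_request(image_url="https://example.com/room.jpg"))
```

The API key would typically travel in an `Authorization` header rather than in the body, though that too is an assumption until the endpoint is published.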

not yet live

We're benchmarking and onboarding Depth Anything V2 Small as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf	1.7M dl/mo
DA3METRIC-LARGE	825K dl/mo
depth-anything-large-hf	388.9K dl/mo
dpt-hybrid-midas	225.1K dl/mo
DA3NESTED-GIANT-LARGE-1.1	199.9K dl/mo
Depth-Anything-V2-Large-hf	199.1K dl/mo