Depth Anything V2 Indoor Large

depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf

published Jul 2024 · updated Aug 2024

Depth Anything V2 Indoor Large is a monocular depth estimation model fine-tuned for metric depth estimation in indoor scenes.

est. price

~$0.094 / 1k images · estimated, set at launch

API providers

downloads / mo

10.9K

specs

Task	Metric Depth Estimation (Indoor)
Architecture	DPT with DINOv2 backbone
Parameters	335.3M
Training Data	~600K synthetic labeled images + ~62M unlabeled real images
Input	Single RGB image
Output	Depth map with metric values

about this model

Depth-Anything-V2-Metric-Indoor-Large-hf is a monocular depth estimation model that produces metric depth for indoor scenes, fine-tuned from the Depth Anything V2 foundation model on synthetic Hypersim data.

The model uses a DPT architecture with a DINOv2 backbone. The underlying Depth Anything V2 foundation model was trained on approximately 600,000 synthetic labeled images and 62 million real unlabeled images; this checkpoint is then fine-tuned on Hypersim for metric output. The authors report state-of-the-art results for both relative and absolute depth estimation, with significantly finer detail and greater robustness than its predecessor, Depth Anything V1. Compared to Stable Diffusion-based depth models, Depth Anything V2 is more than 10 times faster and achieves higher accuracy. The Depth Anything V2 paper was accepted at NeurIPS 2024.
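As a rough illustration of running the checkpoint locally, here is a minimal sketch using the Hugging Face `transformers` depth-estimation pipeline. It assumes `transformers`, `torch`, and `Pillow` are installed; the helper name `estimate_depth` and the image path are hypothetical, and the unit of the returned values (assumed here to be meters) should be checked against the upstream model card.

```python
MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf"


def estimate_depth(image_path: str):
    """Return the raw predicted depth tensor for one RGB image.

    Hypothetical helper for illustration; values are assumed to be
    metric depth (meters) per the model description.
    """
    # Imported lazily so this module loads without the heavy dependencies.
    from PIL import Image
    from transformers import pipeline

    # Downloads the checkpoint on first use.
    pipe = pipeline(task="depth-estimation", model=MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    result = pipe(image)
    # "predicted_depth" holds the raw tensor; "depth" is a visualization image.
    return result["predicted_depth"]
```

Calling `estimate_depth("room.jpg")` (with any local indoor photo) should return a tensor of per-pixel depth values; the first call is slow because the weights are fetched and loaded.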

Model Variants

Depth Anything V2 provides six metric depth models: three scales, each with an indoor and an outdoor variant. The table below lists all three scales; this page covers the Large indoor variant.

Base Model	Params	Indoor (Hypersim)	Outdoor (Virtual KITTI 2)
Depth-Anything-V2-Small	24.8M	Model Card	Model Card
Depth-Anything-V2-Base	97.5M	Model Card	Model Card
Depth-Anything-V2-Large	335.3M	Model Card	Model Card

[Figure: Depth Anything overview. Taken from the original paper.]

best for

·Indoor scene depth estimation for robotics and navigation
·Augmented reality applications requiring metric depth
·3D reconstruction from indoor images
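For the robotics and 3D reconstruction use cases above, a metric depth value can be lifted to a 3D point with the standard pinhole camera model. The sketch below is an illustration only: the `Intrinsics` type, the `backproject` function, and the example focal lengths are hypothetical, and it assumes the depth value is measured along the optical axis (z-depth) in meters and that the camera intrinsics are known from calibration.

```python
from typing import NamedTuple


class Intrinsics(NamedTuple):
    """Pinhole camera intrinsics in pixels (assumed known from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth_m: float, k: Intrinsics) -> tuple[float, float, float]:
    """Map pixel (u, v) with z-depth depth_m to camera-frame (X, Y, Z) in meters."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Example with made-up intrinsics for a 640x480 image.
K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
point = backproject(420.0, 240.0, 2.0, K)
```

Applying `backproject` to every pixel of a depth map gives a point cloud in the camera frame; if the model's output turns out to be distance along the viewing ray rather than z-depth, it would need converting first.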

FAQ

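What is the difference between this model and Depth Anything V2 Base?

This model is the Large (335.3M-parameter) checkpoint, fine-tuned on the Hypersim dataset to predict metric depth in indoor scenes. The standard Depth Anything V2 Base checkpoint is smaller (97.5M parameters) and predicts relative depth, so its output carries no absolute scale.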

How do I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is hosted, use the OpenAI-compatible endpoint with your API key and set the model name to "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf".

What is the input format?

The model accepts a single RGB image as input.

What is the output format?

The output is a depth map with metric depth values for indoor scenes.

What architecture does this model use?

It uses the DPT architecture with a DINOv2 backbone, as described in the Depth Anything V2 paper.
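To make the output format described above more concrete, the following sketch converts a metric depth map into 16-bit integer millimeters, a common convention for storing depth images. It is an assumption-laden illustration: the function name `meters_to_mm_uint16` is hypothetical, the input is assumed to be in meters, and a plain nested list stands in for the array or tensor a real pipeline would return.

```python
UINT16_MAX = 65535  # largest value a 16-bit depth image can store


def meters_to_mm_uint16(depth_m: list[list[float]]) -> list[list[int]]:
    """Convert a depth map in meters to integer millimeters, clamped to [0, 65535]."""
    return [
        [min(max(round(d * 1000.0), 0), UINT16_MAX) for d in row]
        for row in depth_m
    ]


# Tiny 2x2 example: 0.5 m, 2 m, an out-of-range value, and an invalid negative.
depth_mm = meters_to_mm_uint16([[0.5, 2.0], [100.0, -1.0]])
```

Clamping caps the representable range at about 65.5 m, which is comfortably beyond typical indoor distances.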

not yet live

We're benchmarking and onboarding Depth Anything V2 Indoor Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo