
Depth Anything V2 Indoor Large

depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf

published Jul 2024 · updated Aug 2024

Depth Anything V2 Indoor Large is a monocular depth estimation model fine-tuned for metric depth estimation in indoor scenes.

est. price: ~$0.094 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 10.9K
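Because the price is quoted per 1,000 images, a batch estimate is a single multiplication. The sketch below assumes the pre-launch figure of ~$0.094 / 1k images holds, which may change at launch. `estimate_cost` is a hypothetical helper, not part of any gigarouter SDK.

```python
# Hypothetical helper: ~$0.094 per 1k images is the page's pre-launch
# estimate, not a confirmed price.
EST_PRICE_PER_1K_IMAGES = 0.094  # USD, estimated


def estimate_cost(num_images: int, price_per_1k: float = EST_PRICE_PER_1K_IMAGES) -> float:
    """Return the estimated USD cost of running num_images through the model."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images / 1000 * price_per_1k


if __name__ == "__main__":
    # 50,000 images at ~$0.094 per 1k comes to roughly $4.70.
    print(f"${estimate_cost(50_000):.2f}")
```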

specs

Task: Metric Depth Estimation (Indoor)
Architecture: DPT with DINOv2 backbone
Parameters: 335.3M
Training Data: ~600K synthetic labeled images + ~62M unlabeled real images (base model); metric fine-tuning on synthetic Hypersim data
Input: Single RGB image
Output: Depth map with metric values

about this model

Depth-Anything-V2-Metric-Indoor-Large-hf is a monocular depth estimation model that produces metric depth for indoor scenes, fine-tuned from the Depth Anything V2 foundation model on synthetic Hypersim data.

The underlying Depth Anything V2 model uses a DPT architecture with a DINOv2 backbone and was trained on approximately 600,000 synthetic labeled images and 62 million unlabeled real images. According to the authors, this combination yields state-of-the-art results for both relative and absolute depth estimation, with significantly finer detail and greater robustness than Depth Anything V1. Compared with depth models built on Stable Diffusion, Depth Anything V2 is reported to be more than 10 times faster and more accurate. The Depth Anything V2 paper was accepted at NeurIPS 2024.
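DINOv2 is a Vision Transformer that cuts the input into 14×14-pixel patches. The DPT head then reassembles tokens from several transformer layers into image-like feature maps before decoding depth. The sketch below shows only the token-grid arithmetic. It assumes each side is snapped to a multiple of 14 and uses the 518-pixel default input of the original Depth Anything code. The rounding rule is a simplification and does not reproduce the exact Hugging Face image processor. `snap_to_patch_multiple` and `patch_grid` are hypothetical names.

```python
# Assumption: input sides are snapped to multiples of DINOv2's 14-pixel patch
# size. The nearest-multiple rounding here is a simplification, not the exact
# preprocessing used by any particular library.
PATCH_SIZE = 14


def snap_to_patch_multiple(side: int, patch: int = PATCH_SIZE) -> int:
    """Round a side length to the nearest multiple of the patch size (at least one patch)."""
    return max(patch, round(side / patch) * patch)


def patch_grid(height: int, width: int, patch: int = PATCH_SIZE) -> tuple[int, int]:
    """Return the (rows, cols) grid of patch tokens the ViT backbone would see."""
    h = snap_to_patch_multiple(height, patch)
    w = snap_to_patch_multiple(width, patch)
    return h // patch, w // patch


if __name__ == "__main__":
    rows, cols = patch_grid(518, 518)
    print(rows, cols, rows * cols)  # 37 37 1369
```

Token count grows with the product of the two sides, which is one reason these models work at a fixed, moderate input resolution.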

Model Variants

Depth Anything V2 provides six metric depth models across three scales, covering indoor and outdoor scenes. This page covers the Large indoor variant.

Base model | Params | Indoor (Hypersim) | Outdoor (Virtual KITTI 2)
Depth-Anything-V2-Small | 24.8M | Model Card | Model Card
Depth-Anything-V2-Base | 97.5M | Model Card | Model Card
Depth-Anything-V2-Large | 335.3M | Model Card | Model Card
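When scripting over the six variants, it can help to build checkpoint IDs from scale and scene. This is a sketch under one loud assumption: the sibling checkpoints follow the same naming pattern as this model's ID (`depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf`), which is the only ID confirmed on this page. Verify the other IDs before relying on them. `metric_model_id` is a hypothetical helper.

```python
# Assumption: sibling checkpoints share this model's naming pattern; only the
# Indoor-Large ID is confirmed on this page.
SCALES = {"small": "Small", "base": "Base", "large": "Large"}
SCENES = {"indoor": "Indoor", "outdoor": "Outdoor"}


def metric_model_id(scale: str, scene: str) -> str:
    """Build a (hypothetical) Hub ID for a Depth Anything V2 metric checkpoint."""
    try:
        scale_name = SCALES[scale.lower()]
        scene_name = SCENES[scene.lower()]
    except KeyError as err:
        raise ValueError(f"unknown scale or scene: {err}") from None
    return f"depth-anything/Depth-Anything-V2-Metric-{scene_name}-{scale_name}-hf"


if __name__ == "__main__":
    # List all six combinations covered by the table above.
    for scale in SCALES:
        for scene in SCENES:
            print(metric_model_id(scale, scene))
```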
[Figure: Overview of the Depth Anything model architecture. Taken from the original paper.]


FAQ

What is the difference between this model and Depth Anything V2 Base?

This model is fine-tuned specifically for metric depth estimation in indoor scenes using the Hypersim dataset, while the Base model predicts relative depth.
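A relative-depth prediction is only defined up to an unknown scale and shift. Reading it as physical distances therefore requires fitting those two numbers against some metric reference, such as a few known depths. A metric model is trained to output absolute depth directly, so no such fit is needed. The sketch below shows the standard least-squares scale-and-shift fit in plain Python. The values are made up for illustration, and `fit_scale_shift` is a hypothetical name.

```python
def fit_scale_shift(pred: list[float], target: list[float]) -> tuple[float, float]:
    """Least-squares (s, t) minimizing sum((s * p + t - y)^2) over paired values."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired values")
    mean_p = sum(pred) / n
    mean_y = sum(target) / n
    # Closed-form simple linear regression: s = cov(p, y) / var(p).
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, target))
    var = sum((p - mean_p) ** 2 for p in pred)
    if var == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    s = cov / var
    t = mean_y - s * mean_p
    return s, t


if __name__ == "__main__":
    relative = [1.0, 2.0, 3.0]   # made-up unitless predictions
    reference = [3.0, 5.0, 7.0]  # made-up reference values
    s, t = fit_scale_shift(relative, reference)
    print(s, t)  # 2.0 1.0
```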

How do I call this model via the gigarouter API?

Once the model is live, use the OpenAI-compatible endpoint with your API key and set the model name to "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf".
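The request schema for depth models has not been published, so the following is only a shape sketch. It builds, but does not send, a POST request that carries the model name from this page, a bearer-token `Authorization` header (the usual convention for OpenAI-compatible APIs), and the image as a base64 data URL. The base URL, the `/depth-estimation` path, the `"image"` field, and `build_depth_request` are all hypothetical placeholders. Replace them with whatever gigarouter documents at launch.

```python
import base64
import json
import urllib.request

# Hypothetical: gigarouter has not published a depth-estimation schema.
# The base URL, path, and "image" field are placeholders, not a documented API.
API_BASE = "https://api.example.com/v1"
MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf"


def build_depth_request(image_bytes: bytes, api_key: str, mime: str = "image/png") -> urllib.request.Request:
    """Assemble (without sending) a JSON POST carrying the model name and image."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    body = json.dumps({"model": MODEL_ID, "image": data_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/depth-estimation",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_depth_request(b"placeholder image bytes", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```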

What is the input format?

The model accepts a single RGB image as input.
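The original Depth Anything code reads images with OpenCV (which returns BGR), converts them to RGB, scales pixel values to [0, 1], and normalizes each channel with the ImageNet mean and standard deviation. The per-pixel sketch below illustrates that step. Whether the hosted API expects raw or pre-normalized input is not stated here, so treat this as background on what the network consumes rather than a required client step. `normalize_rgb` is a hypothetical name.

```python
# ImageNet channel statistics, as used by the original Depth Anything preprocessing.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_rgb(pixel: tuple[int, ...]) -> tuple[float, ...]:
    """Map one 8-bit (R, G, B) pixel to normalized floats; rejects non-RGB input."""
    if len(pixel) != 3:
        raise ValueError("expected exactly three channels in R, G, B order")
    return tuple((c / 255.0 - m) / s for c, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))


if __name__ == "__main__":
    print(normalize_rgb((255, 255, 255)))
```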

What is the output format?

The output is a depth map with metric depth values for indoor scenes.
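Metric depth makes it possible to lift pixels into 3D with a pinhole camera model. The sketch below assumes each depth value is in meters and measured along the camera's optical axis (z-depth). It also assumes you know the camera intrinsics `fx`, `fy`, `cx`, `cy`. The model does not estimate these, and the numbers in the example are made up. `backproject` is a hypothetical helper.

```python
def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) with z-depth into camera coordinates."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth_m / fx  # horizontal offset from the optical axis
    y = (v - cy) * depth_m / fy  # vertical offset from the optical axis
    return x, y, depth_m


if __name__ == "__main__":
    # Made-up intrinsics for a 640x480 camera; a pixel 100 px right of center at 2 m.
    print(backproject(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```

Applying this to every pixel yields a point cloud; with a relative-depth model the same step would need the scale and shift fit first.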

What architecture does this model use?

It uses the DPT architecture with a DINOv2 backbone, as described in the Depth Anything V2 paper.

not yet live

We're benchmarking and onboarding Depth Anything V2 Indoor Large as a hosted, OpenAI-compatible API.
