Depth Anything V2 Indoor Large
depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf
published Jul 2024 · updated Aug 2024
Depth Anything V2 Indoor Large is a monocular depth estimation model fine-tuned for metric depth estimation in indoor scenes.
specs
| Spec | Value |
|---|---|
| Task | Metric Depth Estimation (Indoor) |
| Architecture | DPT with DINOv2 backbone |
| Parameters | 335.3M |
| Training Data | ~600K synthetic labeled + ~62M unlabeled real images (foundation model); Hypersim (metric fine-tuning) |
| Input | Single RGB image |
| Output | Per-pixel metric depth map (meters) |
about this model
Depth-Anything-V2-Metric-Indoor-Large-hf is a monocular depth estimation model that produces metric depth for indoor scenes, fine-tuned from the Depth Anything V2 foundation model on synthetic Hypersim data.
The model uses a DPT architecture with a DINOv2 backbone. The underlying Depth Anything V2 foundation model was trained on approximately 600,000 synthetic labeled images and 62 million real unlabeled images. According to the authors, this combination yields state-of-the-art results for both relative and absolute depth estimation, with significantly finer detail and greater robustness than Depth Anything V1. They also report that Depth Anything V2 is more than 10 times faster than Stable Diffusion-based depth models while achieving higher accuracy. The Depth Anything V2 paper was accepted at NeurIPS 2024.
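A minimal inference sketch, assuming this `-hf` checkpoint loads through the Hugging Face transformers `depth-estimation` pipeline (as transformers-converted checkpoints generally do) and that `predicted_depth` holds metric depth in meters. The helper names `load_depth_pipeline`, `predict_metric_depth`, and `summarize_depth` are hypothetical, chosen for illustration only.

```python
from functools import lru_cache
from typing import Dict

import numpy as np

MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf"


@lru_cache(maxsize=1)
def load_depth_pipeline():
    """Build the transformers depth-estimation pipeline once and reuse it.

    Assumes transformers, torch and Pillow are installed and the weights can be downloaded.
    """
    # Imported lazily so summarize_depth() below works without transformers installed.
    from transformers import pipeline

    return pipeline(task="depth-estimation", model=MODEL_ID)


def predict_metric_depth(image) -> np.ndarray:
    """Return the raw predicted depth for one RGB PIL image as a 2-D float array."""
    result = load_depth_pipeline()(image)
    # "predicted_depth" is the raw model output (assumed metric for this checkpoint);
    # "depth" is a 0-255 PIL image rescaled for visualization only, so it is ignored here.
    return result["predicted_depth"].detach().squeeze().cpu().numpy()


def summarize_depth(depth: np.ndarray) -> Dict[str, float]:
    """Min / median / max of a depth map, ignoring non-finite pixels (units assumed meters)."""
    values = np.asarray(depth, dtype=np.float64)
    values = values[np.isfinite(values)]
    if values.size == 0:
        raise ValueError("depth map has no finite values")
    return {
        "min_m": float(values.min()),
        "median_m": float(np.median(values)),
        "max_m": float(values.max()),
    }
```

Usage, with `room.jpg` as a placeholder path: `summarize_depth(predict_metric_depth(Image.open("room.jpg").convert("RGB")))`. Depending on the installed transformers version, the returned array may be at the model's processing resolution rather than the original image size, so check its shape before pairing pixels with the source image.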
Model Variants
Depth Anything V2 provides six metric depth models: three scales (Small, Base, Large), each fine-tuned for indoor (Hypersim) and outdoor (Virtual KITTI 2) scenes. This card covers the Large indoor variant.
| Base Model | Params | Indoor (Hypersim) | Outdoor (Virtual KITTI 2) |
|---|---|---|---|
| Depth-Anything-V2-Small | 24.8M | Model Card | Model Card |
| Depth-Anything-V2-Base | 97.5M | Model Card | Model Card |
| Depth-Anything-V2-Large | 335.3M | Model Card | Model Card |
Depth Anything overview. Taken from the original paper.
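The six-variant grid in the table above can be captured in a small lookup, for example to pick the largest scale that fits a parameter budget. This is a sketch: the repo-id pattern is copied from this card's id (`depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf`), and applying it to the other five variants is an assumption. `metric_checkpoint` and `largest_scale_within` are hypothetical helpers.

```python
from typing import Dict

# Parameter counts in millions, copied from the variants table.
PARAMS_M: Dict[str, float] = {"Small": 24.8, "Base": 97.5, "Large": 335.3}
# Scene -> component used in the repo name (Indoor = Hypersim, Outdoor = Virtual KITTI 2).
SCENES: Dict[str, str] = {"indoor": "Indoor", "outdoor": "Outdoor"}


def metric_checkpoint(scale: str, scene: str) -> str:
    """Return a repo id for one of the six metric variants.

    The pattern comes from this card's id; assuming it holds for the other five.
    """
    size = scale.strip().capitalize()
    if size not in PARAMS_M:
        raise ValueError(f"unknown scale {scale!r}; expected one of {list(PARAMS_M)}")
    scene_key = scene.strip().lower()
    if scene_key not in SCENES:
        raise ValueError(f"unknown scene {scene!r}; expected 'indoor' or 'outdoor'")
    return f"depth-anything/Depth-Anything-V2-Metric-{SCENES[scene_key]}-{size}-hf"


def largest_scale_within(budget_m: float) -> str:
    """Largest scale whose parameter count (in millions) fits within budget_m."""
    fitting = [name for name, params in PARAMS_M.items() if params <= budget_m]
    if not fitting:
        raise ValueError(f"no variant fits in {budget_m}M parameters")
    return max(fitting, key=PARAMS_M.__getitem__)
```

For instance, a 100M-parameter budget selects the Base scale, and `metric_checkpoint("large", "indoor")` reproduces this card's id.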
best for
- Indoor scene depth estimation for robotics and navigation
- Augmented reality applications requiring metric depth
- 3D reconstruction from indoor images
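For robotics, AR, and 3D reconstruction, a metric depth map becomes useful once it is back-projected into 3D points with a pinhole camera model. The sketch below assumes the depth map is planar Z-depth in meters, aligned pixel-for-pixel with the image whose intrinsics (`fx`, `fy`, `cx`, `cy`) you supply; those intrinsics come from your own camera calibration and are not provided by the model. `depth_to_points` and `distance_m` are hypothetical helpers.

```python
import numpy as np


def depth_to_points(depth, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map into an (H*W, 3) array of camera-frame points.

    Pinhole model, assuming depth holds planar Z-depth in meters:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v, u]
    """
    z = np.asarray(depth, dtype=np.float64)
    if z.ndim != 2:
        raise ValueError("depth must be a 2-D array")
    h, w = z.shape
    # u runs along columns (image x), v along rows (image y); both arrays have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Row-major flattening: the point for pixel (row v, col u) lands at index v * W + u.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


def distance_m(points: np.ndarray, width: int, a: tuple, b: tuple) -> float:
    """Euclidean distance between the 3-D points behind pixels a=(row, col) and b=(row, col)."""
    pa = points[a[0] * width + a[1]]
    pb = points[b[0] * width + b[1]]
    return float(np.linalg.norm(pa - pb))
```

Because the depth values are already metric, distances between back-projected points come out in meters without any per-scene scale fitting, which is the main practical difference from relative-depth models.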
FAQ
This model is fine-tuned specifically for metric depth estimation in indoor scenes using the Hypersim dataset, while the base Depth Anything V2 model (without metric fine-tuning) predicts relative depth.
Once the model is hosted, call the OpenAI-compatible endpoint with your API key and specify the model name "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf".
The model accepts a single RGB image as input.
The output is a per-pixel depth map with metric depth values (in meters) for indoor scenes.
It uses the DPT architecture with a DINOv2 backbone, as described in the Depth Anything V2 paper.
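The single RGB input is resized before it reaches the DINOv2 backbone, which splits it into 14x14-pixel patches, so each side must be a multiple of 14. The sketch below estimates the processed size and the resulting patch-token count. It assumes a 518-pixel target side (the input size commonly used with Depth Anything) and a keep-aspect-ratio rule loosely modeled on the transformers DPT image processor; the exact behavior depends on the processor configuration shipped with the checkpoint, so treat the numbers as approximations. `processed_size` and `patch_tokens` are hypothetical helpers.

```python
from typing import Tuple

PATCH = 14    # DINOv2 ViT patch size (pixels per patch side)
TARGET = 518  # assumed input side length (37 patches x 14 px)


def snap(value: float, multiple: int = PATCH) -> int:
    """Round to the nearest multiple of `multiple`, but never below one patch."""
    return max(multiple, round(value / multiple) * multiple)


def processed_size(height: int, width: int, target: int = TARGET) -> Tuple[int, int]:
    """Approximate (height, width) fed to the backbone after an aspect-preserving resize."""
    scale_h, scale_w = target / height, target / width
    # Keep the aspect ratio by applying one scale to both sides: whichever is closer to 1.
    scale = scale_w if abs(1 - scale_w) < abs(1 - scale_h) else scale_h
    return snap(height * scale), snap(width * scale)


def patch_tokens(height: int, width: int) -> int:
    """Number of 14x14 patch tokens the ViT sees for an image of the given size."""
    h, w = processed_size(height, width)
    return (h // PATCH) * (w // PATCH)
```

Under these assumptions a 640x480 (width x height) photo is processed at 518x686 (height x width), giving 37 x 49 = 1813 patch tokens, while a square image maps to 518x518 and 1369 tokens. Token count drives the backbone's attention cost, which is one reason larger inputs slow the Large variant noticeably.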
We're benchmarking and onboarding Depth Anything V2 Indoor Large as a hosted, OpenAI-compatible API.