Depth Anything V2 Metric Outdoor Large
depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf
published Jul 2024 · updated Aug 2024
Depth Anything V2 Metric Outdoor Large is a depth estimation model fine-tuned for outdoor metric depth prediction on the synthetic Virtual KITTI 2 dataset.
specs
| Spec | Value |
|---|---|
| Task | Metric Depth Estimation (Outdoor) |
| Architecture | DPT with DINOv2 backbone |
| Parameters | 335.3M |
| Fine-tuned on | Virtual KITTI 2 (synthetic outdoor dataset) |
about this model
Depth-Anything-V2-Metric-Outdoor-Large-hf is a monocular metric depth estimation model for outdoor scenes. It is built on the Depth Anything V2 foundation model, which uses the DPT architecture with a DINOv2 backbone, and is fine-tuned on the synthetic Virtual KITTI 2 dataset. The underlying Depth Anything V2 models were trained on approximately 600,000 synthetic labeled images and 62 million unlabeled real images, and the authors report state-of-the-art results for both relative and metric (absolute) depth estimation.
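For local inference, the `-hf` suffix in the repository name suggests a checkpoint packaged for the Hugging Face `transformers` library. The following is a minimal sketch, assuming `transformers`, `torch`, and Pillow are installed. `load_depth_pipeline` and `estimate_depth` are illustrative helper names, not part of any library, and `street.jpg` is a hypothetical file path.

```python
from typing import Any, Callable

MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf"


def load_depth_pipeline(model_id: str = MODEL_ID) -> Callable[[Any], dict]:
    """Build a depth-estimation pipeline (downloads the weights on first use)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline(task="depth-estimation", model=model_id)


def estimate_depth(pipe: Callable[[Any], dict], image: Any) -> Any:
    """Run the pipeline on one image and return the raw depth prediction.

    The depth-estimation pipeline returns a dict. Its "predicted_depth" entry
    holds the model output as a tensor, which for this metric checkpoint is
    assumed to be in meters, per the model description.
    """
    return pipe(image)["predicted_depth"]


# Usage (needs network access and an image; "street.jpg" is hypothetical):
#   from PIL import Image
#   pipe = load_depth_pipeline()
#   depth = estimate_depth(pipe, Image.open("street.jpg"))
#   print(depth.shape, float(depth.min()), float(depth.max()))
```

Passing the pipeline in as an argument keeps `estimate_depth` easy to exercise with a stub, without downloading the 335.3M-parameter checkpoint.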
Architecture and Training
The model is one of six metric depth models: three scales (Small, Base, Large), each with an indoor and an outdoor variant. The Large variant has 335.3 million parameters. Depth Anything V2 training follows three key practices: replacing all labeled real images with synthetic images, scaling up the capacity of the teacher model, and teaching student models through large-scale pseudo-labeled real images. The paper was accepted to NeurIPS 2024.
Key Strengths
- Produces finer and more robust depth predictions than Depth Anything V1.
- More than 10× faster inference than Stable Diffusion-based depth models, with higher accuracy and fewer parameters.
- Strong generalization capability across diverse outdoor scenes, demonstrated on the DA-2K evaluation benchmark.
Available Variants
| Base Model | Params | Indoor (Hypersim) | Outdoor (Virtual KITTI 2) |
|---|---|---|---|
| Depth-Anything-V2-Small | 24.8M | Model Card | Model Card |
| Depth-Anything-V2-Base | 97.5M | Model Card | Model Card |
| Depth-Anything-V2-Large | 335.3M | Model Card | Model Card |

best for
- Autonomous driving depth perception
- Outdoor scene reconstruction and mapping
- Robotics navigation and obstacle avoidance
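For the navigation and obstacle-avoidance use case, a metric depth map can be reduced to a single "distance to nearest obstacle" reading. The following is a minimal sketch, assuming the depth map has already been converted to a list of rows in meters (for example with a tensor's `.tolist()`). The function name, the region-of-interest convention, and the 0.1 m validity floor are illustrative assumptions.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[float]]


def nearest_obstacle(
    depth_m: DepthMap,
    roi: Tuple[int, int, int, int],
    min_valid_m: float = 0.1,
) -> Optional[float]:
    """Return the smallest valid depth (meters) inside roi, or None.

    roi is (top, left, bottom, right) in pixel indices, with bottom and right
    exclusive. Values below min_valid_m (and NaNs) are skipped as invalid.
    """
    top, left, bottom, right = roi
    nearest: Optional[float] = None
    for row in depth_m[top:bottom]:
        for z in row[left:right]:
            if z >= min_valid_m and (nearest is None or z < nearest):
                nearest = z
    return nearest


# Made-up 3x4 depth map in meters, for illustration only.
depth = [
    [40.0, 35.0, 30.0, 38.0],
    [12.0, 6.5, 7.0, 15.0],
    [3.0, 2.2, 2.4, 3.1],
]
# Look only at the two central columns of the bottom two rows.
print(nearest_obstacle(depth, roi=(1, 1, 3, 3)))  # 2.2
```

For full-resolution frames a vectorized array library would likely be a better fit than nested loops; the loop form is used here only to keep the sketch dependency-free.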
FAQ
Input: a single image (e.g., JPEG/PNG). Output: a depth map with metric depth values in meters.
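Because the output is in meters, each pixel can be back-projected into a 3D point with the standard pinhole camera model: `X = (u - cx) * Z / fx`, `Y = (v - cy) * Z / fy`. The sketch below assumes known camera intrinsics `fx`, `fy`, `cx`, `cy` in pixels (the model does not supply them, and the values in the example are made up) and a depth map given as a list of rows.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth_m: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point]:
    """Convert a metric depth map into camera-frame 3D points (meters).

    Pixel (u, v) is column u, row v. Uses the pinhole model:
    X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    points: List[Point] = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if not z > 0:  # skip zero, negative, or NaN depth
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 2x2 depth map and intrinsics, for illustration only.
cloud = backproject([[2.0, 2.0], [4.0, 0.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(cloud)
```

The resulting points are expressed in the camera frame; placing them in a world map additionally requires the camera pose, which is outside the scope of this model.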
Compared with Depth Anything V1, V2 produces finer and more robust depth predictions, especially in outdoor scenes, owing to its synthetic labeled training data and its larger teacher model.
This checkpoint was fine-tuned on the synthetic Virtual KITTI 2 dataset for outdoor metric depth estimation.
Once the model is hosted, call the OpenAI-compatible endpoint with your API key and send the image as a base64-encoded string or a URL in the request.
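The request schema for the hosted endpoint is not documented on this page, so the following is only a sketch of the client-side preparation: encoding an image as a base64 data URL and wrapping it in a JSON body. The field names `model` and `image` are hypothetical placeholders, not a confirmed schema; check the provider's API reference before relying on them.

```python
import base64
import json

MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf"


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_request_body(image_ref: str, model: str = MODEL_ID) -> str:
    """Serialize a request body; image_ref is a data URL or an https URL.

    NOTE: "model" and "image" are assumed field names for illustration only.
    """
    return json.dumps({"model": model, "image": image_ref})


# Stand-in bytes; in practice read them with open(path, "rb").read().
body = build_request_body(to_data_url(b"\xff\xd8\xff\xe0 not a real JPEG"))
print(body[:80])
```

The API key would normally travel in a request header rather than in the body; how exactly depends on the provider, so it is left out of this sketch.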
Depth Anything V2 is reported to be more than 10× faster and more accurate than depth models built on Stable Diffusion.
We're benchmarking and onboarding Depth Anything V2 Metric Outdoor Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.