Depth Anything V2 Metric Outdoor Large

depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf

published Jul 2024 · updated Aug 2024

Depth Anything V2 Metric Outdoor Large is a depth estimation model fine-tuned for outdoor metric depth prediction on the synthetic Virtual KITTI 2 dataset.

est. price

~$0.094

/ 1k images · estimated, set at launch

API providers

downloads / mo

4.1K

license

apache-2.0

specs

Task	Metric Depth Estimation (Outdoor)
Architecture	DPT with DINOv2 backbone
Parameters	335.3M
Fine-tuned on	Virtual KITTI 2 (synthetic outdoor dataset)

about this model

Depth-Anything-V2-Metric-Outdoor-Large-hf is a monocular metric depth estimation model for outdoor scenes. It is a Depth Anything V2 model fine-tuned on the synthetic Virtual KITTI 2 dataset, and it uses the DPT architecture with a DINOv2 backbone. The underlying Depth Anything V2 models were trained on approximately 600,000 synthetic labeled images and 62 million unlabeled real images, and the authors report state-of-the-art results for both relative and absolute (metric) depth estimation.

Architecture and Training

The model belongs to a family of six metric depth models: three sizes (Small, Base, Large), each released in an indoor and an outdoor version. The Large variant contains 335.3 million parameters. The Depth Anything V2 training recipe rests on three practices: replacing all labeled real images with synthetic data, scaling up the capacity of the teacher model, and training student models on large-scale pseudo-labeled real images. The paper was accepted to NeurIPS 2024.

Key Strengths

·Produces finer and more robust depth predictions than Depth Anything V1.
·More than 10× faster inference than Stable Diffusion-based depth models, with higher accuracy and fewer parameters.
·Strong generalization capability across diverse outdoor scenes, demonstrated on the DA-2K evaluation benchmark.

Available Variants

Base Model	Params	Indoor (Hypersim)	Outdoor (Virtual KITTI 2)
Depth-Anything-V2-Small	24.8M	Model Card	Model Card
Depth-Anything-V2-Base	97.5M	Model Card	Model Card
Depth-Anything-V2-Large	335.3M	Model Card	Model Card

Depth Anything V2 overview diagram showing relative and metric depth estimation capabilities.

best for

·Autonomous driving depth perception
·Outdoor scene reconstruction and mapping
·Robotics navigation and obstacle avoidance

FAQ

What is the input and output format for this model?

Input: a single image (e.g., JPEG/PNG). Output: a depth map with metric depth values in meters.
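
As a rough illustration of that input/output contract, the sketch below runs the model locally through the Hugging Face transformers depth-estimation pipeline rather than through the hosted API. The helper names estimate_depth and nearest_distance are hypothetical, and treating non-positive values as invalid is an assumption made for the example, not documented model behavior. The third-party imports are kept inside the function so the rest of the snippet runs without them.

```python
from typing import Sequence

MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf"


def estimate_depth(image_path: str):
    """Return a 2-D torch tensor of per-pixel depth for one image file.

    Hypothetical helper. Needs: pip install transformers torch pillow
    The first call downloads the model weights.
    """
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(task="depth-estimation", model=MODEL_ID)
    result = pipe(Image.open(image_path).convert("RGB"))
    # "predicted_depth" is the raw prediction tensor. "depth" is a PIL image
    # rescaled for viewing, so it should not be read as meters.
    # squeeze() drops a leading batch axis if the installed version keeps one.
    return result["predicted_depth"].squeeze()


def nearest_distance(depth_rows: Sequence[Sequence[float]]) -> float:
    """Smallest positive value in a 2-D depth grid (hypothetical helper).

    Assumption: values <= 0 mark invalid pixels and are skipped.
    """
    return min(v for row in depth_rows for v in row if v > 0)


# Toy 2x2 grid standing in for a real depth map, values in meters.
toy_depth = [[12.5, 8.0], [0.0, 30.2]]
print(nearest_distance(toy_depth))  # 8.0
```

With the dependencies installed, nearest_distance(estimate_depth("street.jpg").tolist()) would give the closest predicted surface for a local file, where street.jpg is a placeholder path.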

How does Depth Anything V2 compare to V1 in terms of quality?

V2 produces much finer and more robust depth predictions, especially in outdoor scenes, due to synthetic training data and a larger teacher model.

What data was this model fine-tuned on?

It was fine-tuned on the synthetic Virtual KITTI 2 dataset for outdoor metric depth estimation.

How do I call this model via the gigarouter API?

The hosted endpoint is not live yet. Once it is, use the OpenAI-compatible endpoint with your API key and send the image as a base64-encoded string or as a URL in the request.
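
Because no request schema has been published for this model, the following is only a sketch of the client-side preparation: encoding image bytes as a base64 data URL and wrapping them in a JSON body. The function names and the "model" and "image" field names are hypothetical placeholders, not the gigarouter API, and no endpoint URL is assumed.

```python
import base64
import json

MODEL_ID = "depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf"


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_request_body(image_ref: str) -> str:
    """Serialize a request body for one image.

    HYPOTHETICAL schema: the "model" and "image" keys are placeholders
    chosen for this sketch. image_ref may be a data URL or an https URL.
    """
    return json.dumps({"model": MODEL_ID, "image": image_ref})


# The 8-byte PNG signature stands in for a real file's contents.
png_signature = b"\x89PNG\r\n\x1a\n"
body = build_request_body(image_to_data_url(png_signature))
print(body)
```

In practice the bytes would come from reading an image file in binary mode, and the body would be sent with an Authorization header carrying the API key once the real schema is known.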

Is this model faster than Stable Diffusion-based depth models?

Yes. Depth Anything V2 is reported to run more than 10× faster than depth models built on Stable Diffusion while also being more accurate.

not yet live

We're benchmarking and onboarding Depth Anything V2 Metric Outdoor Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo