Depth Anything V2 Large

depth-anything/Depth-Anything-V2-Large

published Jun 2024 · updated Jul 2024

Depth Anything V2 Large is a monocular depth estimation model that produces fine-grained and robust depth maps from a single image, trained on 595K synthetic images and 62M+ real images.

status: coming soon
API providers: 0
downloads / mo: 44.4K
license: cc-by-nc-4.0

specs

Task: Monocular Depth Estimation
Architecture: ViT-Large (Vision Transformer) with DPT head
Parameters: 335M (1.3B is the largest scale in the family, not this variant)

about this model

Depth-Anything-V2-Large is a monocular depth estimation model that produces dense depth maps from a single RGB image. It is trained on 595K synthetic labeled images and over 62 million real unlabeled images, leveraging a teacher-student framework with large-scale pseudo-labeling.

Key Strengths

  • Produces finer-grained details and more robust predictions than the previous version (Depth Anything V1).
  • Outperforms Stable Diffusion-based depth models (e.g., Marigold, GeoWizard) in both accuracy and efficiency, with over 10x faster inference and a lighter architecture.
  • Available in multiple scales from 25M to 1.3B parameters, with this Large variant offering a strong balance of performance and resource use.
  • Can be fine-tuned for metric depth estimation, supporting both relative and absolute depth tasks.
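
The difference between relative and metric depth in the last point can be illustrated with a small sketch. It assumes the common convention that a relative prediction matches true depth only up to an unknown scale and shift, and fits those two numbers by least squares against a few reference measurements. This is not the authors' fine-tuning procedure, and `fit_scale_shift` is a hypothetical helper name used only for illustration.

```python
def fit_scale_shift(pred, target):
    """Least-squares fit of target ≈ scale * pred + shift.

    pred and target are equal-length lists of floats: relative predictions
    and reference values at the same pixels (an assumed setup).
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predictions are constant; scale is undetermined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


# Toy data: the reference values are exactly 2 * pred + 2.
pred = [0.0, 1.0, 2.0, 3.0]
target = [2.0, 4.0, 6.0, 8.0]
scale, shift = fit_scale_shift(pred, target)
aligned = [scale * p + shift for p in pred]
```

A per-image alignment like this needs reference values for every image, which is why a model fine-tuned for metric depth is the practical route when absolute distances are required.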

Performance & Versatility

The model demonstrates strong generalization across diverse scenes. The authors constructed a new evaluation benchmark with precise annotations and varied environments to overcome limitations in existing test sets. Depth-Anything-V2-Large is suitable for applications requiring high-quality depth output without the latency of diffusion-based alternatives.

FAQ

What is Depth Anything V2 Large?

It is the Large variant of Depth Anything V2, a monocular depth estimation model built on a ViT-Large backbone with roughly 335M parameters (the 1.3B figure applies to the largest scale in the family). It produces high-quality depth maps from a single RGB image.

How does Depth Anything V2 compare to Depth Anything V1?

V2 provides finer and more robust depth predictions by replacing labeled real images with synthetic images, scaling up the teacher model, and using large-scale pseudo-labeled real images.

What are the input and output formats?

Input is a single RGB image; output is a raw depth map (an H×W array) where each pixel value represents relative depth rather than a distance in metric units.
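
Because the raw values are relative, a common first step is to rescale them for display. The following is a minimal sketch, assuming the depth map arrives as a nested list of floats; `normalize_depth` is a hypothetical helper, not part of any published API.

```python
def normalize_depth(depth):
    """Rescale a 2D depth map (list of rows of floats) to integers in 0..255."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no relative ordering; return all zeros.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth]


raw = [[1.0, 2.0], [3.0, 6.0]]
gray = normalize_depth(raw)
print(gray)  # [[0, 51], [102, 255]]
```

Min-max scaling preserves the ordering of pixels but discards the original value range, so it suits visualization rather than measurement.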

How fast is it compared to Stable Diffusion-based models?

It is more than 10x faster and more lightweight than SD-based models such as Marigold or GeoWizard.

How can I call this model via the gigarouter API?

The model is not yet live. Once it is, use the OpenAI-compatible endpoint with your API key: send an image URL or a base64-encoded image and receive the depth map in the response.
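
As a rough sketch of the base64 option, the snippet below encodes image bytes as a data URL and wraps them in a JSON body. The request schema has not been published here, so the field names `model` and `image` and the helper names are assumptions for illustration only; just the data URL format itself is standard.

```python
import base64
import json


def to_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_request_body(image_bytes, model="depth-anything/Depth-Anything-V2-Large"):
    # Hypothetical body shape: the real field names may differ once the API is live.
    return json.dumps({"model": model, "image": to_data_url(image_bytes)})


# Placeholder bytes (the PNG signature) stand in for a real image file.
body = build_request_body(b"\x89PNG\r\n\x1a\n")
```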

not yet live

We're benchmarking and onboarding Depth Anything V2 Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.