DA3 Metric Large

depth-anything/DA3METRIC-LARGE

published Nov 2025 · updated Nov 2025

DA3 Metric Large is a monocular metric depth estimation model that provides real-world scale depth maps from single images.

status	coming soon
API providers
downloads / mo	825K
license	apache-2.0

specs

Task	Monocular Metric Depth Estimation
Architecture	Plain transformer with unified depth-ray representation
Parameters	0.35B
License	Apache 2.0

about this model

DA3METRIC-LARGE is a monocular metric depth estimation model that produces depth maps with real-world scale from single images. Developed by the ByteDance Seed Team, it uses a plain transformer architecture with a unified depth-ray representation and is trained exclusively on public academic datasets. It has 0.35B parameters and is released under the Apache 2.0 license.

Capabilities

The model outputs metric depth maps, relative depth maps, confidence maps, and camera intrinsics/extrinsics. It also provides sky segmentation. It supports inference on individual images or batches and can export results in GLB, PLY, NPZ, and other formats.
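To illustrate what metric depth plus intrinsics make possible, here is a minimal sketch that lifts a depth map to a 3D point cloud with the standard pinhole camera model, skips low-confidence pixels, and serializes the result as ASCII PLY. It does not call the model: the function names, the toy 2x2 inputs, and the 0.5 confidence threshold are all illustrative assumptions, and the real confidence scale may differ.

```python
def backproject(depth, conf, fx, fy, cx, cy, min_conf=0.5):
    """Lift a metric depth map to 3D camera-frame points (pinhole model).

    depth, conf: row-major lists of lists with the same shape (depth in metres).
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    Pixels with confidence below min_conf or non-positive depth are skipped.
    """
    points = []
    for v, (depth_row, conf_row) in enumerate(zip(depth, conf)):
        for u, (z, c) in enumerate(zip(depth_row, conf_row)):
            if c < min_conf or z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def to_ascii_ply(points):
    """Serialize (x, y, z) tuples as an ASCII PLY point cloud string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.4f} {y:.4f} {z:.4f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Toy 2x2 example with made-up values (not model output).
depth = [[2.0, 2.0], [4.0, 0.0]]
conf = [[0.9, 0.1], [0.8, 0.9]]
points = backproject(depth, conf, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
ply_text = to_ascii_ply(points)
```

Because the depth values are metric, the resulting coordinates are in metres in the camera frame, which is what makes the output directly usable for measurement or robot planning without a separate scale-recovery step.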

Performance

DA3METRIC-LARGE reports state-of-the-art results on monocular and multi-view depth estimation, significantly outperforming Depth Anything V2 and the prior multi-view model VGGT. On the Visual Geometry Benchmark, it surpasses VGGT by 35.7% in camera pose accuracy and 23.6% in geometric accuracy, according to the project page. The paper was accepted as an Oral at ICLR 2026.

Key Strengths

·Single plain transformer backbone without architectural specialization.
·Unified depth-ray representation eliminates complex multi-task learning.
·Trained only on publicly available academic datasets.
·An additional DA3-Streaming feature enables ultra-long video inference in under 12 GB of GPU memory via sliding-window streaming.
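The sliding-window idea behind the streaming feature can be sketched in general terms: a long video is processed in overlapping chunks so that only one chunk is resident in memory at a time, and the shared frames allow consecutive chunks to be aligned. The sketch below only computes the chunk boundaries. It is not the DA3-Streaming implementation, and the function name, window size, and overlap are hypothetical.

```python
def sliding_windows(num_frames, window, overlap):
    """Yield (start, end) frame-index ranges covering [0, num_frames).

    Consecutive windows share `overlap` frames so per-window results can be
    aligned to each other; `end` is exclusive.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += step


# Illustrative values only: 10 frames, 4 frames per window, 1 shared frame.
chunks = list(sliding_windows(num_frames=10, window=4, overlap=1))
```

Peak memory in a scheme like this depends on the window size rather than the video length, which is why very long sequences can fit within a fixed GPU budget.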

Limitations

Performance may vary on domain‑specific images, depending on image quality, lighting, and scene complexity. No latency or throughput figures are publicly available.

best for

·Real-world scale depth estimation for robotics
·3D reconstruction from single images
·Scene understanding for autonomous navigation

FAQ

What is this model best for?

It is best for monocular metric depth estimation, delivering depth maps with real-world scale for applications like robotics, 3D reconstruction, and autonomous navigation.

How does DA3 Metric Large compare to previous depth models?

It significantly outperforms Depth Anything V2 for monocular depth estimation and VGGT for multi-view depth and pose estimation, as reported in the paper.

What is the license?

Apache 2.0.

What input format does the model expect?

It accepts images as file paths, PIL Images, or numpy arrays, and outputs depth maps, confidence maps, camera poses, and intrinsics.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, providing the model ID and image data as specified in the API documentation.
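Since the hosted endpoint is not yet live, its request schema is not known. As a rough sketch of the preparation step only, the snippet below inlines image bytes as a base64 data URL, a common way to pass images to JSON APIs, and pairs it with the model ID. The field names `model` and `image` and the helper name are hypothetical placeholders, so check the API documentation for the actual shape once it is published.

```python
import base64
import json


def build_depth_request(image_bytes, model_id, mime_type="image/png"):
    """Build a JSON request body with the image inlined as a base64 data URL.

    The field names "model" and "image" are hypothetical placeholders,
    not a documented schema.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({
        "model": model_id,
        "image": f"data:{mime_type};base64,{encoded}",
    })


# Stand-in bytes for illustration; a real call would read an image file.
body = build_depth_request(b"\x89PNG", "depth-anything/DA3METRIC-LARGE")
```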

not yet live

We're benchmarking and onboarding DA3 Metric Large as a hosted, OpenAI-compatible API.

related depth estimation models


Depth-Anything-V2-Small-hf	1.7M dl/mo
depth-anything-large-hf	388.9K dl/mo
dpt-hybrid-midas	225.1K dl/mo
DA3NESTED-GIANT-LARGE-1.1	199.9K dl/mo
Depth-Anything-V2-Large-hf	199.1K dl/mo
Distill-Any-Depth-Large-hf	189.6K dl/mo