DA3 Metric Large
depth-anything/DA3METRIC-LARGE
published Nov 2025 · updated Nov 2025
DA3 Metric Large is a monocular metric depth estimation model that provides real-world scale depth maps from single images.
specs
| Task | Monocular Metric Depth Estimation |
| Architecture | Plain transformer with unified depth-ray representation |
| Parameters | 0.35B |
| License | Apache 2.0 |
about this model
DA3METRIC-LARGE is a monocular metric depth estimation model that produces depth maps with real-world scale from single images. Developed by the ByteDance Seed Team, it has 0.35B parameters, uses a plain transformer architecture with a unified depth-ray representation, and is trained exclusively on public academic datasets. It is released under the Apache 2.0 license.
Capabilities
The model outputs metric depth maps, relative depth maps, confidence maps, camera intrinsics and extrinsics, and sky segmentation. It supports inference on individual images or batches and can export results in GLB, PLY, NPZ, and other formats.
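To illustrate how a metric depth map and camera intrinsics combine into exportable 3D geometry, here is a minimal sketch of pinhole back-projection followed by an ASCII PLY dump. It does not use the model's own export code: the function names `backproject_depth` and `to_ascii_ply` are hypothetical, the depth array and intrinsics matrix are synthetic stand-ins for a prediction, and the sketch assumes depth is measured along the camera z-axis in metres.

```python
import numpy as np


def backproject_depth(depth, K):
    """Back-project an (H, W) metric depth map into an (H*W, 3) point cloud.

    depth: float array, assumed to be z-depth in metres (an assumption).
    K:     3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def to_ascii_ply(points):
    """Serialise an (N, 3) array of XYZ points as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    rows = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + rows) + "\n"


# Synthetic stand-ins for a prediction: a flat wall 2 m from the camera.
depth = np.full((4, 6), 2.0)
K = np.array([[100.0, 0.0, 3.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
points = backproject_depth(depth, K)
ply_text = to_ascii_ply(points)
```

Because the depth is metric, the resulting coordinates are in metres and need no further scale alignment. With relative depth, the same projection would only be correct up to an unknown scale.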
Performance
DA3METRIC-LARGE achieves state-of-the-art results on monocular and multi-view depth estimation, significantly outperforming Depth Anything V2 and the prior multi-view model VGGT. According to the project page, it surpasses VGGT on the Visual Geometry Benchmark by 35.7% in camera pose accuracy and 23.6% in geometric accuracy. The accompanying paper was accepted as an oral presentation at ICLR 2026.
Key Strengths
- Single plain transformer backbone without architectural specialization.
- Unified depth‑ray representation eliminates complex multi‑task learning.
- Trained only on publicly available academic datasets.
- The additional DA3-Streaming feature enables inference on ultra-long videos with less than 12 GB of GPU memory via sliding-window streaming.
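The sliding-window idea behind the last point can be sketched as follows. This is not the DA3-Streaming implementation; it only shows, under assumed window and overlap sizes, how a long frame sequence can be split into overlapping chunks so that no more than `window` frames are processed at once. The function name `sliding_windows` and its parameters are hypothetical.

```python
def sliding_windows(num_frames, window, overlap):
    """Yield (start, end) frame ranges that cover [0, num_frames).

    Consecutive windows share `overlap` frames so that per-window
    predictions can later be aligned to each other.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break  # the last window already reaches the final frame
        start += step


# Ten frames, four per window, one shared frame between neighbours.
chunks = list(sliding_windows(num_frames=10, window=4, overlap=1))
# chunks == [(0, 4), (3, 7), (6, 10)]
```

Peak memory then depends on the window size rather than the video length, which is what makes a fixed GPU budget feasible for arbitrarily long inputs.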
Limitations
Performance may vary on domain‑specific images, depending on image quality, lighting, and scene complexity. No latency or throughput figures are publicly available.
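In the absence of published figures, latency and throughput have to be measured on the target hardware. A minimal timing harness might look like the sketch below, where `infer` is a hypothetical stand-in for whatever call runs the model on a batch, and the dummy workload exists only so the example runs without a model or GPU.

```python
import statistics
import time


def benchmark(infer, batch, warmup=2, runs=10):
    """Time `infer(batch)` and report median latency and throughput.

    `infer` is a hypothetical stand-in for the call that runs the model.
    """
    for _ in range(warmup):  # warm-up runs are executed but not timed
        infer(batch)
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(batch)
        latencies.append(time.perf_counter() - t0)
    median = statistics.median(latencies)
    return {
        "median_latency_s": median,
        "images_per_s": len(batch) / median if median > 0 else float("inf"),
    }


# Dummy CPU workload standing in for model inference on four images.
result = benchmark(lambda b: [sum(range(10_000)) for _ in b], batch=[0, 1, 2, 3])
```

When timing GPU inference with PyTorch, kernels run asynchronously, so a synchronisation call such as `torch.cuda.synchronize()` is needed before each clock read for the numbers to be meaningful.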
best for
- Real-world scale depth estimation for robotics
- 3D reconstruction from single images
- Scene understanding for autonomous navigation
FAQ
DA3 Metric Large is best suited to monocular metric depth estimation, delivering depth maps with real-world scale for applications like robotics, 3D reconstruction, and autonomous navigation.
According to the paper, it significantly outperforms Depth Anything V2 on monocular depth estimation and VGGT on multi-view depth and pose estimation.
It is released under the Apache 2.0 license.
It accepts images as file paths, PIL Images, or NumPy arrays, and outputs depth maps, confidence maps, camera poses, and intrinsics.
Once hosted, it can be called through the gigarouter OpenAI-compatible endpoint with your API key, passing the model ID and image data as specified in the API documentation.
We're benchmarking and onboarding DA3 Metric Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.