Depth Anything 3 Large
depth-anything/DA3-LARGE
published Nov 2025 · updated Nov 2025
Depth Anything 3 Large is a multi-view depth estimation and camera pose estimation model based on a plain transformer with a unified depth-ray representation.
specs
| Spec | Value |
| --- | --- |
| Task | Depth estimation, camera pose estimation |
| Architecture | Plain transformer with unified depth-ray representation |
| Parameters | 0.35B |
| License | CC BY-NC 4.0 |
about this model
depth-anything/DA3-LARGE is a multi-view depth estimation and camera pose estimation model that uses a unified depth-ray representation to recover spatially consistent geometry from arbitrary visual inputs. Developed by the ByteDance Seed Team, it is a single plain transformer (0.35B parameters) trained exclusively on public academic datasets, and has been accepted as an oral at ICLR 2026.
Capabilities
The model jointly estimates relative depth, camera poses (extrinsics and intrinsics), and confidence maps from two or more images. It supports pose conditioning and can export results in formats such as GLB, PLY, and Gaussian splatting assets. A streaming variant (DA3-Streaming) handles ultra-long video sequences with less than 12 GB of GPU memory.
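To make the depth-plus-intrinsics output more concrete, here is a minimal sketch of how a per-pixel depth map can be back-projected into a point cloud with a pinhole camera model and written out as ASCII PLY. It does not call the DA3 library: the function names (`unproject_depth`, `to_ascii_ply`), the toy 2x2 depth map, and the intrinsics values are illustrative assumptions, and the model's own exporters may organize their output differently. Because the model predicts relative depth, coordinates produced this way should not be read as metric units.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point]:
    """Back-project a depth map into camera-frame 3D points (pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row (image y)
        for u, z in enumerate(row):      # u: pixel column (image x)
            if z <= 0:                   # skip pixels without valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def to_ascii_ply(points: Sequence[Point]) -> str:
    """Serialize points as an ASCII PLY document (vertex positions only)."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Toy 2x2 depth map and intrinsics (illustrative values, not model output).
depth_map = [[1.0, 2.0],
             [0.0, 4.0]]   # 0.0 marks a pixel with no valid depth
cloud = unproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
ply_text = to_ascii_ply(cloud)
```

The points here stay in the camera frame. To merge several views into one cloud, each view's points would also need to be transformed by that view's estimated extrinsics.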
Performance
DA3 outperforms prior state-of-the-art models on its main benchmark tasks:
- Monocular depth estimation: surpasses Depth Anything 2 (DA2).
- Multi-view depth and pose estimation: outperforms VGGT by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy (arXiv: 2511.10647).
It also achieves strong results in generalizable novel view synthesis via a frozen backbone that predicts 3D Gaussian parameters, and when used in SLAM pipelines it reduces drift in large-scale scenes compared to COLMAP.
Limitations
The model is trained on academic datasets only; performance may degrade on domain-specific images, low-quality inputs, or extreme lighting conditions. Results depend on image quality and scene complexity.
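One way to cope with unreliable regions is to use the confidence maps that the model predicts alongside depth. The sketch below discards depth values whose confidence falls below a threshold. The function name `mask_low_confidence`, the threshold of 0.5, and the toy arrays are assumptions for illustration; the actual range and meaning of DA3's confidence values should be checked against the model's documentation.

```python
from typing import List, Optional, Sequence


def mask_low_confidence(
    depth: Sequence[Sequence[float]],
    conf: Sequence[Sequence[float]],
    threshold: float,
) -> List[List[Optional[float]]]:
    """Return a copy of `depth` with low-confidence pixels replaced by None."""
    same_rows = len(depth) == len(conf)
    if not same_rows or any(len(d) != len(c) for d, c in zip(depth, conf)):
        raise ValueError("depth and conf must have the same shape")
    return [
        [d if c >= threshold else None for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(depth, conf)
    ]


# Toy values (not model output): one pixel has low confidence.
depth = [[1.2, 3.4], [2.0, 5.1]]
conf = [[0.9, 0.2], [0.5, 0.7]]
filtered = mask_low_confidence(depth, conf, threshold=0.5)
kept = sum(v is not None for row in filtered for v in row)
```

A suitable threshold is a trade-off: a higher value keeps only the most reliable pixels but leaves more holes in the reconstruction.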
best for
- Multi-view 3D reconstruction from unordered image sets
- Camera pose estimation for visual geometry tasks
- Monocular depth estimation with improved accuracy over Depth Anything 2
FAQ
Depth Anything 3 Large excels at multi-view depth estimation and camera pose estimation, outperforming prior models such as VGGT and Depth Anything 2 on public benchmarks.
Depth Anything 3 Large outperforms Depth Anything 2 on monocular depth estimation and adds multi-view and pose estimation capabilities.
The model is licensed under CC BY-NC 4.0, which allows non-commercial use with attribution.
The model accepts a list of image paths, PIL Images, or NumPy arrays. For the API, standard image formats (JPEG, PNG) are supported.
Once the hosted API is available, use the OpenAI-compatible gigarouter endpoint with your API key, passing the model ID and input images in the request payload.
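The request described above could look roughly like the following sketch, which builds a JSON POST request but does not send it. The request schema is not documented on this page, so everything specific here is an assumption: the placeholder URL `https://api.example.com/v1/depth`, the `images` field name, the base64 encoding, and the helper `build_request` are all hypothetical. Only the model ID and the use of an API key come from the text.

```python
import base64
import json
import urllib.request
from typing import Sequence

MODEL_ID = "depth-anything/DA3-LARGE"
# Placeholder URL: the real endpoint is not given on this page.
ENDPOINT = "https://api.example.com/v1/depth"


def build_request(images: Sequence[bytes], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying base64 images."""
    payload = {
        "model": MODEL_ID,
        # Hypothetical field name and encoding; check the real schema.
        "images": [base64.b64encode(img).decode("ascii") for img in images],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Stand-in byte strings instead of real JPEG/PNG file contents.
req = build_request([b"fake-image-1", b"fake-image-2"], api_key="YOUR_API_KEY")
body = json.loads(req.data)
```

In real use, the image bytes would come from reading JPEG or PNG files in binary mode, and the request would be sent with `urllib.request.urlopen(req)` once the actual endpoint and schema are known.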
We're benchmarking and onboarding Depth Anything 3 Large as a hosted, OpenAI-compatible API.