Depth Anything 3 Nested Giant-Large
depth-anything/DA3NESTED-GIANT-LARGE-1.1
published Dec 2025 · updated Dec 2025
Depth Anything 3 Nested Giant-Large is a depth model that combines the any-view Giant model with the metric Large model for metric-scale visual geometry reconstruction.
specs
| Task | Depth Estimation, Pose Estimation, 3D Reconstruction |
| Architecture | Plain transformer with unified depth-ray representation |
| Parameters | 1.40B |
| License | CC BY-NC 4.0 (non-commercial only) |
| Training Data | Public academic datasets |
about this model
DA3NESTED-GIANT-LARGE is a depth estimation model that recovers spatially consistent metric-scale geometry from arbitrary visual inputs, combining the any-view Giant model with the metric Large model in a nested architecture.
With 1.40B parameters, it supports relative depth, metric depth, pose estimation, pose conditioning, 3D Gaussian estimation, and sky segmentation. The model uses a single plain transformer with a unified depth-ray representation, eliminating the need for complex multi-task learning.
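To make the depth-ray idea concrete, here is a minimal sketch of how a per-pixel depth and ray could be combined into a 3D point. It assumes each pixel carries a ray origin, a ray direction, and a depth measured along that ray, so the point is `origin + depth * direction`. The function name `backproject` and this exact parameterization are illustrative assumptions, not the model's actual API.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(origin: Sequence[float], direction: Sequence[float], depth: float) -> Vec3:
    """Return the 3D point origin + depth * direction (hypothetical helper)."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # Move `depth` units along the ray, starting from its origin.
    return tuple(o + depth * d for o, d in zip(origin, direction))


# A pixel whose ray starts at the camera center and looks straight down +z.
point = backproject((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 2.5)
```

Under this assumption, one prediction per pixel is enough to place that pixel in space, and camera pose can in principle be recovered from the rays rather than being predicted by a separate head.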
According to the benchmarks on the project page, Depth Anything 3 surpasses VGGT by an average of 35.7% in camera pose accuracy and 23.6% in geometric accuracy, and it outperforms Depth Anything 2 on monocular depth estimation. The work has been accepted at ICLR 2026 as an Oral presentation.
Key capabilities demonstrated in the research include feed-forward 3D Gaussian estimation for novel view synthesis and DA3-Long, which reduces drift in large-scale SLAM and matches or exceeds COLMAP (which requires over 48 hours). The DA3-Streaming variant handles ultra-long video sequences with less than 12 GB of GPU memory via sliding-window inference.
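Sliding-window inference can be pictured as splitting a long frame sequence into overlapping chunks, so only one chunk needs to sit in GPU memory at a time. The sketch below is a generic illustration of that chunking. The `window` and `overlap` parameters are hypothetical, and the source does not describe how DA3-Streaming actually chooses or aligns its windows.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], window: int, overlap: int) -> Iterator[List[T]]:
    """Yield overlapping chunks of `frames` (illustrative, not the DA3 API)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < len(frames):
        yield list(frames[start:start + window])
        # Stop once the current window already reaches the last frame.
        if start + window >= len(frames):
            break
        start += step


# Ten frames, four per window, one shared frame between neighbours.
chunks = list(sliding_windows(list(range(10)), window=4, overlap=1))
```

The shared frames between neighbouring chunks give a later step something to align on. How the per-chunk predictions are stitched into one consistent reconstruction is outside this sketch.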
This model is developed by the ByteDance Seed Team and is licensed under CC BY-NC 4.0 (non-commercial use only).
best for
- Monocular depth estimation from single images
- Multi-view 3D reconstruction from unordered photos
- Camera pose estimation from video frames
- Feed-forward 3D Gaussian splatting for novel view synthesis
FAQ
It excels at metric-scale visual geometry reconstruction from any views, including depth, pose, and 3D Gaussians.
It significantly outperforms Depth Anything 2 on monocular depth and VGGT on multi-view depth and pose estimation.
The model is licensed under CC BY-NC 4.0 (non-commercial use only).
Inputs can be image paths, PIL Images, or NumPy arrays. Outputs include depth maps, confidence maps, camera extrinsics and intrinsics, and 3D exports (e.g., GLB, PLY).
Use the gigarouter OpenAI-compatible endpoint with your API key. Refer to the gigarouter documentation for the exact endpoint and request format.
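For readers unfamiliar with the camera extrinsics and intrinsics mentioned among the outputs, the following sketch shows how they relate a 3D point to a pixel under the standard pinhole model. It assumes world-to-camera extrinsics (`x_cam = R @ x_world + t`) and intrinsics given as focal lengths `fx`, `fy` and principal point `cx`, `cy`. The convention the model actually uses is not stated in the source, and `project_point` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def project_point(
    point_world: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a world-space point to pixel coordinates (pinhole model)."""
    # Extrinsics: rotate and translate the point into the camera frame.
    x, y, z = (
        sum(r * p for r, p in zip(row, point_world)) + t
        for row, t in zip(rotation, translation)
    )
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective divide, then scale and shift into pixels.
    return (fx * x / z + cx, fy * y / z + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((1.0, 2.0, 4.0), identity, (0.0, 0.0, 0.0), 100.0, 100.0, 50.0, 40.0)
```

If the extrinsics are instead stored as camera-to-world, they would need to be inverted before being used this way.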
We're benchmarking and onboarding Depth Anything 3 Nested Giant-Large as a hosted, OpenAI-compatible API.