Depth Anything 3 Giant
depth-anything/DA3-GIANT
published Nov 2025 · updated Nov 2025
Depth Anything 3 Giant is a depth model that performs multi-view depth estimation, camera pose estimation, and 3D Gaussian estimation from arbitrary views.
specs
| Task | Multi-view depth estimation, camera pose estimation, 3D Gaussian estimation |
| Architecture | Plain transformer with unified depth-ray representation |
| Parameters | 1.15B |
| License | CC BY-NC 4.0 (non-commercial only) |
about this model
depth-anything/DA3-GIANT is a multi-view depth estimation model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses, while also enabling camera pose estimation and 3D Gaussian estimation.
Developed by the ByteDance Seed team and accepted as an oral at ICLR 2026, DA3-GIANT is a 1.15B-parameter plain vision transformer that uses a unified depth-ray representation, eliminating the need for complex multi-task learning. The model is trained exclusively on public academic datasets and is released under a CC BY-NC 4.0 license (non-commercial use only).
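As a rough sizing aid, the 1.15B parameter count can be turned into a weights-only storage estimate. The sketch below is plain arithmetic. The precisions it lists (fp32, fp16/bf16) are assumptions for illustration, not a statement of how the checkpoint is actually distributed, and activation memory and framework overhead are excluded.

```python
# Weights-only memory estimate for a 1.15B-parameter model.
# Assumption: every parameter is stored at the same precision; activations,
# buffers, and framework overhead are NOT included.
PARAMS = 1.15e9

# Hypothetical storage precisions, chosen only for illustration.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2}


def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Return weight storage in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gib(PARAMS, nbytes):.2f} GiB")
```

For the weights alone, this works out to about 4.3 GiB in fp32 and about 2.1 GiB in half precision.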
Key capabilities
- Relative depth estimation from single or multiple images
- Camera parameter estimation (extrinsics and intrinsics)
- 3D Gaussian prediction and export (GLB, PLY, NPZ, etc.)
- Pose conditioning for controlled geometry
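To give a sense of what a PLY export contains, here is a minimal sketch that serializes a plain XYZ point cloud to ASCII PLY using only the standard library. It is not the model's own exporter, and it writes bare points rather than 3D Gaussian attributes. The function name `points_to_ascii_ply` is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def points_to_ascii_ply(points: Iterable[Point]) -> str:
    """Serialize XYZ points as an ASCII PLY document (hypothetical helper)."""
    pts = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(pts)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    # One vertex per line, in the property order declared in the header.
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in pts]
    return "\n".join(header + body) + "\n"


ply_text = points_to_ascii_ply([(0.0, 0.0, 1.0), (0.5, -0.25, 2.0)])
print(ply_text)
```

Writing `ply_text` to a file with a `.ply` extension should give something most point-cloud viewers can open. A Gaussian-splat PLY would carry additional per-vertex properties that this sketch does not attempt to model.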
Benchmark performance
According to the project page, DA3-GIANT surpasses the prior state-of-the-art VGGT by an average of 35.7% in camera pose accuracy and 23.6% in geometric accuracy. It also significantly outperforms Depth Anything 2 for monocular depth estimation. Replacing VGGT with DA3 in VGGT-Long (yielding DA3-Long) reduces drift in large-scale environments and even outperforms COLMAP, which requires 48+ hours to complete.
Model series
The DA3 family includes DA3-GIANT, DA3-Large, DA3-Base, and DA3-Small, plus specialized variants: DA3Metric-Large (metric depth), DA3Mono-Large (monocular), DA3Nested-Giant-Large, and DA3-Streaming (for ultra-long video sequences using under 12 GB of GPU memory).
Limitations
The model may show reduced performance on domain-specific images, and results depend on image quality, lighting, and scene complexity. The CC BY-NC 4.0 license restricts use to non-commercial applications.
For detailed benchmarks and the paper, see the project page and arXiv:2511.10647.
best for
- Multi-view depth estimation from unordered image sets
- Camera pose estimation for 3D reconstruction
- 3D Gaussian splatting from images
FAQ
Input: the API accepts a list of image paths, PIL images, or NumPy arrays.
Commercial use is not permitted; the model is licensed under CC BY-NC 4.0 for non-commercial use only.
Compared with Depth Anything 2, it performs significantly better on monocular depth estimation, according to the project page.
To access the model, use the gigarouter OpenAI-compatible endpoint with your API key.
Output includes depth maps, confidence maps, and camera extrinsics and intrinsics as NumPy arrays, and optionally GLB/PLY files.
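Depth maps, intrinsics, and extrinsics can be combined into a 3D point cloud. The following is a minimal NumPy sketch of pinhole back-projection. It assumes the depth is z-depth along the optical axis, the intrinsics are a 3x3 matrix, and the extrinsics are a 3x4 world-to-camera matrix [R | t]. These conventions are assumptions and should be checked against the model's documentation. The function name `backproject` is hypothetical.

```python
import numpy as np


def backproject(depth: np.ndarray, K: np.ndarray, w2c: np.ndarray) -> np.ndarray:
    """Lift an (H, W) depth map to (H*W, 3) world-space points.

    depth : (H, W) z-depth along the camera's optical axis (assumed)
    K     : (3, 3) pinhole intrinsics
    w2c   : (3, 4) world-to-camera extrinsics [R | t] (assumed convention)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids, each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T       # camera-space rays with z = 1
    cam = rays * depth.reshape(-1, 1)     # scale each ray by its depth
    R, t = w2c[:, :3], w2c[:, 3]
    # Invert x_cam = R @ x_world + t, i.e. x_world = R^T @ (x_cam - t),
    # written here for row vectors.
    return (cam - t) @ R


# Toy inputs: 3x3 image, principal point at pixel (1, 1), identity pose.
K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
w2c = np.hstack([np.eye(3), np.zeros((3, 1))])
depth = np.full((3, 3), 2.0)
points = backproject(depth, K, w2c)
print(points.shape)
```

With multiple views, applying the same function per image and concatenating the results gives a merged cloud in the shared world frame, provided all poses use the same convention and scale.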
We're benchmarking and onboarding Depth Anything 3 Giant as a hosted, OpenAI-compatible API.