DA3 Giant
depth-anything/DA3-GIANT-1.1
published Dec 2025 · updated Dec 2025
DA3 Giant is a vision transformer model for multi-view depth estimation, camera pose estimation, and 3D Gaussian estimation.
specs
| Task | Multi-view depth estimation, camera pose estimation, 3D Gaussian estimation |
| Architecture | Plain transformer with unified depth-ray representation |
| Parameters | 1.15B |
| License | CC BY-NC 4.0 (non-commercial only) |
about this model
depth-anything/DA3-GIANT-1.1 is a unified multi-view depth, camera pose, and 3D Gaussian estimation model developed by the ByteDance Seed Team. It uses a single plain transformer (vanilla DINO encoder) with a depth-ray representation, eliminating the need for specialized architectural branches or multi-task learning. The model has 1.15 billion parameters.
Capabilities and Performance
The model accepts any number of input images (with or without known camera poses) and outputs relative depth maps, confidence maps, camera extrinsics and intrinsics, and 3D Gaussians. It is licensed under CC BY-NC 4.0 (non-commercial only).
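As an illustration of how the confidence maps might be used alongside the depth maps, the sketch below masks out low-confidence depth pixels with NumPy. The function name `mask_low_confidence`, the toy arrays, and the threshold of 0.5 are all hypothetical; the value range of the model's confidence output is not stated here, so a suitable threshold would need to be chosen empirically.

```python
import numpy as np

def mask_low_confidence(depth, conf, threshold):
    """Return a copy of `depth` with pixels whose confidence is below
    `threshold` replaced by NaN. Both inputs are (H, W) arrays."""
    depth = np.asarray(depth, dtype=np.float32)
    conf = np.asarray(conf, dtype=np.float32)
    if depth.shape != conf.shape:
        raise ValueError("depth and confidence maps must have the same shape")
    masked = depth.copy()
    masked[conf < threshold] = np.nan
    return masked

# Toy 2x2 example; real maps would come from the model's output.
depth = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
conf = np.array([[0.9, 0.2], [0.6, 0.1]], dtype=np.float32)
masked = mask_low_confidence(depth, conf, threshold=0.5)  # hypothetical threshold
```

Using NaN rather than zero keeps masked pixels distinguishable from valid depth values in later steps such as point-cloud export.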
Depth Anything 3 significantly outperforms Depth Anything 2 for monocular depth estimation and VGGT for multi-view depth and pose estimation. According to the project page, it surpasses VGGT by an average of 35.7% in camera pose accuracy and 23.6% in geometric accuracy. The work has been accepted as an ICLR 2026 Oral.
Extended Use
When deployed in SLAM pipelines (DA3-Long), the model reduces drift in large-scale environments and outperforms COLMAP, which requires more than 48 hours. A streaming variant (DA3-Streaming) handles ultra-long video sequences with less than 12 GB of GPU memory by using sliding-window inference.
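The general idea behind sliding-window inference can be sketched as splitting a long frame sequence into overlapping chunks, so that only one chunk needs to be in memory at a time. This is a generic illustration, not DA3-Streaming's actual implementation: the function `sliding_windows`, the window size, and the overlap are hypothetical, and the step that aligns consecutive chunks into one consistent trajectory is not shown.

```python
def sliding_windows(num_frames, window, overlap):
    """Yield (start, end) frame index ranges, with `end` exclusive.
    Consecutive windows share `overlap` frames."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield (start, end)
        if end == num_frames:
            break  # the last window reached the end of the sequence
        start += stride

# Hypothetical settings: 10 frames, 4 frames per window, 1 shared frame.
windows = list(sliding_windows(num_frames=10, window=4, overlap=1))
```

The shared frames give adjacent chunks common observations, which is what a downstream alignment step would typically rely on.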
Training data consists exclusively of public academic datasets. Performance may vary with image quality, lighting, and scene complexity.
best for
- Multi-view depth estimation from arbitrary image collections
- Camera pose estimation for 3D reconstruction and SLAM
- 3D Gaussian estimation for novel view synthesis
FAQ
DA3 Giant supports multi-view depth estimation, camera pose estimation, and 3D Gaussian estimation.
The model has 1.15 billion parameters.
It is released under the CC BY-NC 4.0 license, for non-commercial use only.
Send requests to the gigarouter OpenAI-compatible endpoint using your API key. The model accepts a list of images and returns depth maps, confidence maps, camera poses, and intrinsics.
Input is a list of images (file paths, PIL Images, or numpy arrays). Output includes depth maps (float32), confidence maps (float32), camera extrinsics (3x4 float32), and intrinsics (3x3 float32). Export formats include glb, npz, ply, and more.
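To show how the listed outputs fit together, here is a minimal sketch that lifts a depth map into 3D points using a 3x3 intrinsics matrix and a 3x4 extrinsics matrix. Several things are assumptions rather than documented behavior: that the extrinsics `[R | t]` map world to camera coordinates (`x_cam = R @ x_world + t`), that depth is measured along the camera's z-axis, and that pixel coordinates index pixel positions starting at 0. The function `backproject` and the toy matrices are hypothetical.

```python
import numpy as np

def backproject(depth, K, extrinsics):
    """Lift an (H, W) depth map to an (H*W, 3) array of world-space points.
    Assumes extrinsics = [R | t] with x_cam = R @ x_world + t."""
    depth = np.asarray(depth, dtype=np.float32)
    h, w = depth.shape
    # Pixel grid: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3)
    rays = pixels @ np.linalg.inv(K).T                    # camera rays with z = 1
    cam_points = rays * depth[..., None]                  # scale rays by depth
    R, t = extrinsics[:, :3], extrinsics[:, 3]
    # Invert the rigid transform: x_world = R^T (x_cam - t).
    world_points = (cam_points - t) @ R
    return world_points.reshape(-1, 3)

# Toy inputs with the shapes listed above.
K = np.array([[2, 0, 1], [0, 2, 1], [0, 0, 1]], dtype=np.float32)
extrinsics = np.hstack([np.eye(3), [[0], [0], [1]]]).astype(np.float32)
depth = np.full((2, 2), 2.0, dtype=np.float32)
points = backproject(depth, K, extrinsics)
```

If the extrinsics follow the opposite (camera-to-world) convention, the last step becomes `cam_points @ R.T + t` instead.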
DA3 Giant is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.