Depth Anything 3 Nested Giant-Large

depth-anything/DA3NESTED-GIANT-LARGE-1.1

published Dec 2025 · updated Dec 2025

Depth Anything 3 Nested Giant-Large is a depth model that combines the any-view Giant model with the metric Large model for metric-scale visual geometry reconstruction.

est. price

~$0.626 / 1k images · estimated, set at launch

downloads / mo

199.9K

license

cc-by-nc-4.0

specs

Task	Depth Estimation, Pose Estimation, 3D Reconstruction
Architecture	Plain transformer with unified depth-ray representation
Parameters	1.40B
License	CC BY-NC 4.0 (non-commercial only)
Training Data	Public academic datasets

about this model

DA3NESTED-GIANT-LARGE is a depth estimation model that recovers spatially consistent metric-scale geometry from arbitrary visual inputs, combining the any-view Giant model with the metric Large model in a nested architecture.

With 1.40B parameters, it supports relative depth, metric depth, pose estimation, pose conditioning, 3D Gaussian estimation, and sky segmentation. The model uses a single plain transformer with a unified depth-ray representation, eliminating the need for complex multi-task learning.
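To make the depth-ray idea concrete, here is a minimal sketch of how a per-pixel depth map and a per-pixel ray map can be combined into 3D points. It assumes each pixel carries a ray origin and a ray direction and that the point lies at origin + depth * direction. The array layout and the function name `points_from_depth_and_rays` are hypothetical and are not taken from the model's API.

```python
import numpy as np


def points_from_depth_and_rays(depth, ray_origins, ray_dirs):
    """Combine per-pixel depth with per-pixel rays into 3D points.

    depth:       (H, W) distance along each ray
    ray_origins: (H, W, 3) ray start point per pixel (assumed layout)
    ray_dirs:    (H, W, 3) ray direction per pixel (assumed layout)
    returns:     (H, W, 3) array with point = origin + depth * direction
    """
    depth = np.asarray(depth, dtype=np.float64)
    ray_origins = np.asarray(ray_origins, dtype=np.float64)
    ray_dirs = np.asarray(ray_dirs, dtype=np.float64)
    expected = depth.shape + (3,)
    if ray_origins.shape != expected or ray_dirs.shape != expected:
        raise ValueError(f"ray arrays must have shape {expected}")
    # Broadcast depth over the last axis so it scales each direction vector.
    return ray_origins + depth[..., None] * ray_dirs


# Toy 1x2 image: one ray looks along +z, the other along +x.
depth = np.array([[2.0, 4.0]])
origins = np.zeros((1, 2, 3))
dirs = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
points = points_from_depth_and_rays(depth, origins, dirs)
```

Because every pixel has its own ray, the same formula covers single images and multiple views without a separate per-camera code path, which is one way to read the "unified" part of the description.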

Depth Anything 3 outperforms prior state-of-the-art models on the project page benchmarks: it surpasses VGGT by an average of 35.7% in camera pose accuracy and 23.6% in geometric accuracy, and it outperforms Depth Anything 2 on monocular depth estimation. The accompanying paper has been accepted at ICLR 2026 as an Oral presentation.

Key capabilities demonstrated in the research include feed-forward 3D Gaussian estimation for novel view synthesis and DA3-Long, which reduces drift in large-scale SLAM, matching or exceeding COLMAP (which requires over 48 hours). The DA3-Streaming variant handles ultra-long video sequences with less than 12 GB of GPU memory via sliding-window inference.
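The sliding-window idea can be sketched as follows. This is a generic illustration of splitting a long frame sequence into overlapping windows so that only one window is processed at a time. It is not DA3-Streaming's actual implementation, and the function name, window size, and overlap are hypothetical values chosen for the example.

```python
def sliding_windows(num_frames, window_size, overlap):
    """Return (start, end) index pairs covering range(num_frames).

    `end` is exclusive. Consecutive windows share `overlap` frames, which a
    pipeline could use to align one window's output with the next.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    stride = window_size - overlap
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break  # the last window already reaches the final frame
        start += stride
    return windows


# Ten frames, four per window, one shared frame between neighbours.
windows = sliding_windows(num_frames=10, window_size=4, overlap=1)
print(windows)  # [(0, 4), (3, 7), (6, 10)]
```

With this scheme, peak memory depends on the window size rather than on the length of the video, which is the general reason sliding-window inference suits very long sequences.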

This model is developed by the ByteDance Seed Team and is licensed under CC BY-NC 4.0 (non-commercial use only).

best for

· Monocular depth estimation from single images
· Multi-view 3D reconstruction from unordered photos
· Camera pose estimation from video frames
· Feed-forward 3D Gaussian splatting for novel view synthesis

FAQ

What is this model best for?

It excels at metric-scale visual geometry reconstruction from arbitrary views, including depth, pose, and 3D Gaussians.

How does it compare to Depth Anything 2 and VGGT?

It significantly outperforms Depth Anything 2 on monocular depth and VGGT on multi-view depth and pose estimation.

What are the license terms?

CC BY-NC 4.0 – non-commercial use only.

What input formats does it accept?

Image paths, PIL Images, or numpy arrays. Output includes depth maps, confidence maps, camera extrinsics/intrinsics, and 3D exports (e.g., GLB, PLY).
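As a rough illustration of how these outputs fit together, the sketch below back-projects a depth map into camera-space points with a standard pinhole model and serializes them as ASCII PLY. It assumes the depth map stores z-depth as an (H, W) array and the intrinsics are a 3x3 matrix of the form [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]. The function names are hypothetical and this is not the model's own export code. Moving the points into world space would additionally require the extrinsics, whose convention is not stated here.

```python
import numpy as np


def unproject_depth(depth, intrinsics):
    """Back-project an (H, W) z-depth map to (H*W, 3) camera-space points."""
    depth = np.asarray(depth, dtype=np.float64)
    K = np.asarray(intrinsics, dtype=np.float64)
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # u is the column index, v is the row index of each pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def to_ascii_ply(points):
    """Serialize an (N, 3) point array as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    rows = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + rows) + "\n"


# Toy 2x2 depth map at a constant depth of 2 with made-up intrinsics.
depth = np.full((2, 2), 2.0)
K = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
points = unproject_depth(depth, K)
ply_text = to_ascii_ply(points)
```

The resulting string can be written to a `.ply` file and opened in a point-cloud viewer. A confidence map, where available, could be used to drop low-confidence pixels before export.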

How do I call this model via the gigarouter API?

This model is not yet live on gigarouter. Once it is hosted, use the gigarouter OpenAI-compatible endpoint with your API key, and refer to the gigarouter documentation for the exact endpoint and request format.

not yet live

We're benchmarking and onboarding Depth Anything 3 Nested Giant-Large as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf	1.7M dl/mo
DA3METRIC-LARGE	825K dl/mo
depth-anything-large-hf	388.9K dl/mo
dpt-hybrid-midas	225.1K dl/mo
Depth-Anything-V2-Large-hf	199.1K dl/mo
Distill-Any-Depth-Large-hf	189.6K dl/mo