Depth Anything 3 Nested Giant Large

depth-anything/DA3NESTED-GIANT-LARGE

published Nov 2025 · updated Nov 2025

Depth Anything 3 Nested Giant Large is a depth model that combines the any-view Giant model with the metric Large model for metric-scale visual geometry reconstruction.

est. price

~$0.626 / 1k images · estimated, set at launch

API providers

downloads / mo

52K

license

cc-by-nc-4.0

specs

Task	Metric Depth Estimation, Pose Estimation, 3D Reconstruction
Architecture	Plain transformer with unified depth-ray representation
Parameters	1.40B
License	CC BY-NC 4.0 (non-commercial only)

about this model

DA3NESTED-GIANT-LARGE is a visual geometry model that recovers spatially consistent metric-scale depth, camera pose, and 3D structure from arbitrary image inputs, with or without known camera poses. It combines the any-view Giant model with the metric Large model in a nested architecture, totaling 1.40B parameters.

Developed by the ByteDance Seed Team, the model uses a single plain transformer (vanilla DINO encoder) with a unified depth-ray representation, eliminating the need for complex multi-task learning. It is trained exclusively on public academic datasets.

Capabilities

Relative and metric depth estimation
Camera pose estimation and pose conditioning
3D Gaussian reconstruction and sky segmentation
Export formats: GLB, NPZ, PLY, mini NPZ, GS PLY, GS video
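As an illustration of the NPZ export idea, the sketch below bundles a depth map, confidence map, and camera matrices into one compressed NPZ archive and reads them back with NumPy. The key names (`depth`, `conf`, `extrinsics`, `intrinsics`) and array shapes are assumptions made for this example; the files the model's own exporter writes may be laid out differently.

```python
import io

import numpy as np


def pack_prediction(depth, conf, extrinsics, intrinsics):
    """Serialize one prediction to compressed NPZ bytes.

    The key names are hypothetical, chosen for this example only.
    """
    buf = io.BytesIO()
    np.savez_compressed(
        buf,
        depth=np.asarray(depth, dtype=np.float32),
        conf=np.asarray(conf, dtype=np.float32),
        extrinsics=np.asarray(extrinsics, dtype=np.float32),
        intrinsics=np.asarray(intrinsics, dtype=np.float32),
    )
    return buf.getvalue()


def unpack_prediction(blob):
    """Load every array stored in the NPZ bytes into a plain dict."""
    with np.load(io.BytesIO(blob)) as data:
        return {key: data[key] for key in data.files}


if __name__ == "__main__":
    blob = pack_prediction(
        depth=np.full((4, 6), 2.5),
        conf=np.ones((4, 6)),
        extrinsics=np.eye(4),
        intrinsics=np.eye(3),
    )
    arrays = unpack_prediction(blob)
    print(sorted(arrays), arrays["depth"].dtype)
```

Writing to an in-memory buffer keeps the example free of file paths; passing a filename to `np.savez_compressed` works the same way.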

Performance

Depth Anything 3 outperforms prior state-of-the-art models. Against VGGT, it improves camera pose accuracy by an average of 44.3% and geometric accuracy by an average of 25.1%. It also surpasses Depth Anything 2 in monocular depth estimation.

Architecture and Training

The model uses a plain transformer backbone (vanilla DINO encoder) with a depth-ray representation, avoiding architectural specialization or complex multi-task learning. A teacher-student training paradigm is employed to achieve detail and generalization on par with Depth Anything 2.
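To make the depth-ray idea concrete, here is a minimal sketch of how a per-pixel depth value and a per-pixel ray can be combined into a 3D point, assuming each ray is given as an origin and a direction. The function name, argument names, and array shapes are assumptions for illustration and are not the model's actual output format. Whether the directions are unit length (depth measured along the ray) or scaled so that depth is z-depth is a convention this sketch does not settle.

```python
import numpy as np


def points_from_depth_and_rays(depth, ray_origins, ray_dirs):
    """Combine per-pixel depth with per-pixel rays into 3D points.

    depth:       (H, W) array of depth values
    ray_origins: (H, W, 3) array of ray origins
    ray_dirs:    (H, W, 3) array of ray directions
    Returns an (H, W, 3) array: origin + depth * direction for every pixel.
    """
    depth = np.asarray(depth, dtype=np.float64)
    ray_origins = np.asarray(ray_origins, dtype=np.float64)
    ray_dirs = np.asarray(ray_dirs, dtype=np.float64)
    return ray_origins + depth[..., None] * ray_dirs


if __name__ == "__main__":
    h, w = 2, 2
    depth = np.full((h, w), 2.0)
    origins = np.zeros((h, w, 3))
    dirs = np.zeros((h, w, 3))
    dirs[..., 2] = 1.0  # every ray points straight down the +z axis
    print(points_from_depth_and_rays(depth, origins, dirs)[0, 0])
```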

Additional Features

DA3-Streaming: handles ultra-long video sequence inference with less than 12GB GPU memory via sliding-window streaming
Reference view selection for multi-view inputs via ref_view_strategy
Evaluation benchmark pipeline with 5 datasets (ETH3D, ScanNet++, DTU, 7Scenes, HiRoom)
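The sliding-window idea behind streaming inference can be sketched as splitting a long frame sequence into overlapping windows, so that only one window needs to be held in GPU memory at a time. This is a generic illustration and not the DA3-Streaming implementation; the function name and the `window` and `overlap` parameters are hypothetical.

```python
def sliding_windows(num_frames, window, overlap):
    """Return (start, end) index pairs covering range(num_frames).

    Each window holds at most `window` frames and shares `overlap`
    frames with the previous one. `end` is exclusive.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break  # the last window already reaches the final frame
        start += step
    return windows


if __name__ == "__main__":
    for start, end in sliding_windows(num_frames=10, window=4, overlap=1):
        print(f"process frames {start}..{end - 1}")
```

The shared frames between consecutive windows are what would let per-window results be aligned with each other; how that alignment is done is outside the scope of this sketch.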

Licensed under CC BY-NC 4.0 (non-commercial use only).

best for

· Metric-scale depth estimation from single or multi-view images
· Camera pose estimation for multi-view geometry
· 3D scene reconstruction including Gaussian splatting

FAQ

What is the main use case for Depth Anything 3 Nested Giant Large?

It is designed for metric-scale visual geometry reconstruction, including depth estimation, camera pose estimation, and 3D reconstruction from arbitrary views.

How does it compare to Depth Anything 2?

It surpasses Depth Anything 2 in monocular depth estimation while matching its detail and generalization. Against VGGT, it improves camera pose accuracy by an average of 44.3% and geometric accuracy by an average of 25.1%.

What is the license and can I use it commercially?

Licensed under CC BY-NC 4.0, which permits non-commercial use only. Commercial use is not allowed.

What input and output formats does the model support?

Input: image paths, PIL images, or numpy arrays. Output: depth maps, confidence maps, camera extrinsics (w2c), and intrinsics (all float32). Optional export formats include glb, npz, ply, mini_npz, gs_ply, gs_video.
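As a hedged sketch of how these outputs fit together, the function below back-projects a depth map into world-space points using an intrinsics matrix and a world-to-camera (w2c) extrinsics matrix. It assumes a pinhole camera, z-depth values, integer pixel coordinates at pixel centers, and shapes of (H, W) for depth, (3, 3) for intrinsics, and (4, 4) or (3, 4) for extrinsics. None of these conventions are confirmed by this page, so check them against the model's actual outputs.

```python
import numpy as np


def backproject_to_world(depth, intrinsics, w2c):
    """Lift an (H, W) z-depth map to (H, W, 3) world-space points.

    intrinsics: (3, 3) pinhole camera matrix K
    w2c:        (4, 4) or (3, 4) world-to-camera transform [R | t]
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape

    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Camera-space rays with z = 1, scaled by depth to get camera-space points.
    k_inv = np.linalg.inv(np.asarray(intrinsics, dtype=np.float64))
    cam_points = (pixels @ k_inv.T) * depth[..., None]

    # Invert world-to-camera to obtain camera-to-world, then transform.
    w2c_h = np.eye(4)
    w2c_h[:3, :4] = np.asarray(w2c, dtype=np.float64)[:3, :4]
    c2w = np.linalg.inv(w2c_h)
    return cam_points @ c2w[:3, :3].T + c2w[:3, 3]


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 1.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
    points = backproject_to_world(np.full((3, 3), 2.0), K, np.eye(4))
    print(points[1, 1])  # point seen through the principal point
```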

How can I call this model via the gigarouter API?

The hosted API is not yet live. Once it is available, use the OpenAI-compatible endpoint with your gigarouter API key: send image data in the request and receive depth, pose, and 3D outputs in the response.

not yet live

We're benchmarking and onboarding Depth Anything 3 Nested Giant Large as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo