
Depth Anything 3 Nested Giant Large

depth-anything/DA3NESTED-GIANT-LARGE

published Nov 2025 · updated Nov 2025

Depth Anything 3 Nested Giant Large is a depth model that combines any-view Giant and metric Large models for metric-scale visual geometry reconstruction.

est. price: ~$0.626 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 52K
license: cc-by-nc-4.0
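
As a rough budgeting aid, the estimated rate listed above can be turned into a per-job figure. The sketch below assumes simple linear pricing at ~$0.626 per 1,000 images; the real price is set at launch and may differ, and the function name is only illustrative.

```python
# Estimated rate from the listing; the actual price is set at launch.
EST_PRICE_PER_1K_IMAGES_USD = 0.626


def estimate_cost_usd(num_images: int,
                      price_per_1k: float = EST_PRICE_PER_1K_IMAGES_USD) -> float:
    """Return an estimated cost in USD, assuming linear per-image pricing."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * price_per_1k / 1000.0


if __name__ == "__main__":
    for n in (100, 1_000, 25_000):
        print(f"{n:>6} images -> ~${estimate_cost_usd(n):.2f}")
```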

specs

Task: Metric Depth Estimation, Pose Estimation, 3D Reconstruction
Architecture: Plain transformer with unified depth-ray representation
Parameters: 1.40B
License: CC BY-NC 4.0 (non-commercial only)

about this model

DA3NESTED-GIANT-LARGE is a visual geometry model that recovers spatially consistent metric-scale depth, camera pose, and 3D structure from arbitrary image inputs, with or without known camera poses. It combines the any-view Giant model with the metric Large model in a nested architecture, totaling 1.40B parameters.

Developed by the ByteDance Seed Team, the model uses a single plain transformer (vanilla DINO encoder) with a unified depth-ray representation, eliminating the need for complex multi-task learning. It is trained exclusively on public academic datasets.

Capabilities

  • Relative and metric depth estimation
  • Camera pose estimation and pose conditioning
  • 3D Gaussian reconstruction and sky segmentation
  • Export formats: GLB, NPZ, PLY, mini NPZ, GS PLY, GS video

Performance

Depth Anything 3 outperforms the prior state of the art. Against VGGT, it improves camera pose accuracy by an average of 44.3% and geometric accuracy by an average of 25.1%. It also surpasses Depth Anything 2 in monocular depth estimation.

Architecture and Training

The model uses a plain transformer backbone (vanilla DINO encoder) with a depth-ray representation, avoiding architectural specialization and complex multi-task learning. Teacher-student training gives it detail and generalization on par with Depth Anything 2.
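
To make the depth-ray idea concrete: if each pixel carries a ray (an origin and a direction) together with a depth along that ray, a 3D point follows from a single multiply-add. The sketch below is a generic illustration under that assumption, not the model's own code; the function name and sample values are hypothetical.

```python
from typing import Sequence, Tuple


def point_from_depth_ray(origin: Sequence[float],
                         direction: Sequence[float],
                         depth: float) -> Tuple[float, ...]:
    """Return origin + depth * direction for a single pixel.

    Assumption: `direction` is scaled so that `depth` is the multiplier
    along the ray (for example, z-component 1 for z-depth).
    """
    if len(origin) != 3 or len(direction) != 3:
        raise ValueError("origin and direction must have 3 components")
    return tuple(o + depth * d for o, d in zip(origin, direction))


if __name__ == "__main__":
    # Hypothetical pixel: ray from the world origin, depth of 2 units.
    print(point_from_depth_ray((0.0, 0.0, 0.0), (0.5, -0.25, 1.0), 2.0))
```

Under this reading, depth and camera geometry share one per-pixel output format, which is consistent with the listing's description of a single unified representation.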

Additional Features

  • DA3-Streaming: runs inference on ultra-long video sequences with less than 12 GB of GPU memory via sliding-window streaming
  • Reference view selection for multi-view inputs via ref_view_strategy
  • Evaluation benchmark pipeline with 5 datasets (ETH3D, ScanNet++, DTU, 7Scenes, HiRoom)
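
The sliding-window idea behind streaming inference can be sketched generically: split a long frame sequence into overlapping windows and process one window at a time, so peak memory depends on the window size rather than the video length. This is not DA3-Streaming's actual implementation; the function name, window size, and overlap are hypothetical choices for illustration.

```python
from typing import Iterator, List


def sliding_windows(num_frames: int, window: int, overlap: int) -> Iterator[List[int]]:
    """Yield overlapping lists of frame indices that cover [0, num_frames)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield list(range(start, end))
        if end == num_frames:
            break  # last window reached the end of the sequence
        start += step


if __name__ == "__main__":
    # Only one window needs to be held in memory at a time.
    for chunk in sliding_windows(num_frames=10, window=4, overlap=1):
        print(chunk)
```

The overlapping frames give consecutive windows shared content that a real pipeline could use to align their outputs; how DA3-Streaming does this is not described here.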

Licensed under CC BY-NC 4.0 (non-commercial use only).

FAQ

What is the main use case for Depth Anything 3 Nested Giant Large?

It is designed for metric-scale visual geometry reconstruction, including depth estimation, camera pose estimation, and 3D reconstruction from arbitrary views.

How does it compare to Depth Anything 2?

It surpasses Depth Anything 2 in monocular depth estimation. In multi-view settings it also outperforms VGGT, with average improvements of 44.3% in camera pose accuracy and 25.1% in geometric accuracy.

What is the license and can I use it commercially?

No. The model is licensed under CC BY-NC 4.0, which permits non-commercial use only.

What input and output formats does the model support?

Input: image paths, PIL images, or numpy arrays. Output: depth maps, confidence maps, camera extrinsics (w2c), and intrinsics (all float32). Optional export formats include glb, npz, ply, mini_npz, gs_ply, gs_video.
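
As an illustration of how these outputs fit together, the sketch below lifts one pixel to a world-space point from a depth value, intrinsics, and w2c extrinsics. It assumes a standard pinhole intrinsics matrix K, an extrinsics matrix [R | t] such that X_cam = R · X_world + t, and depth measured along the camera z-axis; those conventions and the function name are assumptions, not confirmed details of the model's output.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def unproject_pixel(u: float, v: float, depth: float,
                    K: Matrix, w2c: Matrix) -> Tuple[float, ...]:
    """Lift pixel (u, v) with z-depth `depth` to a world-space 3D point.

    K   : 3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (assumed).
    w2c : 3x4 or 4x4 world-to-camera matrix [R | t] (assumed layout).
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Pixel -> camera coordinates.
    cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Invert the rigid transform: X_world = R^T (X_cam - t).
    diff = [cam[j] - w2c[j][3] for j in range(3)]
    return tuple(sum(w2c[j][i] * diff[j] for j in range(3)) for i in range(3))


if __name__ == "__main__":
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
    # Hypothetical camera: no rotation, translation t = (0, 0, 1).
    w2c = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    print(unproject_pixel(320.0, 240.0, 2.0, K, w2c))
```

Applying this to every pixel of a depth map yields a point cloud, which is the kind of geometry the PLY and GLB exports carry.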

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it launches, you will be able to call it through the OpenAI-compatible endpoint with your gigarouter API key: send image data in the request and receive depth, pose, and 3D outputs in the response.

not yet live

We're benchmarking and onboarding Depth Anything 3 Nested Giant Large as a hosted, OpenAI-compatible API.
