Depth Anything 3 Large

depth-anything/DA3-LARGE

published Nov 2025 · updated Nov 2025

Depth Anything 3 Large is a multi-view depth estimation and camera pose estimation model based on a plain transformer with a unified depth-ray representation.

est. price

~$0.094

/ 1k images · estimated, set at launch

API providers

downloads / mo

177K

license

cc-by-nc-4.0

specs

Task	Depth estimation, camera pose estimation
Architecture	Plain transformer with unified depth-ray representation
Parameters	0.35B
License	CC BY-NC 4.0

about this model

depth-anything/DA3-LARGE is a multi-view depth estimation and camera pose estimation model that uses a unified depth-ray representation to recover spatially consistent geometry from arbitrary visual inputs. Developed by the ByteDance Seed Team, it is a single plain transformer (0.35B parameters) trained exclusively on public academic datasets, and has been accepted as an oral at ICLR 2026.

Capabilities

The model jointly estimates relative depth, camera poses (extrinsics and intrinsics), and confidence maps from two or more images. It supports pose conditioning and can export results in formats such as GLB, PLY, and Gaussian splatting assets. A streaming variant (DA3-Streaming) handles ultra-long video sequences with less than 12 GB GPU memory.

Performance

DA3 significantly outperforms prior state-of-the-art models on the key benchmark tasks:

Monocular depth estimation: surpasses Depth Anything 2 (DA2).
Multi-view depth and pose estimation: outperforms VGGT by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy (arXiv: 2511.10647).

It also achieves strong results in generalizable novel view synthesis via a frozen backbone that predicts 3D Gaussian parameters, and when used in SLAM pipelines it reduces drift in large-scale scenes compared to COLMAP.

Limitations

The model is trained on academic datasets only; performance may degrade on domain-specific images, low-quality inputs, or extreme lighting conditions. Results depend on image quality and scene complexity.

best for

·Multi-view 3D reconstruction from unordered image sets
·Camera pose estimation for visual geometry tasks
·Monocular depth estimation with improved accuracy over Depth Anything 2

FAQ

What is Depth Anything 3 Large best for?

It excels at multi-view depth estimation and camera pose estimation, outperforming prior models like VGGT and Depth Anything 2 on publicly benchmarked tasks.

How does it compare to Depth Anything 2?

Depth Anything 3 Large significantly outperforms Depth Anything 2 for monocular depth estimation and adds multi-view and pose estimation capabilities.

What license does it use?

It is licensed under CC BY-NC 4.0, allowing non-commercial use with attribution.

What input format does the model expect?

It accepts a list of image paths, PIL Images, or numpy arrays. For the API, standard image formats (JPEG, PNG) are supported.

How do I call this model via the gigarouter API?

Use the OpenAI-compatible gigarouter endpoint with your API key, passing the model ID and input images in the request payload.

not yet live

We're benchmarking and onboarding Depth Anything 3 Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

compare all →

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo