Depth Anything 3 Large

depth-anything/DA3-LARGE-1.1

published Dec 2025 · updated Dec 2025

Depth Anything 3 Large is a depth model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses, using a unified depth-ray representation.

est. price

~$0.094

/ 1k images · estimated, set at launch

API providers

downloads / mo

57K

license

apache-2.0

specs

Task	Multi-view depth estimation and camera pose estimation
Architecture	Plain transformer (vanilla DINO encoder) with unified depth-ray representation
Parameters	0.35B
License	Apache 2.0

about this model

DA3-LARGE, the Large variant of Depth Anything 3, estimates multi-view depth and camera poses, predicting spatially consistent geometry from arbitrary visual inputs whether or not camera poses are known. It uses a single plain transformer backbone and a unified depth-ray representation, which avoids architectural specialization and complex multi-task learning.

Capabilities

·Relative depth estimation from single or multiple images
·Camera pose estimation (extrinsics and intrinsics) from unordered image sets
·Pose conditioning for geometry-aware inference
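
Relative depth carries no absolute metric scale, so comparing it against a metric reference usually needs an alignment step first. The sketch below shows one generic option, a least-squares scale-and-shift fit. It is a common evaluation technique rather than something this model card describes. The function name `align_scale_shift` is hypothetical, and whether a shift term is appropriate for this model's output is an assumption.

```python
import numpy as np

def align_scale_shift(pred, ref, mask=None):
    """Fit s, t minimizing ||s * pred + t - ref||^2 over valid pixels.

    pred: relative depth map, ref: reference depth map (same shape).
    Returns the aligned prediction plus the fitted scale and shift.
    """
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    if mask is None:
        # Assumption: any finite pixel counts as valid.
        mask = np.isfinite(pred) & np.isfinite(ref)
    x = pred[mask]
    y = ref[mask]
    # Design matrix [pred, 1] for the linear model y ~ s * x + t.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred + t, float(s), float(t)

# Toy data: the reference is an exact affine transform of the prediction.
rel = np.array([[0.1, 0.2], [0.4, 0.8]])
metric = 5.0 * rel + 1.0
aligned, s, t = align_scale_shift(rel, metric)
```

On real data the fit is only as good as the valid-pixel mask, so sky regions or missing ground truth would normally be masked out before fitting.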

Key strengths

DA3-LARGE (0.35B parameters) was trained exclusively on public academic datasets. According to its published paper (arXiv:2511.10647), it surpasses VGGT by 44.3% in camera pose accuracy and 25.1% in geometric accuracy, and it substantially improves over Depth Anything 2 in monocular depth estimation. The work has been accepted as an ICLR 2026 Oral presentation.

Architecture insights

The model demonstrates that a single plain transformer (e.g., a vanilla DINO encoder) suffices as a backbone, and that a single depth-ray representation removes the need for multi-task learning. Both insights contribute to its efficiency and performance.
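
One way to read the depth-ray idea is that each pixel carries a depth value plus a ray (an origin and a direction), and a 3D point is recovered by walking along the ray by the depth. The sketch below illustrates that reading only. The exact parameterization used by the model (for example, whether directions are normalized) is not stated here, and the function name and array layouts are assumptions.

```python
import numpy as np

def points_from_depth_rays(depth, origins, directions):
    """Combine per-pixel depth and rays into 3D points.

    depth:      (H, W)    distance along each ray (assumed layout)
    origins:    (H, W, 3) ray origins (assumed layout)
    directions: (H, W, 3) ray directions (assumed layout)
    Returns (H, W, 3) points computed as origin + depth * direction.
    """
    depth = np.asarray(depth, dtype=np.float64)
    origins = np.asarray(origins, dtype=np.float64)
    directions = np.asarray(directions, dtype=np.float64)
    return origins + depth[..., None] * directions

# Toy 2x2 image: all rays start at the origin and point along +z.
depth = np.array([[1.0, 2.0], [3.0, 4.0]])
origins = np.zeros((2, 2, 3))
directions = np.zeros((2, 2, 3))
directions[..., 2] = 1.0
pts = points_from_depth_rays(depth, origins, directions)
```

Under this reading, depth and camera geometry share one per-pixel output format, which is consistent with the claim that a single representation can stand in for separate task heads.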

Limitations

·Trained on academic datasets; performance may degrade on domain-specific images
·Output quality depends on image quality, lighting, and scene complexity

References

Project page · Paper

best for

·Multi-view depth estimation from unordered image sets
·Camera pose estimation without known poses
·Monocular relative depth estimation

FAQ

What is Depth Anything 3 Large?

It is a foundation model for multi-view depth estimation and camera pose estimation, using a unified depth-ray representation and a plain transformer backbone.

What are the input and output formats?

Input: a list of image paths, PIL images, or NumPy arrays. Output: depth maps, confidence maps, camera extrinsics (world-to-camera, w2c), and intrinsics, all as float32 tensors.
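
To show how these outputs fit together, the following sketch lifts a depth map into world-space points using the intrinsics and a w2c extrinsic matrix. It uses plain NumPy and does not call the model. It assumes a pinhole camera, depth measured along the camera z-axis, and an extrinsic of shape (3, 4) or (4, 4); none of these conventions are confirmed by this page, and `unproject_to_world` is a hypothetical helper.

```python
import numpy as np

def unproject_to_world(depth, K, w2c):
    """Lift a depth map to world-space points.

    depth: (H, W) z-depth per pixel (assumed convention)
    K:     (3, 3) pinhole intrinsics
    w2c:   (3, 4) or (4, 4) world-to-camera extrinsics
    Returns (H, W, 3) world coordinates.
    """
    depth = np.asarray(depth, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    w2c = np.asarray(w2c, dtype=np.float64)
    H, W = depth.shape
    # Pixel grid: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Camera-space points: depth * K^-1 [u, v, 1]^T (row-vector form).
    cam = (pix @ np.linalg.inv(K).T) * depth[..., None]
    R, t = w2c[:3, :3], w2c[:3, 3]
    # x_cam = R x_world + t  =>  x_world = R^T (x_cam - t).
    return (cam - t) @ R

# Toy 3x3 image, principal point at the center pixel, constant depth 2.
K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
w2c = np.hstack([np.eye(3), np.array([[0.0], [0.0], [-1.0]])])
world = unproject_to_world(np.full((3, 3), 2.0), K, w2c)
```

Applying this per view and concatenating the results gives a merged point cloud, which is a quick way to check whether the predicted depths and poses are mutually consistent.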

How many parameters does the model have?

0.35 billion (0.35B) parameters.

What license is the model released under?

Apache 2.0 license.

How can I call this model via the gigarouter API?

Once the hosted API is available, call the gigarouter OpenAI-compatible endpoint with your API key and the model name "depth-anything/DA3-LARGE-1.1".

not yet live

Depth Anything 3 Large is being benchmarked and onboarded as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo