Depth Anything 3 Large

depth-anything/DA3-LARGE-1.1

published Dec 2025 · updated Dec 2025

Depth Anything 3 Large is a depth model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses, using a unified depth-ray representation.

est. price: ~$0.094 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 57K
license: apache-2.0
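
To put the estimated price above in context, here is a minimal sketch of the arithmetic for a batch job. It assumes the listed ~$0.094 per 1k images holds at launch and that billing is strictly linear in image count; both are assumptions, and the helper name `estimate_cost_usd` is hypothetical.

```python
# Estimated listing price; the page notes it is only set at launch.
PRICE_PER_1K_IMAGES_USD = 0.094


def estimate_cost_usd(num_images, price_per_1k=PRICE_PER_1K_IMAGES_USD):
    """Return the estimated cost in USD, assuming linear per-image pricing."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images / 1000 * price_per_1k


# Example: a 250,000-image dataset.
cost = estimate_cost_usd(250_000)
print(f"${cost:.2f}")  # prints $23.50
```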

specs

Task: Multi-view depth estimation and camera pose estimation
Architecture: Plain transformer (vanilla DINO encoder) with unified depth-ray representation
Parameters: 0.35B
License: Apache 2.0

about this model

Depth Anything 3 Large (DA3-LARGE) is a multi-view depth estimation and camera pose estimation model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses. It uses a single plain transformer backbone with a unified depth-ray representation, which removes the need for architectural specialization or complex multi-task learning.

Capabilities

  • Relative depth estimation from single or multiple images
  • Camera pose estimation (extrinsics and intrinsics) from unordered image sets
  • Pose conditioning for geometry-aware inference

Key strengths

According to its paper (arXiv:2511.10647), DA3-LARGE (0.35B parameters) is trained exclusively on public academic datasets and surpasses the prior state of the art, VGGT, by 44.3% in camera pose accuracy and 25.1% in geometric accuracy. It also improves over Depth Anything 2 on monocular depth estimation. The work has been accepted as an ICLR 2026 Oral presentation.

Architecture insights

The model demonstrates two things: a single plain transformer (e.g., a vanilla DINO encoder) suffices as a backbone, and a single depth-ray representation removes the need for multi-task learning. Both contribute to its efficiency and performance.

Limitations

  • Trained on academic datasets; performance may degrade on domain-specific images
  • Output quality depends on image quality, lighting, and scene complexity

References

Project page · Paper

FAQ

What is Depth Anything 3 Large?

It is a foundation model for multi-view depth estimation and camera pose estimation, using a unified depth-ray representation and a plain transformer backbone.

What are the input and output formats?

Input: a list of image paths, PIL images, or numpy arrays. Output: depth maps, confidence maps, camera extrinsics (world-to-camera, w2c), and camera intrinsics, all as float32 tensors.
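
As an illustration of how these outputs fit together, the sketch below lifts one depth map into world-space 3D points using its intrinsics and w2c extrinsics. It is not the model's own API: the function name `backproject_to_world` is hypothetical, and it assumes depth is z-depth along the optical axis, the intrinsics follow the usual pinhole layout (fx, fy, cx, cy), and w2c is a 4x4 matrix mapping world coordinates to camera coordinates. With relative depth, the resulting points are only defined up to scale.

```python
import numpy as np


def backproject_to_world(depth, K, w2c):
    """Lift a depth map to world-space points.

    depth: (H, W) z-depth per pixel (assumed convention)
    K:     (3, 3) pinhole intrinsics
    w2c:   (4, 4) world-to-camera extrinsics
    Returns an (H, W, 3) array of world-space points.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Camera-space rays with z = 1, then scale by depth.
    rays_cam = pixels @ np.linalg.inv(K).T
    pts_cam = rays_cam * depth[..., None]

    # x_cam = R @ x_world + t  =>  x_world = R.T @ (x_cam - t)
    R, t = w2c[:3, :3], w2c[:3, 3]
    return (pts_cam - t) @ R


# Toy example: a flat 3x5 depth map, camera shifted by t = (0, 0, 1).
depth = np.full((3, 5), 2.0)
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
w2c = np.eye(4)
w2c[:3, 3] = [0.0, 0.0, 1.0]

points = backproject_to_world(depth, K, w2c)
print(points.shape)  # (3, 5, 3)
print(points[1, 2])  # principal-point pixel maps to world point (0, 0, 1)
```

The per-pixel confidence map can be used to mask out unreliable points before merging several views into one point cloud.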

How many parameters does the model have?

0.35 billion (0.35B) parameters.

What license is the model released under?

Apache 2.0 license.

How can I call this model via the gigarouter API?

The model is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key and the model name "depth-anything/DA3-LARGE-1.1".

not yet live

We're benchmarking and onboarding Depth Anything 3 Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
