Depth Anything 3 Small

depth-anything/DA3-SMALL

published Nov 2025 · updated Nov 2025

Depth Anything 3 Small is a multi-view depth estimation and camera pose estimation model that uses a unified depth-ray representation in a plain transformer.

est. price

~$0.047

/ 1k images · estimated, set at launch

API providers

downloads / mo

41.2K

license

apache-2.0

specs

Task	Multi-view Depth Estimation & Camera Pose Estimation
Architecture	Plain Transformer (Vision Transformer) with unified depth-ray representation
Parameters	0.08B
License	Apache 2.0

about this model

Depth Anything 3 DA3-Small is a multi-view depth estimation and camera pose estimation model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses. It uses a single plain transformer backbone with a unified depth-ray representation, trained exclusively on public academic datasets.

Key Capabilities

Relative depth estimation from single or multiple images
Camera pose estimation (extrinsics and intrinsics)
Pose conditioning for geometry-aware inference
Feed-forward 3D Gaussian estimation for novel view synthesis
Multi-camera spatial perception (e.g., autonomous driving setups with non-overlapping views)

Performance

DA3-Small significantly outperforms Depth Anything 2 for monocular depth estimation and VGGT for multi-view depth and pose estimation. On the project page, DA3 surpasses VGGT by 35.7% in camera pose accuracy and 23.6% in geometric accuracy (arXiv reports 44.3% and 25.1% respectively). The model also reduces drift in large-scale SLAM applications, outperforming COLMAP (which requires over 48 hours).

Architecture and Training

DA3-Small has 0.08B parameters and uses a teacher-student training paradigm to achieve detail and generalization on par with DA2. A DPT head can be trained on the frozen backbone to predict 3D Gaussian parameters for generalizable novel view synthesis. The model supports DA3-Streaming for ultra-long video sequences using under 12GB GPU memory via sliding-window inference.

Benchmarks

The paper introduces a visual geometry benchmark covering camera pose estimation, any-view geometry, and visual rendering. A benchmark evaluation pipeline is released for pose estimation and 3D reconstruction on five datasets. The model was accepted as an ICLR 2026 Oral.

Limitations

Trained on academic datasets; performance may vary on domain-specific images
Results depend on image quality, lighting conditions, and scene complexity

best for

·Multi-view depth estimation from arbitrary images
·Camera pose estimation without known poses
·Feed-forward 3D Gaussian estimation for novel view synthesis

FAQ

What does Depth Anything 3 Small do?

It predicts spatially consistent depth maps and camera poses from one or more images, using a unified depth-ray representation.

How do I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key. Send a request with image URLs or base64-encoded images.

What are the input and output formats?

Input: list of image paths, PIL images, or numpy arrays. Output: depth maps (float32), confidence maps, camera extrinsics and intrinsics (float32), and optional 3D export in GLB, PLY, or NPZ.

Is this model free to use?

Yes, the model is licensed under Apache 2.0. The hosted API on gigarouter may have usage costs; check gigarouter pricing.

How does Depth Anything 3 Small compare to Depth Anything 2?

DA3 significantly outperforms DA2 for monocular depth estimation, and also surpasses VGGT for multi-view depth and pose estimation. DA3-Small has 0.08B parameters.

not yet live

We're benchmarking and onboarding Depth Anything 3 Small as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

compare all →

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo