
Depth Anything 3 Small

depth-anything/DA3-SMALL

published Nov 2025 · updated Nov 2025

Depth Anything 3 Small is a multi-view depth estimation and camera pose estimation model that uses a unified depth-ray representation in a plain transformer.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 41.2K
license: apache-2.0
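
To budget a batch job from the estimated price, the per-1k figure can be scaled linearly. The sketch below assumes the ~$0.047 / 1k images estimate holds at launch and that there are no minimums, tiers, or per-request fees; none of that is confirmed on this page.

```python
def estimate_cost_usd(num_images: int, price_per_1k: float = 0.047) -> float:
    """Estimated USD cost for num_images at a flat per-1,000-image price.

    The default price is the page's pre-launch estimate, not a final rate.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images / 1000 * price_per_1k


# A 250,000-image batch at the estimated rate.
print(f"${estimate_cost_usd(250_000):.2f}")  # $11.75
```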

specs

Task: Multi-view Depth Estimation & Camera Pose Estimation
Architecture: Plain Transformer (Vision Transformer) with unified depth-ray representation
Parameters: 0.08B
License: Apache 2.0
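
For a rough sense of scale, the parameter count translates into weight storage as follows. This is back-of-the-envelope arithmetic on the rounded 0.08B figure; it covers weights only and ignores activations, which usually dominate at inference time.

```python
# Bytes needed to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_mb(num_params: float, dtype: str = "float16") -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


params = 0.08e9  # the rounded parameter count from the specs
print(weight_memory_mb(params, "float32"))  # 320.0
print(weight_memory_mb(params, "float16"))  # 160.0
```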

about this model

Depth Anything 3 Small (DA3-Small) is a multi-view depth estimation and camera pose estimation model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses. It uses a single plain transformer backbone with a unified depth-ray representation and is trained exclusively on public academic datasets.

Key Capabilities

  • Relative depth estimation from single or multiple images
  • Camera pose estimation (extrinsics and intrinsics)
  • Pose conditioning for geometry-aware inference
  • Feed-forward 3D Gaussian estimation for novel view synthesis
  • Multi-camera spatial perception (e.g., autonomous driving setups with non-overlapping views)
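
Depth maps and camera parameters combine into 3D points by back-projection. The sketch below is a generic pinhole-camera example, not code from the model: it assumes the depth is z-depth along the optical axis, the intrinsics are a 3x3 pinhole matrix, and the extrinsics are a 3x4 world-to-camera matrix [R|t]. The function name `unproject_depth` is hypothetical. Because the depth is relative, the resulting point cloud is only defined up to scale.

```python
import numpy as np


def unproject_depth(depth, intrinsics, w2c):
    """Back-project a depth map to world-space points.

    depth:      (H, W) z-depth per pixel
    intrinsics: (3, 3) pinhole matrix K
    w2c:        (3, 4) world-to-camera [R|t] (assumed convention)
    returns:    (H*W, 3) points in world coordinates, row-major pixel order
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column, v = row
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pixels @ np.linalg.inv(intrinsics).T     # camera-space rays with z = 1
    cam_points = rays * depth.reshape(-1, 1)        # scale each ray by its depth
    rotation, translation = w2c[:, :3], w2c[:, 3]
    # x_cam = R @ x_world + t, so x_world = R^T @ (x_cam - t)
    return (cam_points - translation) @ rotation


depth = np.full((3, 3), 2.0)
K = np.array([[100.0, 0.0, 1.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
pose = np.hstack([np.eye(3), np.zeros((3, 1))])     # camera at the world origin
points = unproject_depth(depth, K, pose)
print(points.shape)  # (9, 3)
```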

Performance

The DA3 authors report that the model significantly outperforms Depth Anything 2 (DA2) on monocular depth estimation and VGGT on multi-view depth and pose estimation. The project page cites gains over VGGT of 35.7% in camera pose accuracy and 23.6% in geometric accuracy; the arXiv paper reports 44.3% and 25.1%, respectively. Both sets of figures are quoted for DA3, not for the Small variant specifically. DA3 is also reported to reduce drift in large-scale SLAM applications, outperforming COLMAP, which requires over 48 hours.

Architecture and Training

DA3-Small has 0.08B parameters and uses a teacher-student training paradigm to reach detail and generalization on par with DA2. A DPT head can be trained on the frozen backbone to predict 3D Gaussian parameters for generalizable novel view synthesis. For ultra-long video sequences, DA3-Streaming uses sliding-window inference to stay under 12 GB of GPU memory.
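
Sliding-window inference in general means splitting a long sequence into fixed-size chunks so that only one chunk is in memory at a time. The sketch below shows only the chunking step and is not DA3-Streaming's implementation; the function name, the window size, and the use of overlapping frames to align adjacent chunks are assumptions for illustration.

```python
def sliding_windows(num_frames: int, window: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, covering all frames.

    Consecutive windows share `overlap` frames, which a streaming pipeline
    could use to align the predictions of adjacent chunks.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if num_frames <= 0:
        return []
    stride = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


print(sliding_windows(10, window=4, overlap=1))  # [(0, 4), (3, 7), (6, 10)]
```

Peak memory then scales with the window size rather than the video length, at the cost of processing the overlapping frames twice.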

Benchmarks

The paper introduces a visual geometry benchmark covering camera pose estimation, any-view geometry, and visual rendering. A benchmark evaluation pipeline is released for pose estimation and 3D reconstruction on five datasets. The paper was accepted as an ICLR 2026 Oral.

Limitations

  • Trained on academic datasets; performance may vary on domain-specific images
  • Results depend on image quality, lighting conditions, and scene complexity

FAQ

What does Depth Anything 3 Small do?

It predicts spatially consistent depth maps and camera poses from one or more images, using a unified depth-ray representation.

How do I call this model via the gigarouter API?

The hosted API is not live yet. Once it launches, use the OpenAI-compatible endpoint with your API key and send a request containing image URLs or base64-encoded images.
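
The request schema is not published on this page, so the sketch below only shows the part that is safe to prepare in advance: base64-encoding image bytes into data URLs. The payload field names (`model`, `images`) are hypothetical placeholders, and no endpoint URL is given because none is documented here.

```python
import base64
import json

MODEL_ID = "depth-anything/DA3-SMALL"  # model identifier shown on this page


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(images: list[str], model: str = MODEL_ID) -> str:
    """Serialize a request body. HYPOTHETICAL schema: field names are assumptions."""
    return json.dumps({"model": model, "images": images})


# Stand-in bytes; in practice, read them from an image file opened in "rb" mode.
fake_image = b"\x89PNG\r\n\x1a\n"
payload = build_payload([to_data_url(fake_image), "https://example.com/frame_001.png"])
print(payload)
```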

What are the input and output formats?

Input: list of image paths, PIL images, or numpy arrays. Output: depth maps (float32), confidence maps, camera extrinsics and intrinsics (float32), and optional 3D export in GLB, PLY, or NPZ.
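
As an illustration of the NPZ option, the outputs listed above can be stored and reloaded with NumPy. This is a sketch with stand-in arrays, not the model's own export code: the array shapes (N views of H x W depth, 3x4 extrinsics, 3x3 intrinsics), the key names, and the helper `save_prediction` are assumptions.

```python
import os
import tempfile

import numpy as np


def save_prediction(path, depth, conf, extrinsics, intrinsics):
    """Write the four arrays to a compressed NPZ file as float32."""
    np.savez_compressed(
        path,
        depth=np.asarray(depth, dtype=np.float32),
        conf=np.asarray(conf, dtype=np.float32),
        extrinsics=np.asarray(extrinsics, dtype=np.float32),
        intrinsics=np.asarray(intrinsics, dtype=np.float32),
    )


n, h, w = 2, 4, 6  # two views of a tiny 4x6 image, for illustration only
depth = np.ones((n, h, w))
conf = np.full((n, h, w), 0.5)
extrinsics = np.tile(np.eye(4)[:3], (n, 1, 1))  # (N, 3, 4) identity poses
intrinsics = np.tile(np.eye(3), (n, 1, 1))      # (N, 3, 3)

path = os.path.join(tempfile.mkdtemp(), "prediction.npz")
save_prediction(path, depth, conf, extrinsics, intrinsics)

with np.load(path) as data:
    loaded = {name: data[name] for name in data.files}
for name, array in loaded.items():
    print(name, array.shape, array.dtype)
```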

Is this model free to use?

Yes, the model is licensed under Apache 2.0. The hosted API on gigarouter may have usage costs; check gigarouter pricing.

How does Depth Anything 3 Small compare to Depth Anything 2?

According to the authors, DA3 significantly outperforms DA2 on monocular depth estimation and also surpasses VGGT on multi-view depth and pose estimation. DA3-Small is the 0.08B-parameter variant.

not yet live

We're benchmarking and onboarding Depth Anything 3 Small as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
