
Depth Anything 3

mudler/depth-anything.cpp-gguf

published Jun 2026 · updated Jun 2026

Depth Anything 3 is a depth estimation model that recovers dense metric depth maps, per-pixel confidence, camera pose, and 3D point clouds from a single image using a C++/ggml engine with GGUF weights.

status: coming soon
API providers: 0
downloads / mo: 3.6K
license: apache-2.0

specs

Task: Depth Estimation
Architecture: Vision Transformer (ViT-S, ViT-B, ViT-L, ViT-g) with DPT head
Quantization: f32, f16, q8_0, q6_k, q5_k, q4_k
Outputs: Depth map, confidence, camera extrinsics/intrinsics, sky mask, 3D point cloud, GLB/COLMAP/PLY export
License: Apache-2.0 (weights), MIT (engine)

about this model

Depth Anything 3 (GGUF) is a monocular depth estimation model. From a single image it recovers a dense metric depth map, per-pixel confidence, camera extrinsics (3×4) and intrinsics (3×3), an optional sky mask, and a 3D point cloud, with export to GLB, COLMAP, or PLY.

Key strengths

  • Faster than PyTorch on CPU – up to 1.31× speedup (q8_0) at 504×336, with peak memory between 0.27× (q8_0, 363 MB) and 0.46× (f32, 614 MB) of PyTorch's 1,328 MB.
  • Bit-exact against the reference – correlation 1.0 verified component by component.
  • Self-contained GGUF files – all hyperparameters and preprocessing constants baked in; no external config needed.
  • Quantised variants (q4_k down to 99 MB) are near-lossless (correlation > 0.999 for most).
  • Supports the full Depth Anything 3 family (ViT-S/B/L/g), including metric and mono variants, plus Depth Anything V2 models with metric indoor (≤20 m) and outdoor (≤80 m) depth.

Benchmark performance (504×336, AMD Ryzen 9 9950X3D, threads=16)

Engine | Quant | Model size | Inference | Peak RAM | Vs PyTorch
PyTorch | f32 | 516 MB | 416.9 ms | 1,328 MB | 1.00×
C++/ggml | f32 | 393 MB | 346.4 ms | 614 MB | 1.20×
C++/ggml | q8_0 | 142 MB | 319.4 ms | 363 MB | 1.31×
C++/ggml | q4_k | 99 MB | 395.2 ms | 320 MB | 1.05×

Additional benchmarks at 224×224 show similar speedups (f32 1.13×, q8_0 1.19×). Depth+pose inference adds ~7–66 ms overhead depending on quant.
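
The "Vs PyTorch" figures in the 504×336 table are consistent with dividing the PyTorch inference time by each engine's inference time, and the memory ratios follow the same way from the Peak RAM column. A minimal sketch that reproduces them from the published numbers (the function and variable names are illustrative, not part of the engine):

```python
# Figures copied from the 504x336 benchmark table.
PYTORCH_MS = 416.9
PYTORCH_RAM_MB = 1328

# quant -> (inference time in ms, peak RAM in MB) for the C++/ggml engine
GGML_RUNS = {
    "f32": (346.4, 614),
    "q8_0": (319.4, 363),
    "q4_k": (395.2, 320),
}


def speedup(baseline_ms, ms):
    """How many times faster a run is than the baseline."""
    return baseline_ms / ms


def ram_ratio(baseline_mb, mb):
    """Peak RAM as a fraction of the baseline's peak RAM."""
    return mb / baseline_mb


for quant, (ms, ram_mb) in GGML_RUNS.items():
    print(
        f"{quant}: {speedup(PYTORCH_MS, ms):.2f}x speedup, "
        f"{ram_ratio(PYTORCH_RAM_MB, ram_mb):.2f}x peak RAM"
    )
```

Note that q4_k is the smallest file but not the fastest in this table; q8_0 gives the best speedup.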

License

Weights are Apache-2.0; the underlying inference engine is MIT.

FAQ

What output formats does the model provide?

Depth map, confidence map, camera extrinsics (3×4), intrinsics (3×3), sky mask, and a back-projected 3D point cloud with export to GLB, COLMAP, or PLY.
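
To illustrate how a depth map and the two camera matrices relate to the point cloud, here is a minimal pure-Python sketch of pinhole back-projection. It is not the engine's implementation. It assumes the depth values are z-depth along the optical axis, that the 3×4 extrinsics are a world-to-camera matrix [R|t], and that non-positive depth marks an invalid pixel; the page does not state these conventions, so check them against the engine before relying on this.

```python
def backproject(depth, K, extrinsics=None):
    """Back-project a depth map into 3D points.

    depth      -- list of rows of z-depth values (assumed convention)
    K          -- 3x3 intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    extrinsics -- optional 3x4 [R|t], ASSUMED to map world -> camera
    Returns a list of (x, y, z) tuples, in camera coordinates when
    extrinsics is None and in world coordinates otherwise.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as invalid (assumption)
                continue
            # Pinhole model: scale the normalized pixel ray by its depth.
            p = [(u - cx) / fx * z, (v - cy) / fy * z, z]
            if extrinsics is not None:
                # Invert X_cam = R @ X_world + t  =>  X_world = R^T (X_cam - t)
                d = [p[i] - extrinsics[i][3] for i in range(3)]
                p = [sum(extrinsics[i][j] * d[i] for i in range(3))
                     for j in range(3)]
            points.append(tuple(p))
    return points


# Tiny 2x2 example with one invalid pixel.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
K = [[2.0, 0.0, 0.5],
     [0.0, 2.0, 0.5],
     [0.0, 0.0, 1.0]]
camera_points = backproject(depth, K)
```

If the engine's extrinsics turn out to be camera-to-world instead, the transform becomes a direct `R @ X_cam + t` rather than the inverse used here.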

What input does the model require?

A single RGB image. The engine handles preprocessing automatically from metadata baked into the GGUF file.
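
Because the preprocessing constants live in the GGUF file itself, it can be useful to check that a downloaded file is a valid GGUF container and see how many metadata entries it carries. The sketch below reads only the fixed-size GGUF header (magic, version, tensor count, metadata key/value count) using the standard library. It assumes a little-endian GGUF file of version 2 or later; it does not decode the metadata entries, and the specific keys this model stores are not listed on this page.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # '<' = little-endian, no padding; I = uint32, Q = uint64
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for demonstration; a real file would be read with
# open(path, "rb").read(HEADER_SIZE).
example = GGUF_MAGIC + struct.pack("<IQQ", 3, 10, 5)
header = read_gguf_header(example)
```

A full metadata reader would continue from byte 24 and decode each typed key/value pair as described in the GGUF specification.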

What license applies to the weights?

The GGUF weights inherit Apache-2.0 from the official Depth Anything 3 checkpoints.

How does this model compare in speed to the original PyTorch implementation?

The C++/ggml engine is faster on CPU (up to 1.31× at q8_0, measured at 504×336) and uses less than half the memory (peak RAM 320–614 MB vs 1,328 MB for PyTorch).

How can I call this model via the gigarouter API?

The hosted endpoint is not live yet. Once it is, use the OpenAI-compatible endpoint with your API key: send an image (e.g., base64-encoded or by URL) and receive the depth results in the response.

not yet live

We're benchmarking and onboarding Depth Anything 3 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
