Depth Anything 3

mudler/depth-anything.cpp-gguf

published Jun 2026 · updated Jun 2026

Depth Anything 3 is a depth estimation model that recovers dense metric depth maps, per-pixel confidence, camera pose, and 3D point clouds from a single image using a C++/ggml engine with GGUF weights.

status

coming soon

API providers

downloads / mo

3.6K

license

apache-2.0

specs

Task	Depth Estimation
Architecture	Vision Transformer (ViT-S, ViT-B, ViT-L, ViT-g) with DPT head
Quantization	f32, f16, q8_0, q6_k, q5_k, q4_k
Outputs	Depth map, confidence, camera extrinsics/intrinsics, sky mask, 3D point cloud, GLB/COLMAP/PLY export
License	Apache-2.0 (weights), MIT (engine)

about this model

Depth Anything 3 (GGUF) is a monocular depth estimation model. From a single image it recovers a dense metric depth map, per-pixel confidence, camera extrinsics (3×4) and intrinsics (3×3), an optional sky mask, and a 3D point cloud, and it can export the result to GLB, COLMAP, or PLY.

Key strengths

Faster than PyTorch on CPU – up to 1.31× speedup (q8_0 quant) at 504×336. Peak memory is 0.46× that of PyTorch at f32 (614 MB vs 1,328 MB) and drops to 363 MB at q8_0.
Bit-exact against the reference – correlation 1.0 verified component by component.
Self-contained GGUF files – all hyperparameters and preprocessing constants baked in; no external config needed.
Near-lossless quantisation – quantised variants go down to 99 MB (q4_k), and most keep correlation > 0.999 against the reference.
Supports the full Depth Anything 3 family (ViT-S/B/L/g), including metric and mono variants, plus Depth Anything V2 models with metric indoor (≤20 m) and outdoor (≤80 m) depth.

Benchmark performance (504×336, AMD Ryzen 9 9950X3D, threads=16)

Engine	Quant	Model size	Inference	Peak RAM	Vs PyTorch
PyTorch	f32	516 MB	416.9 ms	1,328 MB	1.00×
C++/ggml	f32	393 MB	346.4 ms	614 MB	1.20×
C++/ggml	q8_0	142 MB	319.4 ms	363 MB	1.31×
C++/ggml	q4_k	99 MB	395.2 ms	320 MB	1.05×

Additional benchmarks at 224×224 show similar speedups (f32 1.13×, q8_0 1.19×). Depth+pose inference adds ~7–66 ms overhead depending on quant.
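
The "Vs PyTorch" column in the 504×336 table can be reproduced from the timings. The sketch below assumes the column is simply PyTorch time divided by engine time, and adds a peak-RAM ratio computed the same way. The numbers are copied from the table; the function names are illustrative and not part of the engine.

```python
# Inference time (ms) and peak RAM (MB) copied from the 504x336 benchmark table.
BASELINE = {"ms": 416.9, "ram_mb": 1328}  # PyTorch f32
GGML = {
    "f32": {"ms": 346.4, "ram_mb": 614},
    "q8_0": {"ms": 319.4, "ram_mb": 363},
    "q4_k": {"ms": 395.2, "ram_mb": 320},
}


def speedup(baseline_ms: float, engine_ms: float) -> float:
    """How many times faster the engine is than the baseline."""
    return baseline_ms / engine_ms


def ram_ratio(baseline_mb: float, engine_mb: float) -> float:
    """Engine peak RAM as a fraction of the baseline peak RAM."""
    return engine_mb / baseline_mb


def summarize() -> dict:
    """Map each quant to (speedup, RAM ratio), both rounded to 2 decimals."""
    return {
        quant: (
            round(speedup(BASELINE["ms"], row["ms"]), 2),
            round(ram_ratio(BASELINE["ram_mb"], row["ram_mb"]), 2),
        )
        for quant, row in GGML.items()
    }


if __name__ == "__main__":
    for quant, (faster, ram) in summarize().items():
        print(f"{quant}: {faster:.2f}x faster, {ram:.2f}x peak RAM")
```

Under that assumption the ratios agree with the table: 1.20× for f32, 1.31× for q8_0, and 1.05× for q4_k.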

License

Weights are Apache-2.0; the underlying inference engine is MIT.

best for

·Single-image metric depth estimation for 3D reconstruction
·Camera pose and intrinsic estimation from a single view
·Point cloud generation for 3D modeling and asset creation
·Indoor/outdoor scene depth mapping on CPU without GPU
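
For the point cloud use case, it can help to see what a PLY file looks like. The sketch below writes a minimal ASCII PLY with positions only, following the standard PLY header layout. It is not the engine's exporter, and `to_ascii_ply` is a hypothetical helper; the engine's own PLY output may carry extra properties such as colour or confidence.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def to_ascii_ply(points: Iterable[Point]) -> str:
    """Serialize XYZ points as a minimal ASCII PLY document."""
    pts = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(pts)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    # One vertex per line, in the order declared by the header.
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in pts]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    print(to_ascii_ply([(0.0, 0.0, 1.0), (1.5, -2.0, 3.0)]), end="")
```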

FAQ

What output formats does the model provide?

Depth map, confidence map, camera extrinsics (3×4), intrinsics (3×3), sky mask, and a back-projected 3D point cloud with export to GLB, COLMAP, or PLY.
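
As a rough illustration of what "back-projected" means, the sketch below lifts a depth map into camera-frame 3D points with a standard pinhole model. It makes several assumptions not stated on this page: depth is measured along the optical axis, the intrinsics have zero skew, and pixel indices are used directly as image coordinates. `backproject` is a hypothetical name, not an engine function. Plain Python lists are used to keep it dependency-free; a real pipeline would likely use an array library. The 3×4 extrinsics are left out because their convention (world-to-camera or camera-to-world) is not given here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    K: Sequence[Sequence[float]],
) -> List[Point]:
    """Lift each valid pixel (u, v, z) to a camera-frame point (x, y, z).

    K is the 3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: row index (image y)
        for u, z in enumerate(row):        # u: column index (image x)
            if z <= 0:                     # skip invalid or masked depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    K = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]
    depth = [[2.0, 0.0], [4.0, 2.0]]
    for point in backproject(depth, K):
        print(point)
```

Pixels with non-positive depth are skipped here as a stand-in for masking; in practice the confidence map or sky mask would be the natural filter.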

What input does the model require?

A single RGB image. The engine handles preprocessing automatically from metadata baked into the GGUF file.
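
To see that a GGUF file carries its own metadata, the fixed-size header can be inspected directly. The sketch below reads the header as laid out in the GGUF specification for version 2 and later: a 4-byte magic, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count, all little-endian. `read_gguf_header` is a hypothetical helper. It does not decode the metadata entries themselves, since the key names this model uses are not listed on this page.

```python
import struct
from typing import BinaryIO, Dict

HEADER_FORMAT = "<4sIQQ"  # magic, version, tensor_count, metadata_kv_count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(stream: BinaryIO) -> Dict[str, int]:
    """Parse the fixed GGUF header (v2+ layout) from a binary stream."""
    raw = stream.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE:
        raise ValueError("file too short for a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack(HEADER_FORMAT, raw)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    import sys

    # Usage: python read_header.py path/to/model.gguf
    with open(sys.argv[1], "rb") as f:
        print(read_gguf_header(f))
```

A non-zero `metadata_kv_count` is where the baked-in hyperparameters and preprocessing constants would live.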

What license applies to the weights?

The GGUF weights inherit Apache-2.0 from the official Depth Anything 3 checkpoints.

How does this model compare in speed to the original PyTorch implementation?

The C++/ggml engine is faster on CPU (up to 1.31× at q8_0) and uses less than half the memory (peak RAM 320–614 MB vs 1,328 MB for PyTorch).

How can I call this model via the gigarouter API?

The hosted API is not live yet. Once it is, use the OpenAI-compatible endpoint with your API key: send an image (for example, base64-encoded or by URL) and the depth results are returned in the response.

not yet live

We're benchmarking and onboarding Depth Anything 3 as a hosted, OpenAI-compatible API.

related depth estimation models


Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo