Depth Anything 3
mudler/depth-anything.cpp-gguf
published Jun 2026 · updated Jun 2026
Depth Anything 3 is a depth estimation model that recovers dense metric depth maps, per-pixel confidence, camera pose, and 3D point clouds from a single image using a C++/ggml engine with GGUF weights.
specs
| Task | Depth Estimation |
| Architecture | Vision Transformer (ViT-S, ViT-B, ViT-L, ViT-g) with DPT head |
| Quantization | f32, f16, q8_0, q6_k, q5_k, q4_k |
| Outputs | Depth map, confidence, camera extrinsics/intrinsics, sky mask, 3D point cloud, GLB/COLMAP/PLY export |
| License | Apache-2.0 (weights), MIT (engine) |
about this model
Depth Anything 3 (GGUF) is a monocular depth estimation model that recovers a dense metric depth map, per-pixel confidence, camera extrinsics (3×4) and intrinsics (3×3), an optional sky mask, a 3D point cloud, and exports to glb, COLMAP, or PLY from a single image.
Key strengths
- Faster than PyTorch on CPU – up to 1.31× speedup (q8_0 quant) at 504×336, with 0.46× the peak memory (614 MB vs 1,328 MB).
- Bit-exact against the reference – correlation 1.0 verified component by component.
- Self-contained GGUF files – all hyperparameters and preprocessing constants baked in; no external config needed.
- Quantised variants (q4_k down to 99 MB) are near-lossless (correlation > 0.999 for most).
- Supports the full Depth Anything 3 family (ViT-S/B/L/g), including metric and mono variants, plus Depth Anything V2 models with metric indoor (≤20 m) and outdoor (≤80 m) depth.
Benchmark performance (504×336, AMD Ryzen 9 9950X3D, threads=16)
| Engine | Quant | Model size | Inference | Peak RAM | Vs PyTorch |
|---|---|---|---|---|---|
| PyTorch | f32 | 516 MB | 416.9 ms | 1,328 MB | 1.00× |
| C++/ggml | f32 | 393 MB | 346.4 ms | 614 MB | 1.20× |
| C++/ggml | q8_0 | 142 MB | 319.4 ms | 363 MB | 1.31× |
| C++/ggml | q4_k | 99 MB | 395.2 ms | 320 MB | 1.05× |
Additional benchmarks at 224×224 show similar speedups (f32 1.13×, q8_0 1.19×). Depth+pose inference adds ~7–66 ms overhead depending on quant.
License
Weights are Apache-2.0; the underlying inference engine is MIT.
best for
- ·Single-image metric depth estimation for 3D reconstruction
- ·Camera pose and intrinsic estimation from a single view
- ·Point cloud generation for 3D modeling and asset creation
- ·Indoor/outdoor scene depth mapping on CPU without GPU
FAQ
Depth map, confidence map, camera extrinsics (3x4), intrinsics (3x3), sky mask, and a back-projected 3D point cloud with export to GLB, COLMAP, or PLY.
A single RGB image. The engine handles preprocessing automatically from metadata baked into the GGUF file.
The GGUF weights inherit Apache-2.0 from the official Depth Anything 3 checkpoints.
The C++/ggml engine is faster on CPU (up to 1.31x at q8_0) and uses half or less memory (peak RAM 320–614 MB vs PyTorch 1328 MB).
Use the OpenAI-compatible endpoint with your API key. Send an image file (e.g., base64 or URL) and receive depth results in the response format.
We're benchmarking and onboarding Depth Anything 3 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.