Depth Pro
apple/DepthPro
published Oct 2024 · updated Feb 2025
Depth Pro is a monocular depth model that produces sharp metric depth maps with absolute scale from a single image in under a second, without needing camera intrinsics.
specs
| Task | Monocular Depth Estimation |
| Architecture | Multi-scale Vision Transformer |
| License | Custom Apple License |
about this model
The model uses an efficient multi-scale vision transformer for dense prediction and is trained on a combination of real and synthetic datasets to achieve both metric accuracy and fine boundary detail. It also performs state-of-the-art focal length estimation from a single image.
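To make the multi-scale design more concrete, the sketch below tiles a fixed-size input into overlapping 384×384 patches at three scales, as input for a shared ViT patch encoder. This page does not state the input resolution or patch layout. The values used here (a 1536×1536 input, downsampled to 768 and 384, with overlap ratios of 0.25 and 0.5) are our reading of the paper and reference code and should be treated as assumptions. `patch_grid` and `multiscale_patch_counts` are hypothetical helpers. Note that 1536² = 2,359,296 = 2.25 × 1024², which is consistent with the 2.25-megapixel figure quoted on this page.

```python
import math


def patch_grid(side: int, patch: int = 384, overlap_ratio: float = 0.25) -> list[int]:
    """Offsets (along one axis) of square patches tiling a side x side image.

    Assumed rule: stride = patch * (1 - overlap_ratio); enough steps are taken
    to reach the far border, and the last offset is clamped to side - patch.
    """
    if side <= patch:
        return [0]  # the whole image fits in a single patch
    stride = int(patch * (1 - overlap_ratio))
    steps = math.ceil((side - patch) / stride) + 1
    return [min(i * stride, side - patch) for i in range(steps)]


def multiscale_patch_counts(base: int = 1536, patch: int = 384) -> dict[int, int]:
    # Hypothetical configuration: full, half and quarter resolution, with
    # overlap ratios 0.25, 0.5 and 0.0 (the quarter-res image is one patch).
    scales = {base: 0.25, base // 2: 0.5, base // 4: 0.0}
    return {side: len(patch_grid(side, patch, ov)) ** 2 for side, ov in scales.items()}


counts = multiscale_patch_counts()
print(counts)                   # {1536: 25, 768: 9, 384: 1}
print(sum(counts.values()))     # 35 patches for the shared patch encoder
print(1536 * 1536 / 2**20)      # 2.25, i.e. 2.25 "megapixels" in units of 1024**2
```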
Key capabilities
- Zero-shot metric depth with absolute scale, no camera intrinsics needed
- High-resolution depth maps with sharp boundaries and high-frequency detail
- Fast inference: a 2.25-megapixel depth map in 0.3 seconds on a V100 GPU
- Integrated focal length estimation from a single image
Benchmark performance
The paper was published at ICLR 2025. In qualitative comparisons, Depth Pro outperforms prior work, including Marigold, Depth Anything v2, and Metric3D v2, on boundary sharpness and detail preservation. The released reference implementation has been re-trained; its performance is close to, but does not exactly match, that of the model reported in the paper.
Evaluation metrics
The paper introduces dedicated metrics for evaluating boundary accuracy in depth maps: scale-invariant boundary F1 for datasets with ground-truth depth, and scale-invariant boundary recall for mask-based datasets (image matting or segmentation).
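To illustrate the idea behind scale-invariant boundary F1, the sketch below marks an occluding edge between two neighboring pixels when their depth ratio exceeds a threshold t, records which side is nearer, and compares the predicted and ground-truth edge sets. Because only depth ratios are used, multiplying the prediction by any constant leaves the score unchanged. This is a simplified sketch, not the reference evaluation code: the threshold range (1.05 to 1.25), the ten uniformly spaced thresholds, and the plain average are assumptions, and all function names are hypothetical.

```python
def occluding_edges(depth, t):
    """Return oriented edges between 4-connected neighbors whose depth ratio exceeds t.

    depth: list of rows of positive depths; t > 1 (1.05 means a 5% jump).
    Each edge is ((r, c), (r2, c2), first_is_nearer), so orientation must match too.
    """
    rows, cols = len(depth), len(depth[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):  # right and bottom neighbors
                if r2 < rows and c2 < cols:
                    a, b = depth[r][c], depth[r2][c2]
                    if a / b > t or b / a > t:
                        edges.add(((r, c), (r2, c2), a < b))
    return edges


def boundary_f1(pred, gt, t):
    p, g = occluding_edges(pred, t), occluding_edges(gt, t)
    if not p and not g:
        return 1.0  # assumption: no edges anywhere counts as a perfect match
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def si_boundary_f1(pred, gt, t_min=1.05, t_max=1.25, n=10):
    # Assumed schedule: n uniformly spaced thresholds, plain (unweighted) average.
    thresholds = [t_min + i * (t_max - t_min) / (n - 1) for i in range(n)]
    return sum(boundary_f1(pred, gt, t) for t in thresholds) / n


gt = [[1.0, 1.0, 2.0, 2.0],
      [1.0, 1.0, 2.0, 2.0]]
scaled = [[3.0 * d for d in row] for row in gt]  # same structure, different scale
blurred = [[1.0, 1.4, 1.6, 2.0],
           [1.0, 1.4, 1.6, 2.0]]                 # boundary smeared over several pixels
print(si_boundary_f1(scaled, gt))   # 1.0: a global rescale does not change the score
print(si_boundary_f1(blurred, gt))  # lower: the smeared edge only matches at small t
```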
Additional details
- Code and weights are released under a custom Apple license.
- Example images for boundary and sharpness comparisons use the AM-2k and DIS-5k datasets.
- For full technical details, see the paper: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (arXiv 2410.02073, v2 revised April 2025).
best for
- High-resolution depth maps for AR/VR applications
- Fast (sub-second) metric depth from a single image without camera intrinsics
- Focal length estimation from a single image
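Since the model outputs both metric depth and a focal length in pixels, a common downstream step for AR/VR is lifting the depth map into a 3D point cloud. The sketch below uses the standard pinhole model. It assumes square pixels, a principal point at the image center, and depth values measured along the optical axis (z-depth); these are assumptions, not details confirmed on this page. `backproject` is a hypothetical helper.

```python
def backproject(depth, f_px):
    """Lift a metric depth map (list of rows, meters) to (X, Y, Z) points.

    Pinhole model: X = (u - cx) * Z / f, Y = (v - cy) * Z / f, with f in pixels.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2  # assumed principal point: image center
    points = []
    for v in range(h):          # v: row index
        for u in range(w):      # u: column index
            z = depth[v][u]     # assumed z-depth along the optical axis, in meters
            points.append(((u - cx) * z / f_px, (v - cy) * z / f_px, z))
    return points


depth = [[2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0]]
pts = backproject(depth, f_px=1.0)
print(pts[4])  # (0.0, 0.0, 2.0): the center pixel lies on the optical axis
print(pts[0])  # (-2.0, -2.0, 2.0)
```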
FAQ
Depth Pro excels at zero-shot metric depth estimation from a single image, producing sharp, high-resolution depth maps with absolute scale and fine boundary details.
It produces a 2.25-megapixel depth map in 0.3 seconds on a V100 GPU.
The code and model weights are released under a custom Apple License, not a standard open-source license.
It takes a single RGB image as input and, optionally, the focal length in pixels if it is known, which can improve depth accuracy.
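A known focal length helps because metric scale depends on it directly. As we read the paper and reference code, the network predicts a canonical inverse depth C together with a field of view, and metric depth is recovered as D = f_px / (w · C), where w is the image width in pixels. When no focal length is supplied, f_px is derived from the estimated field of view. The sketch below follows that reading; the function names are hypothetical and the clamping bounds are illustrative. Because depth scales linearly with f_px, a 10% focal-length error becomes a 10% depth error.

```python
import math


def focal_px_from_fov(width_px: int, fov_deg: float) -> float:
    """Horizontal field of view (degrees) -> focal length in pixels (pinhole model)."""
    return 0.5 * width_px / math.tan(math.radians(fov_deg) / 2)


def metric_depth(canonical_inv_depth, width_px, fov_deg=None, f_px=None):
    """Scale a canonical inverse-depth map (list of rows) to metric depth in meters.

    Assumed relation: inverse depth = C * width / f_px, so depth = f_px / (width * C).
    A user-supplied f_px takes priority over the estimated field of view.
    """
    if f_px is None:
        if fov_deg is None:
            raise ValueError("need either f_px or an estimated fov_deg")
        f_px = focal_px_from_fov(width_px, fov_deg)
    depth = []
    for row in canonical_inv_depth:
        # Clamp inverse depth before inverting to avoid division by zero.
        depth.append([1.0 / min(max(c * width_px / f_px, 1e-4), 1e4) for c in row])
    return depth


print(round(focal_px_from_fov(1536, 90.0), 6))           # 768.0
print(metric_depth([[0.5, 0.25]], 1536, f_px=768.0))     # [[1.0, 2.0]]
print(metric_depth([[0.5, 0.25]], 1536, f_px=1536.0))    # [[2.0, 4.0]]: doubling f_px doubles depth
```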
Once it is hosted, use the gigarouter OpenAI-compatible endpoint with an API key: send the image as input and receive the depth map and focal length as output.
We're benchmarking and onboarding Depth Pro as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.