DepthPro
apple/DepthPro-hf
published Nov 2024 · updated Feb 2025
DepthPro is a monocular depth estimation model that produces high-resolution metric depth maps from a single image without requiring camera metadata.
specs
| Task | Monocular Depth Estimation |
| Architecture | Multi-scale Vision Transformer (ViT) with DPT-like fusion |
| Input Resolution | 1536x1536 pixels |
| Speed | 0.3 seconds for a 2.25-megapixel depth map on a standard GPU |
| License | Apple-ASCL |
about this model
DepthPro-hf is a foundation model for zero-shot metric monocular depth estimation that generates high-resolution depth maps with absolute scale without requiring camera intrinsics. Developed by Apple and published at ICLR 2025, it produces a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.
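The 2.25-megapixel figure follows from the 1536x1536 resolution listed in the specs if "mega" is read as 2^20 pixels. The short check below is only an arithmetic sketch; the variable names are illustrative, and the throughput value is implied by the quoted 0.3 seconds rather than measured.

```python
# Relate the 1536x1536 resolution to the quoted 2.25-megapixel figure.
SIDE = 1536        # side length in pixels, from the specs table
RUNTIME_S = 0.3    # quoted inference time in seconds

pixels = SIDE * SIDE                          # total pixels in the depth map
binary_megapixels = pixels / (1024 * 1024)    # "mega" read as 2**20
decimal_megapixels = pixels / 1_000_000       # "mega" read as 10**6
pixels_per_second = pixels / RUNTIME_S        # implied throughput, not a benchmark

print(pixels, binary_megapixels, round(decimal_megapixels, 2))
```

Read with the decimal convention, the same map is roughly 2.36 megapixels.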
The model employs a multi-scale Vision Transformer (ViT) architecture. The input image is downsampled to several scales, and each scale is split into patches that are processed by a shared DINOv2 encoder. The resulting patch features are then merged and refined through a DPT-like fusion stage. A separate head estimates focal length from the same single image. Training combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing.
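To make the multi-scale patching concrete, the sketch below counts the patches that a shared encoder would see. The patch size (384), the three scales (1536, 768, 384), and the overlap ratios are assumptions taken from the general description of the method, not values stated on this page, and `patch_offsets` is a hypothetical helper rather than part of any library.

```python
from typing import Dict, List

PATCH = 384  # assumed encoder input size in pixels

# Assumed (side length, overlap ratio) per scale; finer scales overlap more patches.
SCALES = [(1536, 0.25), (768, 0.5), (384, 0.0)]


def patch_offsets(side: int, patch: int, overlap: float) -> List[int]:
    """Top-left offsets of sliding patches along one image axis."""
    stride = int(patch * (1 - overlap))
    return list(range(0, side - patch + 1, stride))


# Number of patches along one axis at each scale.
patches_per_side: Dict[int, int] = {
    side: len(patch_offsets(side, PATCH, overlap)) for side, overlap in SCALES
}

# Patches form a square grid at each scale; all of them go through the same encoder.
total_patches = sum(n * n for n in patches_per_side.values())

print(patches_per_side, total_patches)
```

Under these assumptions the encoder processes one batch of equally sized patches, which is what allows a single set of weights to be shared across scales.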
Training Data
The model was trained on the following datasets:
Training Hyperparameters
Evaluation
DepthPro outperforms prior work along multiple dimensions, including metric accuracy, boundary sharpness, and focal length estimation. The model hosted on gigarouter is based on the reference implementation, which has been re-trained; its performance is close to, but does not exactly match, the results reported in the paper.
Architecture
Additional technical details are available in the paper and the official repository.
best for
- Generating metric depth maps for 3D reconstruction
- Estimating absolute scale depth from a single photo without camera intrinsics
- Focal length estimation from a single image
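For the 3D reconstruction use case, a metric depth value and a focal length are enough to lift a pixel into camera space with the standard pinhole model. The sketch below assumes the principal point sits at the image center and that the focal length is expressed in pixels; `backproject` is a hypothetical helper, not part of the model's API.

```python
from typing import Tuple


def backproject(
    u: float, v: float, depth_m: float, focal_px: float, width: int, height: int
) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to a 3D point in camera coordinates."""
    # Assumption: principal point at the image center.
    cx, cy = width / 2.0, height / 2.0
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


# A pixel at the image center maps onto the optical axis.
center_point = backproject(768, 768, 2.0, 1000.0, 1536, 1536)
# A pixel 500 px to the right of center, 2 m away, with a 1000 px focal length.
offset_point = backproject(1268, 768, 2.0, 1000.0, 1536, 1536)
print(center_point, offset_point)
```

Applying the same function to every pixel of the depth map yields a point cloud in meters.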
FAQ
An RGB image resized to 1536x1536 pixels and normalized with mean=[0.5, 0.5, 0.5] and std=[0.5, 0.5, 0.5].
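The normalization above can be sketched per channel value as follows. The code assumes 8-bit pixel values are first rescaled to [0, 1] before the mean and standard deviation are applied, which is a common convention but is not stated on this page; `normalize_channel` is an illustrative name.

```python
MEAN = 0.5  # per-channel mean from the stated preprocessing
STD = 0.5   # per-channel standard deviation from the stated preprocessing


def normalize_channel(value_8bit: int) -> float:
    """Map one 8-bit channel value to the normalized range the model expects."""
    scaled = value_8bit / 255.0      # assumption: rescale to [0, 1] first
    return (scaled - MEAN) / STD     # then shift and scale to roughly [-1, 1]


darkest = normalize_channel(0)
brightest = normalize_channel(255)
print(darkest, brightest)
```

With a mean and standard deviation of 0.5, the full 8-bit range maps onto the interval from -1 to 1.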
It outputs a metric depth map (in meters) and an estimate of the focal length (or field of view).
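Focal length and field of view carry the same information once the image width is known. The conversion below uses the standard pinhole relation and assumes a horizontal field of view and a focal length in pixels; both function names are illustrative.

```python
import math


def focal_to_fov_deg(focal_px: float, width_px: int) -> float:
    """Horizontal field of view in degrees for a pinhole camera."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def fov_deg_to_focal(fov_deg: float, width_px: int) -> float:
    """Focal length in pixels for a given horizontal field of view."""
    return width_px / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


# A focal length of half the image width corresponds to a 90-degree field of view.
fov = focal_to_fov_deg(768.0, 1536)
focal = fov_deg_to_focal(fov, 1536)
print(fov, focal)
```

The two functions are inverses, so a value converted one way and back returns to the starting focal length up to floating-point error.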
The model is released under the Apple-ASCL license.
Yes, it is a foundation model for zero-shot metric monocular depth estimation, trained on mixed real and synthetic datasets.
Use the gigarouter OpenAI-compatible endpoint with your API key.