
DepthPro

apple/DepthPro-hf

published Nov 2024 · updated Feb 2025

DepthPro is a depth model that produces high-resolution metric depth maps from a single image without requiring camera metadata.

est. price: ~$0.235 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 23.3K
license: apple-amlr

specs

Task: Monocular Depth Estimation
Architecture: Multi-scale Vision Transformer (ViT) with DPT-like fusion
Input Resolution: 1536 × 1536 pixels
Speed: 0.3 seconds for a 2.25-megapixel depth map on a standard GPU
License: Apple-ASCL
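The 2.25-megapixel figure lines up with the 1536 × 1536 input resolution when a megapixel is counted as 2^20 pixels; in decimal megapixels the same map is about 2.36 MP. This is only an arithmetic consistency check on the quoted numbers, not a statement of how the authors computed them:

```latex
1536 \times 1536 = 2\,359\,296 \text{ pixels}, \qquad
\frac{2\,359\,296}{2^{20}} = \frac{2\,359\,296}{1\,048\,576} = 2.25, \qquad
\frac{2\,359\,296}{10^{6}} \approx 2.36
```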

about this model

DepthPro-hf is a foundation model for zero-shot metric monocular depth estimation that generates high-resolution depth maps with absolute scale without requiring camera intrinsics. Developed by Apple and published at ICLR 2025, it produces a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.

[Image: example depth map produced by DepthPro, showing sharp boundaries and fine details]

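The model employs a multi-scale Vision Transformer (ViT) architecture. The input image is downsampled to several scales, and each scale is split into patches (overlapping at the finer scales) that are processed by a shared DINOv2-based patch encoder; a separate image encoder processes a low-resolution view of the whole image to provide global context. The resulting features are merged and refined through a DPT-like fusion decoder. A separate head estimates the focal length (field of view) from the same single image. Training combines real and synthetic datasets to achieve high metric accuracy together with sharp boundary delineation.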
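To make the multi-scale tiling concrete, here is a minimal sketch that counts how many 384 × 384 patches a shared patch encoder would see at each scale. The patch size, the scale factors (0.25, 0.5 and 1.0 of a 1536-pixel input) and the per-scale overlap ratios are assumptions chosen to illustrate the idea; check the official repository or model config for the exact values. The function names are hypothetical, and the separate whole-image encoder is not counted.

```python
import math


def patches_per_side(image_size: int, patch_size: int, overlap_ratio: float) -> int:
    """Number of patches along one side when tiling with a given overlap ratio."""
    if image_size <= patch_size:
        return 1  # the whole (downsampled) image fits in a single patch
    stride = int(patch_size * (1 - overlap_ratio))
    # ceil so that the tiles cover the full side of the image
    return math.ceil((image_size - patch_size) / stride) + 1


def multiscale_patch_grid(base_size=1536, patch_size=384,
                          scales=(0.25, 0.5, 1.0),      # assumed scale factors
                          overlaps=(0.0, 0.5, 0.25)):   # assumed overlap ratios
    """Return (side length, patch count) for each scale of the pyramid."""
    grid = []
    for scale, overlap in zip(scales, overlaps):
        size = int(base_size * scale)
        n = patches_per_side(size, patch_size, overlap)
        grid.append((size, n * n))
    return grid


if __name__ == "__main__":
    grid = multiscale_patch_grid()
    for size, count in grid:
        print(f"{size}x{size}: {count} patches")
    print("total:", sum(count for _, count in grid))
```

Under these assumed values the grid is 1 + 9 + 25 = 35 patches. Because every patch has the same 384 × 384 size, one encoder can be shared across all scales and the patches can be batched together.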

Training Data

The model was trained on the following datasets:

[Image: table of training datasets used for DepthPro]

Training Hyperparameters

[Image: training hyperparameters for DepthPro]

Evaluation

[Image: evaluation results comparing DepthPro to prior methods across multiple depth estimation benchmarks]

DepthPro outperforms prior work along multiple dimensions, including metric accuracy, boundary sharpness, and focal length estimation. The model hosted on gigarouter is based on the re-trained reference implementation; its performance is close to, but does not exactly match, the results reported in the paper.

Architecture

[Image: diagram of DepthPro's multi-scale encoder and fusion architecture]

Additional technical details are available in the paper and the official repository.


FAQ

What input format does the model expect?

An RGB image resized to 1536 × 1536 pixels and normalized per channel with mean [0.5, 0.5, 0.5] and std [0.5, 0.5, 0.5].

What does the model output?

It outputs a metric depth map (in meters) and an estimate of the focal length (or field of view).
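Field of view and focal length are interchangeable under a pinhole camera model: f_px = 0.5 · W / tan(FOV / 2), where W is the image width in pixels. The sketch below assumes the reported FOV is horizontal (measured across the image width) and given in degrees; the helper names are hypothetical.

```python
import math


def focal_length_px(fov_deg: float, width_px: int) -> float:
    """Horizontal focal length in pixels from a horizontal FOV in degrees (pinhole model)."""
    return 0.5 * width_px / math.tan(0.5 * math.radians(fov_deg))


def fov_from_focal_deg(focal_px: float, width_px: int) -> float:
    """Inverse of focal_length_px: horizontal FOV in degrees from a focal length in pixels."""
    return math.degrees(2.0 * math.atan(0.5 * width_px / focal_px))


if __name__ == "__main__":
    f = focal_length_px(90.0, 1536)  # 90 degree FOV on a 1536-pixel-wide image
    print(f, fov_from_focal_deg(f, 1536))
```

For example, a 90° horizontal FOV on a 1536-pixel-wide image corresponds to a focal length of about 768 pixels.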

What license is used?

The model is released under the Apple-ASCL license.

Is DepthPro zero-shot?

Yes, it is a foundation model for zero-shot metric monocular depth estimation, trained on mixed real and synthetic datasets.

How can I call this model via an API?

Once DepthPro is live, use the gigarouter OpenAI-compatible endpoint with your API key.

not yet live

We're benchmarking and onboarding DepthPro as a hosted, OpenAI-compatible API.
