Marigold Depth v1-1

prs-eth/marigold-depth-v1-1

published Dec 2024 · updated May 2025

Marigold Depth v1-1 is a generative latent diffusion model for affine-invariant monocular depth estimation from a single image, fine-tuned from Stable Diffusion 2.

status	coming soon
downloads / mo	4.6K
license	openrail++

specs

Task	Monocular depth estimation
Architecture	Latent diffusion (fine-tuned from Stable Diffusion 2)
License	CreativeML Open RAIL++-M

about this model

Marigold Depth v1-1 is a generative, latent-diffusion-based model for affine-invariant monocular depth estimation from a single image. It is fine-tuned from Stable Diffusion 2 and retains that model's strong visual priors, which enables state-of-the-art zero-shot generalization across diverse domains without additional training data.

Technical Details

Output: An affine-invariant depth map with values between 0 and 1, representing relative near-to-far ordering. When ensembling multiple predictions (ensemble size larger than 2), an uncertainty map is also produced.
Resolution: Designed for an effective resolution of approximately 768 pixels on the longer side. For optimal results, larger inputs should be resized accordingly.
Scheduler and Steps: Works with the DDIM scheduler using between 1 and 50 denoising steps.
Training: Fine-tuned exclusively on synthetic data over a few days on a single GPU, as described in the CVPR 2024 paper and the journal extension by Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler.
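
"Affine-invariant" means the predicted depth is only defined up to an unknown scale and shift, so comparing it with metric depth requires fitting those two numbers first. The following is a minimal sketch of that fit using closed-form least squares in plain Python. The function name `align_affine` and the sample values are hypothetical and are not part of the model or any library.

```python
from statistics import fmean


def align_affine(pred, ref):
    """Return (scale, shift) minimizing sum((scale * p + shift - r) ** 2)."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("pred and ref must have the same length (at least 2)")
    mean_pred, mean_ref = fmean(pred), fmean(ref)
    variance = sum((p - mean_pred) ** 2 for p in pred)
    if variance == 0:
        raise ValueError("pred is constant, so the scale is undetermined")
    covariance = sum((p - mean_pred) * (r - mean_ref) for p, r in zip(pred, ref))
    scale = covariance / variance
    shift = mean_ref - scale * mean_pred
    return scale, shift


# Hypothetical data: relative depth in [0, 1] and reference depth in metres.
pred = [0.0, 0.25, 0.5, 1.0]
ref = [2.0, 3.0, 4.0, 6.0]

scale, shift = align_affine(pred, ref)
aligned = [scale * p + shift for p in pred]
print(scale, shift)  # 4.0 2.0
```

Without reference measurements, the output can only be read as a relative near-to-far ordering, as stated above.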

Performance

The authors report state-of-the-art zero-shot performance across a wide range of benchmark datasets, with gains of over 20% relative to prior methods in specific cases. gigarouter plans to host this model as a managed, OpenAI-compatible API; the endpoint is not yet live.

best for

·Single-image depth estimation that generalizes zero-shot to novel scenes
·Generating affine-invariant depth maps from real-world photos

FAQ

What is the recommended input resolution?

The model works best when the longer side of the input image is about 768 pixels, the effective resolution it inherits from the base diffusion model. Larger inputs should be resized accordingly.
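
The resizing rule can be sketched as a small helper that computes the target size while preserving aspect ratio. The name `fit_longer_side` and the `upscale` flag are hypothetical; leaving smaller images untouched by default is an assumption, since the text only calls for resizing larger inputs.

```python
def fit_longer_side(width, height, target=768, upscale=False):
    """Return (width, height) scaled so the longer side equals `target`.

    Images whose longer side is already at or below `target` are returned
    unchanged unless `upscale` is True.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longer = max(width, height)
    if longer == target or (longer < target and not upscale):
        return width, height
    scale = target / longer
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_longer_side(1536, 1024))  # (768, 512)
print(fit_longer_side(640, 480))    # (640, 480)
```

The returned size can be passed to whichever image library is in use, for example a resize call before inference.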

How many denoising steps are supported?

The model is designed for the DDIM scheduler and supports between 1 and 50 denoising steps.
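
For local inference, the step count can be passed to the pipeline call. The sketch below assumes the Hugging Face `diffusers` library and its `MarigoldDepthPipeline`, neither of which is mentioned on this page. The wrapper `estimate_depth`, its defaults, and the decision to reload the pipeline on every call are illustrative simplifications. It uses the scheduler bundled with the checkpoint, which is assumed here to be DDIM.

```python
def estimate_depth(image, num_inference_steps=4, ensemble_size=1,
                   processing_resolution=768):
    """Run Marigold Depth v1-1 on a PIL image (hypothetical helper).

    Returns (prediction, uncertainty); uncertainty is None unless
    ensemble_size is larger than 2.
    """
    # Validate against the ranges stated on this page before loading anything.
    if not 1 <= num_inference_steps <= 50:
        raise ValueError("num_inference_steps must be between 1 and 50")
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")

    # Heavy dependencies are imported lazily (assumption: both are installed).
    import torch
    from diffusers import MarigoldDepthPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = MarigoldDepthPipeline.from_pretrained(
        "prs-eth/marigold-depth-v1-1", torch_dtype=dtype
    ).to(device)

    out = pipe(
        image,
        num_inference_steps=num_inference_steps,
        ensemble_size=ensemble_size,
        processing_resolution=processing_resolution,
        output_uncertainty=ensemble_size > 2,
    )
    return out.prediction, out.uncertainty
```

Fewer steps trade accuracy for speed; the default of 4 here is an arbitrary illustrative value within the supported range, not a recommendation from the model authors.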

What is the output format?

It outputs an affine-invariant depth map with values between 0 and 1, and optionally an uncertainty map when ensembling with more than 2 predictions.
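
To illustrate how an ensemble can yield both a depth map and an uncertainty map, here is a simplified sketch that takes the per-pixel median as the depth and the median absolute deviation as the uncertainty. This is an assumed aggregation rule for illustration, not necessarily the procedure the model's tooling uses, and it assumes the predictions are already aligned to a common scale and shift. The name `ensemble_depth_maps` is hypothetical.

```python
from statistics import median


def ensemble_depth_maps(predictions):
    """Combine flattened depth maps into (depth, uncertainty) lists.

    predictions: list of at least 3 equally sized lists of floats in [0, 1].
    """
    if len(predictions) < 3:
        raise ValueError("need more than 2 predictions for an uncertainty map")
    size = len(predictions[0])
    if any(len(p) != size for p in predictions):
        raise ValueError("all predictions must have the same number of pixels")

    depth, uncertainty = [], []
    for values in zip(*predictions):  # iterate pixel by pixel
        center = median(values)
        depth.append(center)
        # Spread of the ensemble around its median at this pixel.
        uncertainty.append(median(abs(v - center) for v in values))
    return depth, uncertainty


# Hypothetical three-pixel predictions from three runs.
runs = [
    [0.25, 0.5, 0.75],
    [0.50, 0.5, 0.50],
    [0.75, 0.5, 1.00],
]
depth, uncertainty = ensemble_depth_maps(runs)
print(depth)        # [0.5, 0.5, 0.75]
print(uncertainty)  # [0.25, 0.0, 0.25]
```

Pixels where the runs agree get zero uncertainty, while pixels where they disagree get a larger value.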

What license does this model use?

It is released under the CreativeML Open RAIL++-M License.

How can I call this model via the gigarouter API?

The hosted endpoint is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key: send an image as input and receive the depth map in response.

not yet live

We're benchmarking and onboarding Marigold Depth v1-1 as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf	1.7M dl/mo
DA3METRIC-LARGE	825K dl/mo
depth-anything-large-hf	388.9K dl/mo
dpt-hybrid-midas	225.1K dl/mo
DA3NESTED-GIANT-LARGE-1.1	199.9K dl/mo
Depth-Anything-V2-Large-hf	199.1K dl/mo