Marigold Depth v1-0

prs-eth/marigold-depth-v1-0

published Dec 2023 · updated May 2025

Marigold Depth v1-0 is a generative latent diffusion model for affine-invariant monocular depth estimation from a single image.

status

coming soon

API providers

downloads / mo

72.8K

license

apache-2.0

specs

Task	Monocular Depth Estimation
Architecture	Latent Diffusion (fine-tuned from Stable Diffusion 2)
License	Apache 2.0

about this model

Marigold Depth v1-0 is a generative latent diffusion-based monocular depth estimation model that produces affine-invariant depth maps from a single image. It is fine-tuned from Stable Diffusion 2, retaining the rich visual priors of the base model to achieve strong zero-shot generalization across diverse domains.

Key Capabilities

Accepts images of any resolution; optimal results are obtained when the longer side is resized to approximately 768 pixels.
Designed for the DDIM scheduler with 10–50 denoising steps.
Outputs a depth map with values between 0 and 1 (near-to-far planes) and, when ensembling more than two predictions, an uncertainty map.
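
The capabilities above map onto a handful of call arguments when the checkpoint is run locally. The sketch below assumes the Hugging Face diffusers library and its MarigoldDepthPipeline, not the gigarouter API, whose endpoint is not documented here. The helper names build_call_kwargs and estimate_depth are hypothetical, and the 10-50 step check simply mirrors the range stated above.

```python
from typing import Any, Dict, Optional, Tuple


def build_call_kwargs(num_steps: int = 10, ensemble_size: int = 1) -> Dict[str, Any]:
    """Translate the capabilities listed above into pipeline call arguments."""
    if not 10 <= num_steps <= 50:
        raise ValueError("this checkpoint is designed for 10-50 DDIM steps")
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return {
        "num_inference_steps": num_steps,
        "ensemble_size": ensemble_size,
        "processing_resolution": 768,  # longer side used during inference
        # Following the model card: uncertainty needs more than two predictions.
        "output_uncertainty": ensemble_size > 2,
    }


def estimate_depth(image_path: str, **settings: Any) -> Tuple[Any, Optional[Any]]:
    """Run the checkpoint locally and return (depth, uncertainty or None)."""
    # Assumption: a diffusers release that ships MarigoldDepthPipeline.
    # Imported lazily so build_call_kwargs works without diffusers installed.
    from diffusers import MarigoldDepthPipeline
    from diffusers.utils import load_image

    pipe = MarigoldDepthPipeline.from_pretrained("prs-eth/marigold-depth-v1-0")
    image = load_image(image_path)
    out = pipe(image, **build_call_kwargs(**settings))
    return out.prediction, out.uncertainty
```

Calling estimate_depth("photo.jpg", num_steps=20, ensemble_size=5) would download the weights on first use and run on CPU unless the pipeline is moved to a GPU, so expect it to be slow without one.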

Performance and Recognition

The paper reports state-of-the-art zero-shot monocular depth estimation across a range of datasets, with performance gains of over 20% in specific cases compared to prior methods (source: arXiv:2312.02145).
The original Marigold depth paper was selected as an Oral presentation and Best Paper Award Candidate at CVPR 2024 (source: project README).
The model can be fine-tuned in a couple of days on a single GPU using only synthetic training data (source: arXiv:2312.02145).

Model Details

Developed by Bingxin Ke, Anton Obukhov, and colleagues at ETH Zurich. Licensed under Apache 2.0. The model is part of the broader Marigold family, which also includes surface normals estimation and intrinsic image decomposition. This v1-0 checkpoint is being onboarded to gigarouter as a managed API and is not yet live; once available, it will provide OpenAI-compatible endpoints for depth estimation without requiring local installation or hardware management.

best for

·Zero-shot depth estimation on diverse outdoor and indoor scenes
·Generating depth maps for image editing or 3D reconstruction
·Affordable adaptation with training on a single GPU using synthetic data

FAQ

What input format does Marigold Depth v1-0 require?

It accepts a single image at any resolution; optimal results are achieved when the longer side is resized to approximately 768 pixels.
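
As a minimal sketch of that resizing rule, the hypothetical helper below computes the dimensions that bring the longer side to a target of 768 pixels while preserving the aspect ratio. It only does the arithmetic; the actual resampling would be done by whatever image library is in use.

```python
from typing import Tuple


def fit_longer_side(width: int, height: int, target: int = 768) -> Tuple[int, int]:
    """Return (new_width, new_height) with the longer side equal to `target`."""
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")
    longer = max(width, height)
    # Scale both sides by the same factor so the aspect ratio is preserved.
    new_width = max(1, round(width * target / longer))
    new_height = max(1, round(height * target / longer))
    return new_width, new_height
```

For example, a 1920x1080 frame maps to 768x432, and a 600x800 portrait image maps to 576x768.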

What output does the model produce?

It outputs an affine-invariant depth map with values between 0 and 1 and, when the ensemble size is larger than 2, an optional uncertainty map.
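
"Affine-invariant" means the prediction is only defined up to an unknown scale and shift, so the 0-to-1 values are not metric distances. A common way to relate such a prediction to real depth is a least-squares fit of a scale s and shift t so that s * prediction + t matches a few reference measurements. The sketch below illustrates that fit in plain Python; the function name and the reference values are hypothetical, and this alignment is a post-processing step, not something the model does itself.

```python
from typing import Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (scale, shift) minimizing sum((scale * p + shift - r) ** 2)."""
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two matching prediction/reference pairs")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predictions are constant; scale is undetermined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


# Hypothetical example: three pixels with known distances in metres.
scale, shift = fit_scale_shift([0.0, 0.5, 1.0], [2.0, 4.5, 7.0])
metric_depth = [scale * p + shift for p in [0.25, 0.75]]
```

With only a handful of reference points the fit is sensitive to outliers, so a robust variant may be preferable in practice.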

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key; refer to the gigarouter documentation for exact endpoint details.

What is the license of Marigold Depth v1-0?

The model is released under the Apache License 2.0.

How does it compare to other monocular depth estimators?

It achieves state-of-the-art zero-shot generalization and reports over 20% performance improvement on specific datasets compared to previous methods (per the original paper).

not yet live

We're benchmarking and onboarding Marigold Depth v1-0 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo