Marigold Depth v1-0
prs-eth/marigold-depth-v1-0
published Dec 2023 · updated May 2025
Marigold Depth v1-0 is a generative latent diffusion model for affine-invariant monocular depth estimation from a single image.
specs
| Task | Monocular Depth Estimation |
| Architecture | Latent Diffusion (fine-tuned from Stable Diffusion 2) |
| License | Apache 2.0 |
about this model
Marigold Depth v1-0 is a diffusion-based monocular depth estimator that produces affine-invariant depth maps from a single image, meaning depth is recovered up to an unknown scale and shift rather than in metric units. It is fine-tuned from Stable Diffusion 2 and retains the base model's rich visual priors, which gives it strong zero-shot generalization across diverse domains.
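To make "affine-invariant" concrete, the sketch below fits a scale and a shift that map a prediction onto reference depth by ordinary least squares, which is a common way to compare such predictions against metric ground truth. The function name `fit_scale_shift` and the sample values are hypothetical and chosen only for illustration; this is not code from the Marigold repository.

```python
from statistics import fmean


def fit_scale_shift(pred, target):
    """Least-squares scale s and shift t minimising sum((s * p + t - g) ** 2)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p, mean_g = fmean(pred), fmean(target)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, target))
    scale = cov / var_p
    shift = mean_g - scale * mean_p
    return scale, shift


# Hypothetical data: four pixels of a [0, 1] prediction and reference depth in metres.
pred = [0.0, 0.25, 0.5, 1.0]
metric = [2.0, 3.0, 4.0, 6.0]  # constructed as 4 * pred + 2

scale, shift = fit_scale_shift(pred, metric)
aligned = [scale * p + shift for p in pred]  # prediction expressed in metres
```

Because only two numbers are fitted per image, the prediction's relative depth structure is left untouched; the alignment supplies the missing absolute units.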
Key Capabilities
- Accepts images of any resolution; optimal results are obtained when the longer side is resized to approximately 768 pixels.
- Designed for the DDIM scheduler with 10–50 denoising steps.
- Outputs a depth map with values between 0 and 1 (near-to-far planes) and, when ensembling more than two predictions, an uncertainty map.
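Two of the points above lend themselves to a short sketch: computing a processing size whose longer side is 768 pixels, and combining several predictions into a depth map plus an uncertainty map. The helper names `processing_size` and `ensemble` are hypothetical. The ensembling shown (per-pixel median and standard deviation) is a simplified stand-in that assumes all predictions already share the same 0 to 1 scale; it is not a reproduction of Marigold's own ensembling procedure.

```python
from statistics import median, pstdev


def processing_size(width, height, longer_side=768):
    """Return (w, h) scaled so the longer side equals `longer_side`, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    factor = longer_side / max(width, height)
    return max(1, round(width * factor)), max(1, round(height * factor))


def ensemble(predictions):
    """Per-pixel median and spread over several flattened depth predictions."""
    if len(predictions) < 3:
        raise ValueError("an uncertainty map needs more than two predictions")
    columns = list(zip(*predictions))  # one tuple of candidate values per pixel
    depth = [median(c) for c in columns]
    uncertainty = [pstdev(c) for c in columns]  # zero where all predictions agree
    return depth, uncertainty


landscape = processing_size(1920, 1080)
portrait = processing_size(600, 800)

# Hypothetical two-pixel predictions from three runs.
depth, uncertainty = ensemble([[0.1, 0.5], [0.2, 0.5], [0.3, 0.5]])
```

In this toy input the second pixel gets zero uncertainty because all three runs agree on it, while the first pixel's spread is nonzero.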
Performance and Recognition
- The original paper reports state-of-the-art zero-shot monocular depth estimation, with gains of over 20% on specific datasets compared to prior methods (source: arXiv:2312.02145).
- The original Marigold depth paper was selected as an Oral presentation and Best Paper Award Candidate at CVPR 2024 (source: project README).
- The model can be fine-tuned in a couple of days on a single GPU using only synthetic training data (source: arXiv:2312.02145).
Model Details
Developed by Bingxin Ke, Anton Obukhov, and colleagues at ETH Zurich. Licensed under Apache 2.0. The model is part of the broader Marigold family, which also includes surface normals estimation and intrinsic image decomposition. This v1-0 checkpoint is hosted on gigarouter as a managed API, providing OpenAI-compatible endpoints for depth estimation without requiring local installation or hardware management.
best for
- Zero-shot depth estimation on diverse outdoor and indoor scenes
- Generating depth maps for image editing or 3D reconstruction
- Affordable adaptation with training on a single GPU using synthetic data
FAQ
It accepts a single image at any resolution; optimal results are achieved when the longer side is resized to 768 pixels.
It outputs an affine-invariant depth map with values between 0 and 1 and, when the ensemble contains more than two predictions, an optional uncertainty map.
Use the gigarouter OpenAI-compatible endpoint with your API key; refer to the gigarouter documentation for exact endpoint details.
The model is released under the Apache License 2.0.
The original paper reports state-of-the-art zero-shot generalization, with improvements of over 20% on specific datasets compared to previous methods.
Marigold Depth v1-0 is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.