Marigold Depth v1-1
prs-eth/marigold-depth-v1-1
published Dec 2024 · updated May 2025
Marigold Depth v1-1 is a generative latent diffusion model for affine-invariant monocular depth estimation from a single image, fine-tuned from Stable Diffusion 2.
specs
| Task | Monocular depth estimation |
| Architecture | Latent diffusion (fine-tuned from Stable Diffusion 2) |
| License | CreativeML Open RAIL++-M |
about this model
Marigold-Depth-v1-1 is a generative latent diffusion-based model for affine-invariant monocular depth estimation from a single image. It is fine-tuned from Stable Diffusion 2 and retains its strong visual priors, enabling state-of-the-art zero-shot generalization across diverse domains without additional training data.
Technical Details
- Output: An affine-invariant depth map with values between 0 and 1, representing relative near-to-far ordering. When ensembling multiple predictions (ensemble size larger than 2), an uncertainty map is also produced.
- Resolution: Designed for an effective resolution of approximately 768 pixels on the longer side. For optimal results, larger inputs should be resized accordingly.
- Scheduler and Steps: Works with the DDIM scheduler using between 1 and 50 denoising steps.
- Training: Fine-tuned exclusively on synthetic data over a few days on a single GPU, as described in the CVPR 2024 paper and the journal extension by Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler.
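Affine-invariant means the predicted depth is defined only up to an unknown scale and shift, so the 0 to 1 values are not distances. If a few reference distances are available, one common way to map the prediction to metric units is a least-squares fit of a scale and a shift. The sketch below illustrates that idea in plain Python; it is not taken from the Marigold code, and the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Return (scale, shift) minimising sum((scale * pred + shift - ref) ** 2)."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need two sequences of equal length with at least 2 samples")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("prediction is constant, so the scale is undetermined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


def apply_scale_shift(pred: Sequence[float], scale: float, shift: float) -> List[float]:
    """Map relative depth values to the units of the reference measurements."""
    return [scale * p + shift for p in pred]


# Hypothetical sample: relative depths in [0, 1] at four pixels,
# paired with reference distances in metres at the same pixels.
relative = [0.0, 0.25, 0.5, 1.0]
reference_m = [2.0, 3.0, 4.0, 6.0]

scale, shift = fit_scale_shift(relative, reference_m)
metric = apply_scale_shift(relative, scale, shift)
```

The same fit is often used when evaluating affine-invariant predictions against ground truth, since errors are only meaningful after the scale and shift have been removed.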
Performance
Marigold achieves state-of-the-art zero-shot performance across a wide range of benchmark datasets, in specific cases improving on prior methods by more than 20%. The model is hosted by gigarouter as a managed, OpenAI-compatible API for easy integration.
best for
- Single-image depth estimation for zero-shot generalization to novel scenes
- Generating affine-invariant depth maps from real-world photos
FAQ
The model works best when the input is resized so that its longer side is about 768 pixels, the effective resolution it inherits from the base diffusion model.
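As a rough illustration of that resizing rule, the helper below computes a target size whose longer side is 768 pixels while preserving the aspect ratio. The function name and the `allow_upscale` flag are hypothetical. Leaving smaller images untouched by default is an assumption of this sketch, not something the model card specifies.

```python
from typing import Tuple


def target_size(width: int, height: int, longer_side: int = 768,
                allow_upscale: bool = False) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer side equals `longer_side`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    current = max(width, height)
    # Assumption: images already at or below the target are left as-is
    # unless the caller explicitly asks for upscaling.
    if current <= longer_side and not allow_upscale:
        return width, height
    new_w = max(1, round(width * longer_side / current))
    new_h = max(1, round(height * longer_side / current))
    return new_w, new_h


landscape = target_size(1920, 1080)
portrait = target_size(3000, 4000)
small = target_size(640, 480)
small_upscaled = target_size(640, 480, allow_upscale=True)
```

The resulting size can be passed to whatever image library is used for the actual resampling. The depth map can then be resized back to the original dimensions if a pixel-aligned result is needed.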
The model is designed for the DDIM scheduler and uses between 1 and 50 denoising steps.
It outputs an affine-invariant depth map with values between 0 and 1 and, when more than 2 predictions are ensembled, an uncertainty map as well.
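To give a sense of where an uncertainty map can come from, the sketch below merges several predictions of the same image by taking a per-pixel median and reports the per-pixel spread as uncertainty. This is a simplified stand-in: the authors' pipeline may align the individual predictions before merging and may use a different spread measure. The function name and sample values are hypothetical, and each prediction is flattened to a list of pixels.

```python
from statistics import median, pstdev
from typing import List, Sequence, Tuple


def ensemble_depth(predictions: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Merge flattened depth predictions into (depth, uncertainty) per pixel."""
    if len(predictions) < 3:
        raise ValueError("an uncertainty map needs more than 2 predictions")
    n_pixels = len(predictions[0])
    if any(len(p) != n_pixels for p in predictions):
        raise ValueError("all predictions must have the same number of pixels")
    # zip(*predictions) groups the values that belong to the same pixel.
    per_pixel = list(zip(*predictions))
    depth = [median(values) for values in per_pixel]
    # Population standard deviation as a simple disagreement measure.
    uncertainty = [pstdev(values) for values in per_pixel]
    return depth, uncertainty


# Hypothetical sample: three predictions for a three-pixel image.
runs = [
    [0.10, 0.50, 0.90],
    [0.12, 0.50, 0.80],
    [0.11, 0.50, 1.00],
]
depth, uncertainty = ensemble_depth(runs)
```

Pixels where the individual predictions agree get an uncertainty near zero, while pixels where they disagree get larger values.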
It is released under the CreativeML Open RAIL++-M License.
Use the gigarouter OpenAI-compatible endpoint with your API key; send an image as input and receive the depth map in response.
We're benchmarking and onboarding Marigold Depth v1-1 as a hosted, OpenAI-compatible API.