Lotus Depth G V1-0

jingheya/lotus-depth-g-v1-0

published Oct 2024 · updated Oct 2024

Lotus Depth G V1-0 is a diffusion-based depth estimation model that directly predicts annotations in a single step, achieving high-quality zero-shot depth estimation.

status

coming soon

license

apache-2.0

specs

Task	Depth Estimation
Architecture	Diffusion-based (x₀-prediction, single-step reformulation, Detail Preserver)
Parameters	Not specified in card
License	apache-2.0

about this model

Lotus-depth-g-v1-0 is a diffusion-based visual foundation model for high-quality monocular depth estimation, listed on gigarouter as an upcoming managed API. The model is part of the Lotus family, accepted at ICLR 2025. It reformulates the diffusion process to predict depth annotations directly (x₀-prediction) rather than noise (ε-prediction); the authors show that ε-prediction is harmful for dense prediction. The resulting single-step procedure, run at time-step t=T, simplifies optimization and significantly boosts inference speed compared to existing diffusion-based methods.
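
To make the ε-versus-x₀ distinction concrete, the sketch below uses the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x₀ + sqrt(1−ᾱ_t)·ε, and shows how an error in a predicted ε is scaled when x₀ is recovered from it. The linear β schedule and all helper names are illustrative assumptions, not the schedule or code Lotus uses.

```python
import numpy as np

def alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def x0_from_eps(x_t, eps_hat, a_bar_t):
    """Invert the forward process: recover x0 from x_t and a noise estimate."""
    return (x_t - np.sqrt(1.0 - a_bar_t) * eps_hat) / np.sqrt(a_bar_t)

def eps_error_gain(a_bar_t):
    """Factor by which an error in eps_hat is scaled in the recovered x0."""
    return np.sqrt(1.0 - a_bar_t) / np.sqrt(a_bar_t)

a_bar = alpha_bar()
gain_first = eps_error_gain(a_bar[0])   # earliest (least noisy) time-step
gain_last = eps_error_gain(a_bar[-1])   # t = T, where Lotus runs its single step
```

Under this assumed schedule the gain is far below 1 at the first step and above 100 at t=T, which is one way to see why predicting x₀ directly is attractive when the only step taken is at t=T.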

Key Strengths

Zero-shot generalization: Achieves state-of-the-art performance in zero-shot depth estimation across diverse datasets without scaling training data or model capacity.
Efficiency: The single-step reformulation makes Lotus substantially faster than most diffusion-based dense prediction models.
Detail Preserver mechanism: A novel tuning strategy (a switcher s) that toggles between image reconstruction and dense prediction, enabling more accurate and fine-grained depth maps.
Data efficiency: Trained on only 59,000 images, compared to 62.6 million images used by DepthAnything.
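
The Detail Preserver bullet can be read as one shared network trained on two targets, selected by the switcher. The sketch below is a toy illustration of that idea only: the linear `toy_denoiser`, the switcher keys, and the choice to sum the two squared-error losses are assumptions made for the example, not the authors' implementation.

```python
import numpy as np

def toy_denoiser(z_noisy, z_image, switch, weights):
    """Hypothetical stand-in for the U-Net: a linear map chosen by the switcher."""
    w_noisy, w_image = weights[switch]
    return w_noisy * z_noisy + w_image * z_image

def detail_preserver_loss(z_image, z_depth, z_noisy, weights):
    """Sum of a reconstruction loss (switch='image') and a depth loss (switch='depth')."""
    recon = toy_denoiser(z_noisy, z_image, "image", weights)
    pred = toy_denoiser(z_noisy, z_image, "depth", weights)
    loss_recon = np.mean((recon - z_image) ** 2)  # image reconstruction target
    loss_depth = np.mean((pred - z_depth) ** 2)   # dense prediction target
    return loss_recon + loss_depth
```

Both branches share the same inputs and differ only in the switcher value, so the reconstruction branch can keep the network sensitive to fine image detail while the depth branch learns the annotation.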

Benchmark Results

In the authors' reported zero-shot depth estimation results, the generative Lotus-G variant outperforms all other compared methods, while the discriminative Lotus-D variant is slightly inferior to DepthAnything but still competitive. The model uses Stable Diffusion's pre-trained VAE encoder and denoiser U-Net as its backbone.

Visual Examples

The following images illustrate the model's depth estimation quality:

[Image: Depth estimation example 1 showing dense prediction output]

[Image: Depth estimation example 2 showing fine-grained depth map]

best for

·Zero-shot monocular depth estimation from a single image
·High-quality depth prediction for downstream 3D reconstruction tasks
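
For the 3D reconstruction use case, a depth map is typically back-projected into a point cloud with a pinhole camera model. The sketch below assumes the depth values have already been brought to a metric scale and that the intrinsics `fx`, `fy`, `cx`, `cy` are known; neither is provided by this card, and the function name is hypothetical.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Convert an H x W depth map into an (H*W) x 3 array of camera-space points."""
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```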

FAQ

What makes Lotus Depth G different from other depth models?

It uses a diffusion-based approach with x₀-prediction (predicting annotations directly instead of noise) and a single-step reformulation for faster inference.

What input format does the model expect?

The model takes a single RGB image as input.
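
A minimal preprocessing sketch, assuming the model follows the usual Stable Diffusion VAE convention of float inputs in [-1, 1] with a batch-first, channel-first layout. The function name is hypothetical, and the hosted API may handle this step for you.

```python
import numpy as np

def rgb_to_vae_input(image_uint8):
    """Map an H x W x 3 uint8 RGB array to a 1 x 3 x H x W float32 array in [-1, 1]."""
    if image_uint8.ndim != 3 or image_uint8.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    scaled = image_uint8.astype(np.float32) / 127.5 - 1.0
    # Move channels first and add a leading batch dimension.
    return scaled.transpose(2, 0, 1)[None, ...]
```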

How do I call this model via the gigarouter API?

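Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key and the model ID jingheya/lotus-depth-g-v1-0. The model is currently listed as coming soon, so the endpoint is not yet callable.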

Is this model suitable for real-time applications?

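Possibly. The single-step diffusion reformulation significantly boosts inference speed compared to multi-step diffusion methods, but the card reports no latency figures, so real-time suitability depends on your hardware and input resolution.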

What is the license for this model?

The model is released under the apache-2.0 license.

not yet live

Lotus Depth G V1-0 is being benchmarked and onboarded as a hosted, OpenAI-compatible API.

related depth estimation models

Depth-Anything-V2-Small-hf	1.7M dl/mo
DA3METRIC-LARGE	825K dl/mo
depth-anything-large-hf	388.9K dl/mo
dpt-hybrid-midas	225.1K dl/mo
DA3NESTED-GIANT-LARGE-1.1	199.9K dl/mo
Depth-Anything-V2-Large-hf	199.1K dl/mo