
Lotus Depth G V1-0

jingheya/lotus-depth-g-v1-0

published Oct 2024 · updated Oct 2024

Lotus Depth G V1-0 is a diffusion-based depth estimation model that directly predicts annotations in a single step, achieving high-quality zero-shot depth estimation.

status: coming soon
API providers: 0
downloads / mo: 8K
license: apache-2.0

specs

Task: Depth Estimation
Architecture: Diffusion-based (x₀-prediction, single-step reformulation, Detail Preserver)
Parameters: Not specified in card
License: apache-2.0 (per the listing metadata; not restated in the card text)

about this model

Lotus-depth-g-v1-0 is a diffusion-based visual foundation model for high-quality monocular depth estimation. It is being onboarded as a managed API on gigarouter and is not yet live. The model is part of the Lotus family, accepted at ICLR 2025. It reformulates the diffusion process to predict depth annotations directly (x₀-prediction) rather than noise (ε-prediction); the authors demonstrate that ε-prediction is harmful for dense prediction. Running a single diffusion step at time-step t=T simplifies optimization and significantly boosts inference speed compared to existing diffusion-based methods.
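
The contrast between x₀-prediction and ε-prediction can be made concrete with a small numerical sketch. It assumes the standard DDPM forward relation x_t = sqrt(ᾱ_t)·x₀ + sqrt(1 − ᾱ_t)·ε, which the card does not spell out, and an illustrative value ᾱ = 1e-4 near t = T. The helper name `x0_from_eps` and the error scale are hypothetical. Under those assumptions, the same small prediction error is amplified by sqrt((1 − ᾱ)/ᾱ) when x₀ is recovered from a noise estimate, and is not amplified when x₀ is predicted directly. This is one common way to motivate x₀-prediction, not necessarily the authors' own analysis.

```python
import numpy as np

def x0_from_eps(x_t, eps_hat, alpha_bar):
    """Invert the assumed forward relation x_t = sqrt(a)*x0 + sqrt(1-a)*eps for x0."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 4))   # stand-in for a clean depth latent
eps = rng.standard_normal((4, 4))          # the noise that was mixed in
alpha_bar = 1e-4                           # assumed cumulative alpha near t = T
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Apply the same small error to both kinds of prediction.
err = 0.01 * rng.standard_normal((4, 4))
x0_via_eps = x0_from_eps(x_t, eps + err, alpha_bar)  # eps-prediction, then invert
x0_direct = x0 + err                                 # x0-prediction

eps_param_error = np.abs(x0_via_eps - x0).max()
x0_param_error = np.abs(x0_direct - x0).max()
amplification = eps_param_error / x0_param_error
print(f"error amplification with eps-prediction: {amplification:.1f}x")
```

The amplification factor depends only on the assumed ᾱ, so a schedule whose ᾱ is closer to zero at t = T makes the gap larger.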

Key Strengths

  • Zero-shot generalization: Achieves state-of-the-art performance in zero-shot depth estimation across diverse datasets without scaling training data or model capacity.
  • Efficiency: The single-step reformulation makes Lotus substantially faster than most diffusion-based dense prediction models.
  • Detail Preserver mechanism: A novel tuning strategy (a switcher s) that toggles between image reconstruction and dense prediction, enabling more accurate and fine-grained depth maps.
  • Data efficiency: Trained on only 59,000 images, roughly 0.1% of the 62.6 million images used by DepthAnything.

Benchmark Results

In zero-shot depth estimation, the card reports that the generative Lotus-G variant outperforms the other methods it is compared against. The discriminative Lotus-D variant is slightly behind DepthAnything but remains competitive. Lotus uses Stable Diffusion's pre-trained VAE encoder and denoiser U-Net as its backbone.

Visual Examples

The following images illustrate the model's depth estimation quality:

[Image: depth estimation example 1, dense prediction output]
[Image: depth estimation example 2, fine-grained depth map]

FAQ

What makes Lotus Depth G different from other depth models?

It uses a diffusion-based approach with x₀-prediction (predicting annotations directly instead of noise) and a single-step reformulation for faster inference.

What input format does the model expect?

The model takes a single RGB image as input.
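
A minimal sketch of preparing that RGB input as an array follows. The card only states that the input is a single RGB image. The scaling to [-1, 1] and the batch × channel × height × width layout are assumptions borrowed from typical Stable Diffusion VAE pipelines, and `to_model_input` is a hypothetical helper, not part of any Lotus or gigarouter API.

```python
import numpy as np

def to_model_input(rgb):
    """Convert an H x W x 3 uint8 RGB image to a 1 x 3 x H x W float32 array in [-1, 1]."""
    rgb = np.asarray(rgb)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError(f"expected an H x W x 3 RGB image, got shape {rgb.shape}")
    if rgb.dtype != np.uint8:
        raise ValueError(f"expected uint8 pixels, got {rgb.dtype}")
    scaled = rgb.astype(np.float32) / 127.5 - 1.0   # assumed range: [0, 255] -> [-1, 1]
    return scaled.transpose(2, 0, 1)[np.newaxis]    # assumed layout: HWC -> 1 x C x H x W

# Synthetic 2 x 3 image standing in for a real photo; one pure-red pixel.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)
batch = to_model_input(image)
print(batch.shape, batch.min(), batch.max())
```

Rejecting grayscale or RGBA arrays up front keeps a wrong channel count from surfacing later as a harder-to-read shape error.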

How do I call this model via the gigarouter API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key and the model ID jingheya/lotus-depth-g-v1-0. It is currently listed as coming soon, with no API providers.

Is this model suitable for real-time applications?

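Possibly. The single-step diffusion reformulation significantly boosts inference speed compared to multi-step diffusion methods, but the card gives no latency figures, so real-time suitability depends on your hardware and input resolution.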

What is the license for this model?

The listing metadata gives the license as apache-2.0; the model card text itself does not state one.

not yet live

We're benchmarking and onboarding Lotus Depth G V1-0 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
