gigarouter

DepthCrafter

tencent/DepthCrafter

published Sep 2024 · updated Jul 2025

DepthCrafter is a depth model that generates temporally consistent long depth sequences for open-world videos.

status: coming soon
API providers: 0
downloads / mo: 6.8K
license: other

specs

Task: Video Depth Estimation
Architecture: Video-to-depth diffusion model fine-tuned from a pre-trained image-to-video diffusion model
Max Sequence Length: 110 frames per generation
Training Data: Realistic and synthetic video datasets
Inference Strategy: Overlapped segment estimation with latent interpolation for arbitrarily long videos
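The overlapped segment strategy starts by splitting a long video into windows that share some frames. Below is a minimal Python sketch of that windowing step. It assumes the 110-frame window from the spec above and a hypothetical 25-frame overlap. The function name `segment_windows` and the rule of shifting the final window back so it stays full length are illustrative choices, not DepthCrafter's own code.

```python
def segment_windows(num_frames: int, window: int = 110, overlap: int = 25) -> list[tuple[int, int]]:
    """Split frames [0, num_frames) into overlapping [start, end) windows.

    Hypothetical helper: `window` is the per-pass frame limit and `overlap`
    (an assumed value) is how many frames consecutive windows share.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if num_frames <= window:
        return [(0, num_frames)]  # short clips fit in a single pass
    stride = window - overlap
    windows = []
    start = 0
    while True:
        end = start + window
        if end >= num_frames:
            # Shift the last window back so it is full length and ends on the
            # final frame; its overlap with the previous window is >= `overlap`.
            windows.append((num_frames - window, num_frames))
            break
        windows.append((start, end))
        start += stride
    return windows


print(segment_windows(300))  # [(0, 110), (85, 195), (170, 280), (190, 300)]
```

Keeping every window full length means each pass sees the maximum temporal context. The cost is that the last overlap can be larger than the others.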

about this model

DepthCrafter is a video depth estimation model that generates temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring camera poses or optical flow. It is trained from a pre-trained image-to-video diffusion model using a three-stage strategy, enabling variable-length depth sequences up to 110 frames per pass and extreme-length videos through segment-wise estimation with seamless stitching.

Key Strengths

  • State-of-the-art zero-shot performance on open-world video depth estimation, reported across multiple datasets including Sintel, ScanNet, KITTI, and Bonn.
  • Selected as a CVPR 2025 Highlight and winner of the Best Paper Award at the PixFoundation workshop.
  • Handles diverse video content, motion, and camera movement without extra inputs such as camera poses or optical flow.
  • Faster inference: v1.0.1 runs at 465.84 ms/frame at 1024×576 resolution, about 4.1× faster than the previous version (1913.92 ms/frame). For comparison, Depth-Anything-V2 runs at 180.46 ms/frame and Marigold at 1070.29 ms/frame.
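The per-frame latencies above translate directly into wall-clock estimates. This is a minimal Python sketch. It assumes latency scales linearly with frame count on the same hardware used for the reported benchmark, and it ignores frames re-processed in segment overlaps. `clip_seconds` is a hypothetical helper.

```python
# Reported latencies at 1024x576, in milliseconds per frame.
MS_PER_FRAME = {
    "DepthCrafter v1.0.1": 465.84,
    "DepthCrafter (previous)": 1913.92,
    "Depth-Anything-V2": 180.46,
    "Marigold": 1070.29,
}


def clip_seconds(model: str, num_frames: int) -> float:
    """Estimated seconds for num_frames, assuming latency scales linearly."""
    return MS_PER_FRAME[model] * num_frames / 1000.0


speedup = MS_PER_FRAME["DepthCrafter (previous)"] / MS_PER_FRAME["DepthCrafter v1.0.1"]
print(f"v1.0.1 speedup over previous version: {speedup:.2f}x")  # 4.11x
print(f"110-frame clip with v1.0.1: {clip_seconds('DepthCrafter v1.0.1', 110):.1f} s")  # 51.2 s
```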

Benchmark Accuracy

DepthCrafter v1.0.1 Absolute Relative Error (AbsRel, lower is better) and δ₁ accuracy (higher is better) on four datasets:

| Dataset | AbsRel ↓ | δ₁ ↑ |
|---|---|---|
| Sintel | 0.270 | 0.697 |
| ScanNet | 0.123 | 0.856 |
| KITTI | 0.104 | 0.896 |
| Bonn | 0.071 | 0.972 |

DepthCrafter v1.0.1 outperforms Marigold, Depth-Anything-V2, and its own previous version on all four datasets in both AbsRel and δ₁.
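The two metrics in the table have standard definitions. AbsRel is the mean of |pred − gt| / gt, and δ₁ is the fraction of pixels where max(pred/gt, gt/pred) < 1.25. Below is a minimal NumPy sketch of both. The benchmark protocol may first align predictions to ground truth (for example with a scale and shift), which this sketch does not do; it assumes `pred` is already aligned. The helper names and the validity mask are illustrative.

```python
import numpy as np


def _valid(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    # Score only pixels where both ground truth and prediction are positive.
    return (gt > 0) & (pred > 0)


def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """AbsRel: mean of |pred - gt| / gt over valid pixels (lower is better)."""
    m = _valid(pred, gt)
    return float(np.mean(np.abs(pred[m] - gt[m]) / gt[m]))


def delta1(pred: np.ndarray, gt: np.ndarray, threshold: float = 1.25) -> float:
    """δ₁: fraction of valid pixels with max(pred/gt, gt/pred) < threshold."""
    m = _valid(pred, gt)
    ratio = np.maximum(pred[m] / gt[m], gt[m] / pred[m])
    return float(np.mean(ratio < threshold))


gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 2.0, 3.0])
print(f"{abs_rel(pred, gt):.4f} {delta1(pred, gt):.4f}")  # 0.1167 0.6667
```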

Output and Integration

  • Supports EXR output format for high-dynamic-range depth maps.
  • Community integrations available for ComfyUI and Nuke.
  • For business licensing inquiries, contact [email protected].

For more visualizations and details, see the project page.

FAQ

What is the maximum video length DepthCrafter can process in a single forward pass?

It can generate depth for up to 110 frames at once. Longer videos are handled by segment-wise estimation with overlapped stitching.
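One simple way to stitch overlapping segments is a linear cross-fade. This is a swapped-in technique: DepthCrafter interpolates latents in the overlap during generation, whereas the hypothetical `stitch_segments` helper below blends finished depth maps. It assumes windows are sorted `(start, end)` frame ranges, that the first starts at frame 0, and that each window overlaps the frames already covered.

```python
import numpy as np


def stitch_segments(segments, windows):
    """Cross-fade overlapping per-segment depth predictions into one sequence.

    segments[i] has shape (end_i - start_i, ...) and covers frames
    windows[i] = (start_i, end_i).
    """
    if windows[0][0] != 0:
        raise ValueError("first window must start at frame 0")
    out = np.array(segments[0], dtype=np.float64)  # np.array copies by default
    covered_end = windows[0][1]
    for seg, (start, end) in zip(segments[1:], windows[1:]):
        seg = np.asarray(seg, dtype=np.float64)
        n_overlap = covered_end - start
        if n_overlap <= 0 or end <= covered_end:
            raise ValueError("each window must overlap and extend the covered range")
        # Weights strictly between 0 and 1: the earlier segment fades out
        # while the later segment fades in across the shared frames.
        alpha = np.linspace(0.0, 1.0, n_overlap + 2)[1:-1]
        alpha = alpha.reshape((-1,) + (1,) * (seg.ndim - 1))  # broadcast over H, W
        out[start:covered_end] = (1 - alpha) * out[start:covered_end] + alpha * seg[:n_overlap]
        out = np.concatenate([out, seg[n_overlap:]], axis=0)
        covered_end = end
    return out


print(stitch_segments([np.zeros(4), np.ones(4)], [(0, 4), (2, 6)]))
```

Blending outputs instead of latents is simpler but cannot fix scale drift between segments. That limitation is one reason to interpolate inside the generative process.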

How does DepthCrafter compare to Depth-Anything-V2 for video depth?

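Depth-Anything-V2 is a single-image model, so on video it estimates each frame independently and does not enforce temporal consistency. DepthCrafter produces depth that stays consistent across frames and targets open-world video depth in a zero-shot setting. It is slower (465.84 vs. 180.46 ms/frame at 1024×576) but reports better AbsRel and δ₁ on all four benchmark datasets.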

What benchmark performance does DepthCrafter achieve on video depth datasets?

Reported AbsRel / δ₁ for v1.0.1: Sintel 0.270 / 0.697, ScanNet 0.123 / 0.856, KITTI 0.104 / 0.896, Bonn 0.071 / 0.972. At 1024×576 resolution, v1.0.1 runs at 465.84 ms/frame.

Does DepthCrafter require camera intrinsics or optical flow as input?

No. It needs only RGB video frames; camera poses, camera intrinsics, and optical flow are not required.

How can I call DepthCrafter via the gigarouter API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key. The input is a sequence of video frames, and the output is a sequence of depth maps in the format specified by the API.
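The DepthCrafter route and request schema have not been published, so the following is only a hypothetical sketch of an authenticated JSON request in the usual OpenAI-compatible style (Bearer token, JSON body). Everything specific here is an assumption: the `BASE_URL` placeholder, the `/depth` route, and the `frames` field name. The function `build_depth_request` only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# HYPOTHETICAL placeholders: replace with the documented endpoint once live.
BASE_URL = "https://api.example.com/v1"
ROUTE = "/depth"


def build_depth_request(frames: list[bytes], api_key: str,
                        model: str = "tencent/DepthCrafter") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying base64-encoded frames."""
    payload = {
        "model": model,
        # Hypothetical field: each RGB frame as a base64-encoded image file.
        "frames": [base64.b64encode(f).decode("ascii") for f in frames],
    }
    return urllib.request.Request(
        BASE_URL + ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_depth_request([b"abc"], api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)  # POST https://api.example.com/v1/depth
```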

not yet live

We're benchmarking and onboarding DepthCrafter as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
