DepthCrafter

tencent/DepthCrafter

published Sep 2024 · updated Jul 2025

DepthCrafter is a depth model that generates temporally consistent long depth sequences for open-world videos.

status

coming soon

API providers

downloads / mo

6.8K

license

other

specs

Task	Video Depth Estimation
Architecture	Video-to-depth diffusion model fine-tuned from a pre-trained image-to-video diffusion model
Max Sequence Length	110 frames per generation
Training Data	Realistic and synthetic video datasets
Inference Strategy	Overlapped segment estimation with latent interpolation for arbitrarily long videos

about this model

DepthCrafter is a video depth estimation model that generates temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring camera poses or optical flow. It is trained from a pre-trained image-to-video diffusion model using a three-stage strategy, enabling variable-length depth sequences up to 110 frames per pass and extreme-length videos through segment-wise estimation with seamless stitching.
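The segment-wise strategy can be pictured as a sliding window over the frame indices. The following is a minimal sketch, not the model's actual code: the function name `segment_windows` is hypothetical, the 110-frame window comes from the limit stated above, and the 25-frame overlap is an assumed value chosen only for illustration.

```python
def segment_windows(num_frames, window=110, overlap=25):
    """Return (start, end) frame index pairs, end exclusive, covering the video.

    `window` is the per-pass frame limit; `overlap` is an assumed number of
    frames shared by consecutive segments so they can be stitched later.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if num_frames <= 0:
        return []
    stride = window - overlap  # how far each new segment advances
    windows = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        windows.append((start, end))
        if end == num_frames:  # the last segment reaches the end of the video
            break
        start += stride
    return windows


if __name__ == "__main__":
    for start, end in segment_windows(300):
        print(f"frames {start}..{end - 1}")
```

Under these assumptions a 300-frame clip is covered by four segments, and every pair of neighbours shares 25 frames that a stitching step can blend.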

Key Strengths

State-of-the-art zero-shot performance on open-world video depth estimation, validated across multiple datasets.
Selected as a CVPR 2025 Highlight and winner of the Best Paper Award at the PixFoundation workshop.
Handles diverse video content, motion, and camera movement without additional input.
Faster inference than the previous release: version 1.0.1 runs at 465.84 ms/frame at 1024×576 resolution, versus 1913.92 ms/frame for the previous version. For reference, Depth-Anything-V2 runs at 180.46 ms/frame and Marigold at 1070.29 ms/frame.
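To put the per-frame latencies in more familiar terms, the short sketch below converts them to frames per second and to the time needed for one 110-frame pass. The helper names are hypothetical, and the figures are the reported ones; the page does not state the hardware, so treat the results as relative rather than absolute.

```python
# Reported per-frame latencies in ms/frame, copied from the list above.
MS_PER_FRAME = {
    "DepthCrafter v1.0.1": 465.84,
    "DepthCrafter (previous version)": 1913.92,
    "Depth-Anything-V2": 180.46,
    "Marigold": 1070.29,
}


def frames_per_second(ms_per_frame):
    """Convert a per-frame latency in milliseconds to throughput in frames/s."""
    return 1000.0 / ms_per_frame


def clip_seconds(num_frames, ms_per_frame):
    """Estimated processing time in seconds for a clip of `num_frames` frames."""
    return num_frames * ms_per_frame / 1000.0


if __name__ == "__main__":
    for name, ms in MS_PER_FRAME.items():
        print(f"{name}: {frames_per_second(ms):.2f} frames/s, "
              f"{clip_seconds(110, ms):.1f} s per 110 frames")
```

With these numbers, version 1.0.1 works out to roughly 2.1 frames per second, about 4.1 times the throughput of the previous version.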

Benchmark Accuracy

Absolute Relative Error (AbsRel ↓) and δ₁ accuracy (↑) on four datasets:

Dataset	AbsRel	δ₁
Sintel	0.270	0.697
ScanNet	0.123	0.856
KITTI	0.104	0.896
Bonn	0.071	0.972

DepthCrafter v1.0.1 outperforms Marigold, Depth-Anything-V2, and its own previous version on all four datasets in both AbsRel and δ₁.
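For readers unfamiliar with the two metrics, here is a minimal sketch of their standard definitions: AbsRel is the mean of |prediction − ground truth| / ground truth, and δ₁ is the fraction of pixels where max(prediction / ground truth, ground truth / prediction) is below 1.25. The function names are hypothetical, and depth values are passed as flat lists for simplicity. Relative-depth predictions are normally aligned to the ground truth in scale and shift before scoring; that step is omitted here, so this is not the exact evaluation protocol behind the table.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels where both prediction and ground truth are positive."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    return pairs


def abs_rel(pred, gt):
    """Absolute relative error: mean(|pred - gt| / gt). Lower is better."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Share of pixels with max(pred/gt, gt/pred) < threshold. Higher is better."""
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    predicted = [1.0, 2.2, 4.0]
    ground_truth = [1.0, 2.0, 2.0]
    print("AbsRel:", abs_rel(predicted, ground_truth))
    print("delta1:", delta1(predicted, ground_truth))
```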

Output and Integration

Supports EXR output format for high-dynamic-range depth maps.
Community integrations available for ComfyUI and Nuke.
For business licensing inquiries, contact [email protected].

For more visualizations and details, see the project page.

best for

·Generating consistent depth for long open-world videos
·Depth-based visual effects such as background replacement and 3D point cloud creation
·Conditional video generation guided by depth maps
·Seamless depth estimation for extremely long videos via segment-wise stitching

FAQ

What is the maximum video length DepthCrafter can process in a single forward pass?

It can generate depth for up to 110 frames at once. Longer videos are handled by segment-wise estimation with overlapped stitching.
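As a rough illustration of overlapped stitching, the sketch below crossfades two neighbouring segments over their shared frames with linearly changing weights. It is a simplification under stated assumptions: the function name `crossfade` is hypothetical, each frame is represented by a single number for readability, and the blend is applied to the values directly, whereas the model is described as interpolating latents.

```python
def crossfade(prev_seg, next_seg, overlap):
    """Stitch two segments whose last/first `overlap` frames show the same frames.

    Inside the overlap, the weight of `next_seg` rises linearly, so the result
    moves smoothly from the earlier estimate to the later one.
    """
    if overlap < 0 or overlap > min(len(prev_seg), len(next_seg)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return list(prev_seg) + list(next_seg)
    head = list(prev_seg[:len(prev_seg) - overlap])  # frames only in prev_seg
    tail = list(next_seg[overlap:])                  # frames only in next_seg
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)                  # weight given to next_seg
        a = prev_seg[len(prev_seg) - overlap + i]
        b = next_seg[i]
        blended.append((1 - w) * a + w * b)
    return head + blended + tail


if __name__ == "__main__":
    first = [1.0] * 5   # toy per-frame depth values from one segment
    second = [3.0] * 5  # the next segment, sharing 3 frames with the first
    print(crossfade(first, second, overlap=3))
```

The stitched sequence has `len(prev_seg) + len(next_seg) - overlap` frames, and the shared frames step gradually between the two estimates instead of jumping at the seam.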

How does DepthCrafter compare to Depth-Anything-V2 for video depth?

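Depth-Anything-V2 estimates depth independently for each frame, so it does not enforce temporal consistency. DepthCrafter produces depth that stays smooth across frames and targets open-world video in a zero-shot setting. In the reported numbers, Depth-Anything-V2 is faster (180.46 vs. 465.84 ms/frame), while DepthCrafter v1.0.1 scores better on AbsRel and δ₁ on all four benchmark datasets.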

What benchmark performance does DepthCrafter achieve on video depth datasets?

Reported AbsRel / δ₁ for v1.0.1 are 0.270 / 0.697 on Sintel, 0.123 / 0.856 on ScanNet, 0.104 / 0.896 on KITTI, and 0.071 / 0.972 on Bonn. At 1024×576 resolution, v1.0.1 runs at 465.84 ms/frame.

Does DepthCrafter require camera intrinsics or optical flow as input?

No. It requires only RGB video frames; additional inputs such as camera poses or optical flow are not needed.

How can I call DepthCrafter via the gigarouter API?

DepthCrafter is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key. Input is a sequence of video frames; output is a set of depth maps in a format specified by the API.

not yet live

We're benchmarking and onboarding DepthCrafter as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo