Depth Anything Small

LiheYoung/depth-anything-small-hf

published Jan 2024 · updated Jan 2024

Depth Anything Small is a monocular depth estimation model that uses a DPT architecture with a DINOv2 backbone to predict relative depth from a single image.

est. price

~$0.047 / 1k images · estimated, set at launch

downloads / mo

20.3K

license

apache-2.0

specs

Task	Monocular Depth Estimation
Architecture	DPT with DINOv2 backbone
Parameters	24.8M
License	Apache 2.0

about this model

LiheYoung/depth-anything-small-hf is a monocular depth estimation model that delivers state-of-the-art zero-shot relative depth predictions using a lightweight DPT architecture with a DINOv2 backbone. It was trained on 1.5M labeled images and more than 62M unlabeled images.

Key strengths

The small variant (24.8 million parameters) achieves competitive or superior zero-shot performance compared to MiDaS v3.1 BEiT_L-512 (345 million parameters) while using roughly 14× fewer parameters. On KITTI, it reports an AbsRel of 0.080 and a δ1 of 0.936; on NYUv2, an AbsRel of 0.053 and a δ1 of 0.972. Lower AbsRel and higher δ1 are better. The table below repeats these figures and adds zero-shot results on Sintel, DDAD, ETH3D, and DIODE.

Dataset	AbsRel (lower is better)	δ1 (higher is better)
KITTI	0.080	0.936
NYUv2	0.053	0.972
Sintel	0.464	0.739
DDAD	0.247	0.768
ETH3D	0.127	0.885
DIODE	0.076	0.939
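For readers unfamiliar with the two columns, here is a minimal sketch of how AbsRel and δ1 are conventionally defined. It uses plain Python lists of per-pixel depths. The function names and sample values are illustrative assumptions, not the paper's evaluation code. Zero-shot relative-depth evaluations typically also align predictions to the ground truth (scale and shift) before scoring, and this sketch omits that step.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt) over pixels with gt > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    # A non-positive prediction can never satisfy the ratio test, so it counts as a miss.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Illustrative values only (not taken from any benchmark).
gt_depth = [2.0, 4.0, 10.0, 5.0]
pred_depth = [2.2, 3.0, 10.0, 5.5]

print(f"AbsRel = {abs_rel(pred_depth, gt_depth):.4f}")
print(f"delta1 = {delta1(pred_depth, gt_depth):.2f}")
```

In this toy case one of the four pixels is off by a factor of 4/3, which exceeds the 1.25 threshold, so it lowers δ1 while the other three pixels count as hits.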

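The paper reports that, when fine-tuned for metric depth on NYUv2 and KITTI, Depth Anything sets new state-of-the-art results, and that its encoder transfers well to semantic segmentation, reaching 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K. These figures come from the paper's largest (ViT-L) encoder, not the small variant described here.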

Depth Anything overview diagram from the original paper

Introduced in the CVPR 2024 paper "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data", the model is being onboarded to gigarouter as a managed, OpenAI-compatible API, which would remove the need for local installation or hardware configuration.

best for

· Zero-shot relative depth estimation for any image
· Providing depth conditioning for ControlNet pipelines
· Fine-tuning for metric depth estimation on NYUv2 or KITTI

FAQ

What input format does the model expect?

It accepts a single RGB image (e.g., JPEG/PNG) and returns a depth map as a grayscale image or tensor.
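As a rough illustration of that input/output shape, the sketch below runs the checkpoint locally through the Hugging Face `transformers` depth-estimation pipeline. It assumes `transformers`, `torch`, and `Pillow` are installed and that the pipeline returns a dict with a `depth` image and a `predicted_depth` tensor, as its documentation describes. The helper names `estimate_depth` and `normalize_to_uint8` are made up for this example, and the hosted API may behave differently.

```python
def normalize_to_uint8(depth_rows):
    """Min-max scale a 2D list of depth values to integers in 0..255."""
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map has no contrast; return all zeros.
        return [[0 for _ in row] for row in depth_rows]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth_rows]


def estimate_depth(image_path):
    """Run the checkpoint locally (assumes transformers, torch, and Pillow are installed)."""
    # Imported lazily so the helper above stays usable without these packages.
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
    result = pipe(Image.open(image_path))
    # "depth" is a grayscale PIL image; "predicted_depth" is the raw tensor.
    return result["depth"], result["predicted_depth"]


# Toy 2x2 depth map to show the grayscale scaling step on its own.
print(normalize_to_uint8([[0.0, 2.0], [4.0, 10.0]]))
```

Calling `estimate_depth("photo.jpg")` (a placeholder path) downloads the weights on first use. The scaling helper only shows how raw relative-depth values can be mapped onto an 8-bit grayscale range.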

How does this model compare to MiDaS in size and performance?

Depth Anything Small has 24.8M parameters (roughly 14× fewer than the 345M of MiDaS v3.1 BEiT_L-512) yet outperforms MiDaS on most zero-shot benchmarks.

What is the license for Depth Anything Small?

The model is released under the Apache 2.0 license.

How can I call this model via the gigarouter API?

Once the model is live, use the OpenAI-compatible endpoint with your API key: send an image URL or a base64-encoded image and receive the depth map in the response.
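The request schema is not documented on this page, so the following is only a sketch of the base64 encoding step. The field names `model` and `image`, the data-URL form, and the function `build_depth_request` are hypothetical placeholders, not the gigarouter API. Check the provider's documentation before relying on any of them.

```python
import base64
import json


def build_depth_request(image_bytes, mime_type="image/png",
                        model="LiheYoung/depth-anything-small-hf"):
    """Assemble a JSON body carrying the image as a base64 data URL (hypothetical schema)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,  # assumed model identifier
        "image": f"data:{mime_type};base64,{encoded}",  # hypothetical field name
    }
    return json.dumps(payload)


# Placeholder bytes stand in for a real image file's contents.
body = build_depth_request(b"not-a-real-image")
print(body)
```

The JSON string would then be sent with an `Authorization` header carrying your API key. The exact URL and response format are left out because they are not specified here.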

What training data was used for this model?

The model was trained on 1.5M labeled images and more than 62M unlabeled images; the unlabeled set greatly widens data coverage.

not yet live

We're benchmarking and onboarding Depth Anything Small as a hosted, OpenAI-compatible API.

related depth estimation models

Model	Downloads
Depth-Anything-V2-Small-hf	1.7M dl/mo
DA3METRIC-LARGE	825K dl/mo
depth-anything-large-hf	388.9K dl/mo
dpt-hybrid-midas	225.1K dl/mo
DA3NESTED-GIANT-LARGE-1.1	199.9K dl/mo
Depth-Anything-V2-Large-hf	199.1K dl/mo