Depth Anything Small

Xenova/depth-anything-small-hf

published Jan 2024 · updated Jun 2025

Depth Anything Small is a depth estimation model that predicts a depth map from a single image using a DPT architecture with a DINOv2 backbone.

status

coming soon

downloads / mo

5.8K

specs

Task	Depth Estimation
Architecture	DPT with DINOv2 backbone
Parameters	24.8M
License	Apache-2.0

about this model

Xenova/depth-anything-small-hf is a monocular depth estimation model: it predicts a dense depth map from a single input image. It is the ONNX export of the Depth Anything small model, intended for efficient inference in web and edge environments via Transformers.js. The model uses the DPT architecture with a DINOv2 backbone and has 24.8 million parameters. As described in the paper "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data" (arXiv:2401.10891), accepted at CVPR 2024, it was trained on about 1.5 million labeled images and roughly 62 million unlabeled images. In zero-shot evaluations across six public datasets, the model generalizes well to diverse scenes and image types. After fine-tuning with metric depth data from NYUv2 and KITTI, the paper reports state-of-the-art results for the Depth Anything family.

Key capabilities

·Produces a per-pixel depth map from a single RGB image
·Supports zero-shot inference on unseen domains and randomly captured photos
·Optimized for client-side inference with ONNX weights

Output example

The model outputs both a raw depth tensor and a visualizable depth image. For example, given an input image of bread, the model generates a depth map as shown below:

Depth map output from the model applied to an image of bread
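
To illustrate how a raw depth tensor relates to a visualizable depth image, here is a minimal sketch of min-max scaling into 8-bit grayscale. It assumes the raw depth values arrive as a flat, row-major Float32Array of finite numbers. The helper name depthToGrayscale is hypothetical and not part of Transformers.js or any hosted API, and the model's own visualization step may differ.

```typescript
// Hypothetical helper (not a library function): min-max scales raw depth
// values into 8-bit grayscale so they can be displayed as an image.
// Assumes `depth` is a flat, row-major array of finite float values.
function depthToGrayscale(depth: Float32Array): Uint8Array {
  const out = new Uint8Array(depth.length);
  if (depth.length === 0) {
    return out;
  }

  // Find the value range of the depth map.
  let min = depth[0];
  let max = depth[0];
  for (let i = 1; i < depth.length; i++) {
    if (depth[i] < min) min = depth[i];
    if (depth[i] > max) max = depth[i];
  }

  const range = max - min;
  // A constant map has no contrast; return all zeros rather than divide by zero.
  if (range === 0) {
    return out;
  }

  // Map the smallest value to 0 and the largest value to 255.
  for (let i = 0; i < depth.length; i++) {
    out[i] = Math.round(((depth[i] - min) / range) * 255);
  }
  return out;
}
```

For example, depthToGrayscale(new Float32Array([0, 1, 4])) yields the bytes 0, 64, 255. Because the scaling is relative to each image's own minimum and maximum, the resulting gray levels are comparable within one image but not across images.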

The model is licensed under Apache-2.0 and is part of the Depth Anything release collection on Hugging Face, with over 70 community Spaces using the original PyTorch version.

best for

·Monocular depth estimation in web or Node.js applications
·Zero-shot depth prediction on diverse, unlabeled images

FAQ

What is the input format for the depth estimation API?

The API accepts an image URL or base64-encoded image data; the model outputs a depth map as a tensor and a visual depth image.

How does Depth Anything Small compare to larger variants?

Depth Anything Small has 24.8M parameters, making it faster and more lightweight than the base or large variants, while still offering strong zero-shot performance.
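
As a rough illustration of what "lightweight" means in storage terms, the sketch below estimates weight-only size from the 24.8M parameter count. The bytes-per-parameter figures are generic assumptions about numeric formats, not confirmed details of this ONNX export, and the estimate ignores activations, runtime overhead, and file metadata.

```typescript
// Parameter count taken from the spec table above.
const PARAMETER_COUNT = 24.8e6;

// Hypothetical helper: weight-only size in megabytes (1 MB = 1e6 bytes).
function weightMegabytes(parameterCount: number, bytesPerParameter: number): number {
  return (parameterCount * bytesPerParameter) / 1e6;
}

// Assumed storage formats; which ones are actually published is not stated here.
const fp32Megabytes = weightMegabytes(PARAMETER_COUNT, 4); // 32-bit floats
const fp16Megabytes = weightMegabytes(PARAMETER_COUNT, 2); // 16-bit floats
const int8Megabytes = weightMegabytes(PARAMETER_COUNT, 1); // 8-bit quantized

console.log(`fp32: ~${fp32Megabytes.toFixed(1)} MB`);
console.log(`fp16: ~${fp16Megabytes.toFixed(1)} MB`);
console.log(`int8: ~${int8Megabytes.toFixed(1)} MB`);
```

Under these assumptions the weights come to roughly 99.2 MB at 32-bit precision and about 24.8 MB when quantized to 8 bits, which is why the parameter count matters for client-side downloads.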

What license is this model released under?

The model is licensed under Apache-2.0.

How can I call this model via the gigarouter API?

The hosted API is not yet live. Once it launches, use the OpenAI-compatible endpoint with your API key, sending a request that includes the image input and specifies the model name.

What datasets was Depth Anything Small trained on?

The model was trained on about 1.5 million labeled images and roughly 62 million unlabeled images, as described in the Depth Anything paper (arXiv:2401.10891).

not yet live

We're benchmarking and onboarding Depth Anything Small as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo