Depth Anything Small

LiheYoung/depth_anything_vits14

published Jan 2024 · updated Jan 2024

Depth Anything Small is a monocular depth estimation model that predicts a depth map from a single image, trained on 1.5M labeled and 62M+ unlabeled images for robust zero-shot generalization.

Status: coming soon
API providers: 0
Downloads / mo: 3.9K

specs

Task: Monocular Depth Estimation
Architecture: Vision Transformer (ViT-S) with DPT head
Parameters: 24.8M
License: Apache 2.0

about this model

Depth Anything (small variant, vits14) is a monocular depth estimation model that produces robust, generalizable depth maps from single RGB images. It was trained on a combination of 1.5 million labeled images and over 62 million unlabeled images, using a data engine that scales up data coverage and reduces generalization error. The model uses a ViT-small backbone (24.8 million parameters), and the work was accepted at CVPR 2024.

Key strengths

  • Strong zero-shot performance across diverse environments, including indoor, outdoor, and synthetic scenes.
  • Despite being much lighter (24.8M params), it outperforms MiDaS v3.1 BEiT L-512 (345M params) on KITTI, Sintel, and ETH3D, and is competitive on NYUv2, DDAD, and DIODE.
  • The encoder also serves as a powerful feature extractor for downstream tasks: fine-tuned on semantic segmentation, it achieves 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K.

Zero-shot benchmark results (Ours-S / vits14)

Dataset   AbsRel   δ₁
KITTI     0.080    0.936
NYUv2     0.053    0.972
Sintel    0.464    0.739
DDAD      0.247    0.768
ETH3D     0.127    0.885
DIODE     0.076    0.939
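
For readers unfamiliar with the two columns above, the following is a minimal sketch of how AbsRel (lower is better) and δ₁ (higher is better) are conventionally defined. It assumes predictions and ground truth are flat lists of depths in the same units. Benchmarks for relative-depth models may also align predictions to the ground truth before scoring, and that step is omitted here. The function names and sample values are illustrative only.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Illustrative values only, not taken from any benchmark.
pred = [1.0, 2.2, 3.0, 8.0]
gt = [1.0, 2.0, 4.0, 5.0]
print(f"AbsRel = {abs_rel(pred, gt):.4f}")
print(f"delta1 = {delta1(pred, gt):.2f}")
```

In this toy input, two of the four predictions fall within a factor of 1.25 of the ground truth, so δ₁ is 0.5.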

The model is widely adopted as the default depth processor in tools such as InstantID, InvokeAI, and ControlNet-based workflows. It is also available in ONNX and TensorRT formats for deployment flexibility.

Once it is live on gigarouter’s hosted API, you will be able to call Depth Anything Small through an OpenAI-compatible endpoint, without managing dependencies or infrastructure, and receive a depth map for any input image.

FAQ

What is the input and output format for the Depth Anything Small API?

The API accepts an image file (e.g., PNG, JPEG) and returns a depth map as a grayscale image or raw tensor, depending on the endpoint configuration.
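
If an endpoint returns a raw tensor, a client could convert it to an 8-bit grayscale image itself. The sketch below uses plain min-max normalization on a nested list of floats. It is an assumption about one reasonable client-side mapping, not a description of how the hosted endpoint scales its grayscale output. The name `depth_to_grayscale` is hypothetical.

```python
def depth_to_grayscale(depth_rows):
    """Map a 2D list of depth values to 0..255 integers via min-max normalization."""
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant depth map has no contrast; return all zeros.
        return [[0 for _ in row] for row in depth_rows]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth_rows]


# Illustrative 2x2 depth map (arbitrary units).
gray = depth_to_grayscale([[1.0, 2.0], [3.0, 6.0]])
print(gray)
```

Whether larger values should render brighter or darker depends on the convention of the consumer, for example a ControlNet-style workflow, so the mapping may need to be inverted.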

How does Depth Anything Small compare to larger depth models in speed and size?

At 24.8M parameters, it has roughly 14 times fewer parameters than MiDaS v3.1 BEiT L-512 (345M). It nevertheless outperforms that model in zero-shot accuracy on KITTI, Sintel, and ETH3D, and is competitive on NYUv2, DDAD, and DIODE.

What is the license for using Depth Anything Small?

The model is released under the Apache 2.0 license, which permits commercial and non-commercial use provided the license and copyright notices are retained.

How can I call the Depth Anything Small model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending a POST request with the image data to the designated depth estimation route.
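
As a rough sketch of what such a request might look like, the snippet below builds (but does not send) a POST request using only the Python standard library. The model is not yet live and this page does not publish the route or payload schema, so the URL, the JSON field names, and the Bearer-token header are all assumptions made for illustration.

```python
import base64
import json
import urllib.request

# Hypothetical values: the real base URL, route, and schema are not documented here.
API_URL = "https://api.gigarouter.example/v1/depth"
MODEL_ID = "LiheYoung/depth_anything_vits14"


def build_depth_request(image_bytes, api_key):
    """Build a JSON POST request carrying a base64-encoded image (assumed schema)."""
    payload = {
        "model": MODEL_ID,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder bytes stand in for a real PNG or JPEG file.
req = build_depth_request(b"fake-image-bytes", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Once the endpoint exists, the request could be sent with `urllib.request.urlopen(req)`, after replacing the placeholders with the documented values.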

What training data was used for Depth Anything Small?

The model was trained on a combination of 1.5M labeled images and over 62 million unlabeled images, using a data engine to scale up data coverage.

not yet live

We're benchmarking and onboarding Depth Anything Small as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
