Depth Anything Base

LiheYoung/depth-anything-base-hf

published Jan 2024 · updated Jan 2024

Depth Anything Base is a depth model that performs robust monocular depth estimation on arbitrary images using a DPT architecture with a DINOv2 backbone, trained on 1.5 million labeled and over 62 million unlabeled images.

est. price

~$0.047

/ 1k images · estimated, set at launch

downloads / mo

47.3K

license

apache-2.0

specs

Task	Monocular Depth Estimation
Architecture	DPT with DINOv2 backbone
Training Data	62M+ unlabeled images + 1.5M labeled images
License	Apache 2.0

about this model

LiheYoung/depth-anything-base-hf is a monocular depth estimation model that combines a DINOv2 backbone with the DPT architecture to produce high-quality depth maps from a single RGB image.
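To try the checkpoint locally rather than through a hosted API, one option is the Hugging Face `transformers` `depth-estimation` pipeline. The sketch below assumes `transformers`, `torch` and `Pillow` are installed. `estimate_depth` and `depth_to_uint8` are hypothetical helper names, and min-max scaling is just one simple way to turn relative depth into a viewable image.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a relative depth map to 0-255 for display.

    The model predicts relative depth, so only the ordering and relative
    spacing of values matter, not their absolute scale.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < 1e-12:
        # Flat map: avoid division by zero and return a black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


def estimate_depth(image_path: str) -> np.ndarray:
    """Run the checkpoint with the transformers pipeline (hypothetical helper).

    Imports are local so depth_to_uint8 works without transformers installed.
    """
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline("depth-estimation", model="LiheYoung/depth-anything-base-hf")
    result = pipe(Image.open(image_path).convert("RGB"))
    # result["depth"] is a ready-made 8-bit PIL visualization; here we return
    # the raw predicted_depth tensor as a NumPy array instead.
    return result["predicted_depth"].squeeze().cpu().numpy()
```

For example, `depth_to_uint8(estimate_depth("room.jpg"))` yields an 8-bit array to save or display, where `room.jpg` is a placeholder path. Depending on the `transformers` version, the raw map may be at the model's working resolution rather than the input's.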

Architecture and training

The model pairs a Vision Transformer encoder (DINOv2) with a DPT decoder. The DPT authors reported that this transformer-based dense-prediction design gives up to 28% relative improvement over a state-of-the-art fully convolutional network for monocular depth estimation. Depth Anything was trained on 1.5 million labeled images and over 62 million unlabeled images using a teacher-student pipeline: a teacher model trained on the labeled data generates pseudo-labels for the unlabeled images, and a student is trained on those pseudo-labels under strong augmentations to improve robustness.
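The paper trains with an affine-invariant loss, which compares two depth maps only after removing each map's own shift (its median) and scale (its mean absolute deviation from the median), so predictions and pseudo-labels with different scales can still be compared. The NumPy sketch below shows one unlabeled-image step of the teacher-student scheme under that loss. It is not the authors' training code. `teacher` and `student` are hypothetical callables standing in for the networks, and `strong_perturb` is a simplified, hypothetical stand-in for the stronger augmentations the paper applies.

```python
from typing import Callable

import numpy as np


def normalize_affine(d: np.ndarray) -> np.ndarray:
    """Remove shift (median) and scale (mean absolute deviation) from a depth map."""
    t = np.median(d)
    s = np.mean(np.abs(d - t))
    return (d - t) / max(s, 1e-8)


def affine_invariant_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute difference after per-map shift/scale normalization."""
    return float(np.mean(np.abs(normalize_affine(pred) - normalize_affine(target))))


def strong_perturb(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical photometric perturbation: random contrast/brightness plus noise."""
    contrast = rng.uniform(0.6, 1.4)
    brightness = rng.uniform(-0.2, 0.2)
    noisy = contrast * image + brightness + rng.normal(0.0, 0.05, image.shape)
    return np.clip(noisy, 0.0, 1.0)


def student_loss_on_unlabeled(
    teacher: Callable[[np.ndarray], np.ndarray],
    student: Callable[[np.ndarray], np.ndarray],
    image: np.ndarray,
    rng: np.random.Generator,
) -> float:
    """Teacher labels the clean image; student predicts from a perturbed copy."""
    pseudo_label = teacher(image)  # the teacher stays fixed while the student trains
    pred = student(strong_perturb(image, rng))
    return affine_invariant_loss(pred, pseudo_label)
```

Because the loss is affine-invariant, a student prediction equal to `2 * pseudo_label + 1` scores zero, which is also why a relative-depth model's raw output carries no fixed units.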

Performance

Depth Anything achieves state-of-the-art zero-shot results on relative depth estimation benchmarks and, when fine-tuned with metric depth labels, on absolute (metric) depth benchmarks, generalizing well across diverse scenes and lighting conditions. The paper was accepted at CVPR 2024.

Figure: overview of Depth Anything, showing the teacher-student training pipeline and its use of large-scale unlabeled data.

This model is being onboarded as a managed, OpenAI-compatible API on gigarouter, which will let developers integrate monocular depth estimation into their applications without managing infrastructure.

best for

·Zero-shot depth estimation on arbitrary images
·Depth condition input for ControlNet in image generation
·Video depth estimation and visualization
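When a depth map is used as a ControlNet condition, it typically has to be supplied as a 3-channel image. The sketch below converts a relative depth array into that form. It assumes, as Depth Anything's visualizations suggest, that larger output values mean nearer surfaces, and that the target ControlNet checkpoint expects near-is-bright maps; check the checkpoint you use and flip `near_is_bright` if not. `depth_to_control_image` is a hypothetical helper name.

```python
import numpy as np


def depth_to_control_image(depth: np.ndarray, near_is_bright: bool = True) -> np.ndarray:
    """Turn a relative depth map (H, W) into an (H, W, 3) uint8 conditioning image."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    # Min-max scale to [0, 1]; a flat map becomes all zeros.
    gray = np.zeros_like(depth) if hi - lo < 1e-12 else (depth - lo) / (hi - lo)
    if not near_is_bright:
        gray = 1.0 - gray
    gray_u8 = np.round(gray * 255).astype(np.uint8)
    # Replicate the single channel so the map looks like an RGB image.
    return np.stack([gray_u8] * 3, axis=-1)
```

The result can be wrapped with `PIL.Image.fromarray(...)` and passed as the conditioning image to an image-generation pipeline that accepts ControlNet inputs.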

FAQ

What is the model architecture?

It uses a DPT decoder with a DINOv2 image encoder backbone.

What data was it trained on?

It was trained on 1.5 million labeled images and over 62 million unlabeled images.

What is the license?

The model is released under the Apache 2.0 license.

How can I use this model via the gigarouter API?

Once the model is live, send an image URL to the OpenAI-compatible endpoint, authenticated with your API key, to receive a depth map.

Does it support video depth estimation?

Yes, the official repository provides a script for video depth visualization.

not yet live

We're benchmarking and onboarding Depth Anything Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related depth estimation models

Depth-Anything-V2-Small-hf

1.7M dl/mo

DA3METRIC-LARGE

825K dl/mo

depth-anything-large-hf

388.9K dl/mo

dpt-hybrid-midas

225.1K dl/mo

DA3NESTED-GIANT-LARGE-1.1

199.9K dl/mo

Depth-Anything-V2-Large-hf

199.1K dl/mo