UniDepth V2 ViT-L14
lpiccinelli/unidepth-v2-vitl14
published Jun 2024 · updated Feb 2025
UniDepth V2 ViT-L14 is a monocular metric depth estimation model that predicts accurate metric depth from a single RGB image, achieving state-of-the-art results on KITTI and NYU Depth v2 benchmarks.
specs
| Task | Monocular Metric Depth Estimation |
| Architecture | ViT-L14 backbone with self-promptable camera module and pseudo-spherical output representation |
| Input | Single RGB image (any camera) |
| Output | Metric depth map (meters) and uncertainty map |
about this model
lpiccinelli/unidepth-v2-vitl14 is a universal monocular metric depth estimation model that predicts metric depth from a single image without requiring camera intrinsics. Built on a ViT-L/14 backbone, it uses a self-promptable camera module to predict a dense camera representation and a pseudo-spherical output (azimuth, elevation, depth) that disentangles camera parameters from depth, enabling zero-shot generalization across diverse datasets and camera geometries.
UniDepthV2 improves over the CVPR 2024 Highlight UniDepthV1 by adding an uncertainty-level output for downstream tasks needing confidence estimates and an edge-guided loss that sharpens depth boundaries. The model is state-of-the-art on NYU Depth v2 and KITTI Eigen for monocular depth estimation and was ranked 1st on the KITTI depth prediction benchmark at the time of submission. On the provided test assets, its demo achieves an Absolute Relative error (ARel) of 7.45%.
This model is hosted as a managed, OpenAI-compatible API on gigarouter, requiring no local installation or GPU infrastructure.
best for
- ·Autonomous vehicle depth perception
- ·3D scene reconstruction from a single image
- ·Augmented reality depth-aware effects
- ·Robotic manipulation and navigation
FAQ
It is best for monocular metric depth estimation, providing accurate depth in meters from a single image with state-of-the-art performance on KITTI and NYU Depth v2.
It accepts a single RGB image. The self-promptable camera module handles any camera geometry automatically.
It outputs a metric depth map in meters and an optional uncertainty map for downstream tasks requiring confidence.
Send an image to the gigarouter OpenAI-compatible endpoint with your API key to receive depth predictions.
It uses a self-promptable camera module that predicts dense camera representations, enabling universal metric depth estimation without camera calibration.
We're benchmarking and onboarding UniDepth V2 ViT-L14 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.