nomic embed vision v1.5
nomic-ai/nomic-embed-vision-v1.5
published Jun 2024 · updated Mar 2025
A popular open image feature extraction model, with 1.3M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
status: coming soon
API providers: 0
downloads / mo: 1.3M
license: apache-2.0
about this model
nomic-embed-vision-v1.5 is an image-feature-extraction model that produces high-quality vision embeddings aligned to the latent space of nomic-embed-text-v1.5, enabling unified multimodal retrieval and cross-modal search.
Key Strengths
- Shares a common embedding space with nomic-embed-text-v1.5, making text-to-image and image-to-text retrieval straightforward with proper prefixing (e.g., search_query:).
- Outperforms OpenAI CLIP ViT B/16 and Jina CLIP v1 on ImageNet zero-shot classification and Datacomp benchmarks.
- Training uses a locked-text variant of LiT (Locked-image Tuning), aligning the vision encoder to the frozen text embedder.
- Open-code and open-weights release under a permissive license; technical details available in the Nomic Embed Vision report (arXiv:2406.18587).
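Because text and image embeddings live in one space, text-to-image retrieval reduces to a nearest-neighbor search by cosine similarity. The following is a minimal sketch of that ranking step only, not the model's actual API: the helper names (`prefixed_query`, `normalize`, `rank_images`) are hypothetical, and the 3-dimensional vectors are made-up stand-ins for real embeddings produced by the text and vision models.

```python
import math

# Task prefix mentioned above for text queries.
QUERY_PREFIX = "search_query: "


def prefixed_query(text):
    """Prepend the retrieval prefix before the text is sent to the text embedder."""
    return QUERY_PREFIX + text


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_images(query_emb, image_embs):
    """Return (image_id, cosine_similarity) pairs, best match first."""
    q = normalize(query_emb)
    scored = []
    for image_id, emb in image_embs.items():
        v = normalize(emb)
        scored.append((image_id, sum(a * b for a, b in zip(q, v))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real model outputs (assumed values, not real embeddings).
image_embs = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
}
query_text = prefixed_query("a photo of a cat")
# Pretend this vector came from the text embedder for query_text.
query_emb = [1.0, 0.0, 0.1]

ranking = rank_images(query_emb, image_embs)
print(query_text)
print(ranking[0][0])
```

Image-to-text retrieval works the same way with the roles swapped: embed the image, then rank a set of text embeddings against it.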
Benchmark Performance
| Model | ImageNet 0-shot | Datacomp (Avg. 38) | MTEB |
|---|---|---|---|
| nomic-embed-vision-v1.5 | 71.0 | 56.8 | 62.28 |
| nomic-embed-vision-v1 | 70.7 | 56.7 | 62.39 |
| OpenAI CLIP ViT B/16 | 68.3 | 56.3 | 43.82 |
| Jina CLIP v1 | 59.1 | 52.2 | 60.1 |
Visualizing the Embedding Space
Click the interactive map below to explore the joint vision-text embedding space on a 100,000-sample subset of CC3M.
not yet live
We're benchmarking and onboarding nomic embed vision v1.5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
