gigarouter

nomic embed vision v1.5

nomic-ai/nomic-embed-vision-v1.5

published Jun 2024 · updated Mar 2025

A popular open image feature extraction model, with 1.3M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 1.3M
license: apache-2.0

about this model

nomic-embed-vision-v1.5 is an image feature extraction model. Its vision embeddings share a latent space with nomic-embed-text-v1.5, so a single index can serve multimodal retrieval and cross-modal search.

Key Strengths

  • Shares a common embedding space with nomic-embed-text-v1.5, making text-to-image and image-to-text retrieval straightforward with proper prefixing (e.g., search_query:).
  • Outperforms OpenAI CLIP ViT B/16 and Jina CLIP v1 on ImageNet zero-shot classification and Datacomp benchmarks.
  • Training uses a locked-text variant of LiT (Locked-image Tuning), aligning the vision encoder to the frozen text embedder.
  • Open-code and open-weights release under a permissive license; technical details available in the Nomic Embed Vision report (arXiv:2406.18587).
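The text-to-image retrieval described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the model's own API: the vectors are made-up three-dimensional stand-ins for real embeddings, the helper names (`with_query_prefix`, `rank_images`) are hypothetical, and the trailing space after `search_query:` is an assumption about the prefix format. In practice the query vector would come from nomic-embed-text-v1.5 and the image vectors from nomic-embed-vision-v1.5.

```python
import math

# Task prefix for text queries; the trailing space is an assumption.
QUERY_PREFIX = "search_query: "


def with_query_prefix(text):
    """Prepend the query prefix unless the text already carries it."""
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Return (image_id, score) pairs sorted by descending similarity."""
    scored = [(image_id, cosine(query_vec, vec)) for image_id, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy image embeddings (placeholders for real vision-model outputs).
image_vecs = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# The prefixed string is what would be sent to the text embedder.
query_text = with_query_prefix("a photo of a cat")
# Placeholder for the text embedding of query_text.
query_vec = [0.8, 0.2, 0.1]

ranking = rank_images(query_vec, image_vecs)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

Because both models map into the same space, the same ranking function works in the other direction: pass an image vector as the query and rank text vectors instead.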

Benchmark Performance

Model                     ImageNet 0-shot   Datacomp (Avg. 38)   MTEB
nomic-embed-vision-v1.5   71.0              56.8                 62.28
nomic-embed-vision-v1     70.7              56.7                 62.39
OpenAI CLIP ViT B/16      68.3              56.3                 43.82
Jina CLIP v1              59.1              52.2                 60.1

Visualizing the Embedding Space

An interactive Atlas map shows the joint vision-text embedding space on a 100,000-sample subset of CC3M.

[Figure: Interactive Atlas map showing alignment of vision and text embeddings]

not yet live

We're benchmarking and onboarding nomic embed vision v1.5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.