nomic embed vision v1.5

nomic-ai/nomic-embed-vision-v1.5

published Jun 2024 · updated Mar 2025

A popular open image feature extraction model, with 1.3M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status

coming soon

downloads / mo

1.3M

license

apache-2.0

about this model

nomic-embed-vision-v1.5 is an image feature extraction model whose embeddings share a latent space with nomic-embed-text-v1.5, which enables multimodal retrieval and cross-modal search.

Key Strengths

Shares a common embedding space with nomic-embed-text-v1.5, so text-to-image and image-to-text retrieval work directly, provided text inputs carry the expected task prefix (e.g., search_query:).
Outperforms OpenAI CLIP ViT B/16 and Jina CLIP v1 on ImageNet zero-shot classification and Datacomp benchmarks.
Trained with a locked-text variant of LiT (Locked-image Tuning): the text embedder stays frozen and the vision encoder is aligned to it.
Open code and open weights, released under the Apache-2.0 license; technical details are in the Nomic Embed Vision report (arXiv:2406.18587).
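
Because both encoders write into one space, text-to-image retrieval reduces to a nearest-neighbour search over image vectors. The sketch below is a minimal illustration under stated assumptions: the vectors are made-up 3-dimensional toy values rather than real model output, and with_prefix, cosine, and rank_images are hypothetical helper names, not part of any Nomic library. Only the search_query: prefix comes from the description above.

```python
import math

def with_prefix(text, task="search_query"):
    # Prepend the task prefix the text embedder expects, e.g. "search_query: ...".
    return f"{task}: {text}"

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(query_vec, image_vecs):
    # Return image ids sorted from most to least similar to the query.
    scored = [(cosine(query_vec, vec), image_id) for image_id, vec in image_vecs.items()]
    return [image_id for _, image_id in sorted(scored, reverse=True)]

# Toy stand-ins for embeddings (assumption: real vectors come from the two models).
query_text = with_prefix("a photo of a dog")
query_vec = [0.9, 0.1, 0.0]  # would be the text embedding of query_text
image_vecs = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
    "cat.jpg": [0.5, 0.7, 0.1],
}
ranking = rank_images(query_vec, image_vecs)
print(query_text, ranking)
```

Image-to-text retrieval is the same computation with the roles swapped: the query vector comes from the vision model and the candidates from the text model.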

Benchmark Performance

Model	ImageNet 0-shot	Datacomp (avg. of 38 tasks)	MTEB
nomic-embed-vision-v1.5	71.0	56.8	62.28
nomic-embed-vision-v1	70.7	56.7	62.39
OpenAI CLIP ViT B/16	68.3	56.3	43.82
Jina CLIP v1	59.1	52.2	60.1
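
The ImageNet 0-shot column measures zero-shot classification: each class name is embedded as text, and an image is assigned the class whose text embedding is closest. The following is a minimal sketch of that scoring step, assuming toy 2-dimensional vectors in place of real embeddings. The names zero_shot_classify and accuracy are hypothetical, and the prompt templates and preprocessing used in the real benchmark are not reproduced here.

```python
import math

def normalize(vec):
    # Scale a vector to unit length so dot products equal cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_classify(image_vec, class_vecs):
    # Pick the class whose text embedding has the highest cosine similarity.
    image_unit = normalize(image_vec)
    best_label, best_score = None, -2.0
    for label, text_vec in class_vecs.items():
        score = sum(a * b for a, b in zip(image_unit, normalize(text_vec)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(examples, class_vecs):
    # Percentage of (image_vec, true_label) pairs classified correctly.
    hits = sum(zero_shot_classify(vec, class_vecs) == label for vec, label in examples)
    return 100.0 * hits / len(examples)

# Toy data (assumption): two classes and four labelled image vectors.
class_vecs = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
examples = [
    ([0.9, 0.2], "dog"),
    ([0.1, 0.8], "cat"),
    ([0.6, 0.5], "cat"),  # closer to "dog", so this one is misclassified
    ([0.3, 0.7], "cat"),
]
score = accuracy(examples, class_vecs)
print(score)
```

No classifier is trained on the target labels at any point, which is why the metric doubles as a check on how well the vision and text spaces are aligned.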

not yet live

We're benchmarking and onboarding nomic embed vision v1.5 as a hosted, OpenAI-compatible API.