Nomic Embed Text V1
nomic-ai/nomic-embed-text-v1
published Jan 2024 · updated Apr 2026
Nomic Embed Text V1 is a text embedding model with an 8,192-token context length that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small on both short-context and long-context tasks.
specs
| Spec | Value |
|---|---|
| Task | Text Embeddings |
| Architecture | BERT-based |
| Context Length | 8192 tokens |
| License | Apache 2.0 |
about this model
nomic-embed-text-v1 is a text embedding model that encodes English texts into dense vector representations with a context length of 8,192 tokens. It is a fully open and reproducible model, released under the Apache 2.0 license, with public model weights, training code, training data, and data curation scripts.
The model outperforms OpenAI text-embedding-ada-002 and text-embedding-3-small on both short-context and long-context benchmarks. Its multi-stage training pipeline is built on Flash Attention and supports Matryoshka Representation Learning for flexible embedding sizes. An aligned vision embedding model (nomic-embed-vision-v1) enables multimodal retrieval: any text embedding can be used with vision embeddings in the same latent space. The paper has been accepted to TMLR (Transactions on Machine Learning Research).
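As a rough illustration of what flexible embedding sizes mean in practice, Matryoshka-style embeddings are typically shortened by keeping the leading dimensions and re-normalizing. The sketch below shows only that general technique; the function name `truncate_embedding` is hypothetical, and any model-specific post-processing (for example a normalization step before truncation) is not covered here and should be checked against the model card.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Generic Matryoshka-style truncation; an assumption-level sketch,
    not the model's documented post-processing.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy 3-dimensional vector shortened to 2 dimensions.
short = truncate_embedding([3.0, 4.0, 12.0], 2)
```

Shorter vectors reduce storage and speed up similarity search, usually at some cost in retrieval quality.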
Benchmark Results
| Name | SeqLen | MTEB | LoCo |
|---|---|---|---|
best for
- Retrieval-augmented generation (RAG) with long documents
- Semantic clustering and topic discovery
- Text classification feature extraction
FAQ
Use `search_query:` for queries and `search_document:` for documents.
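A minimal sketch of applying these prefixes before embedding. The helper name `add_task_prefix` and the `kind` labels are hypothetical; the single space after the colon follows common usage and is an assumption here.

```python
# Task prefixes named in the FAQ; the trailing space is an assumption.
TASK_PREFIXES = {
    "query": "search_query: ",
    "document": "search_document: ",
}


def add_task_prefix(text, kind):
    """Prepend the task prefix for `kind` ("query" or "document")."""
    try:
        prefix = TASK_PREFIXES[kind]
    except KeyError:
        raise ValueError(f"unknown kind: {kind!r}") from None
    return prefix + text


query_input = add_task_prefix("What is TSNE?", "query")
doc_input = add_task_prefix("TSNE is a dimensionality reduction method.", "document")
```

Queries and documents should go through the same helper consistently, since embeddings produced with mismatched prefixes may not compare well.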
The model card does not specify the embedding dimension; it is a BERT-based model with mean pooling.
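For readers unfamiliar with mean pooling, the sketch below averages per-token vectors into one sentence vector while ignoring padding positions. It uses plain Python lists and toy values; the function name `mean_pool` and the input layout (one vector per token plus a 0/1 attention mask) are assumptions for illustration, not the model's actual code.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1.

    token_embeddings: list of equal-length float lists, one per token.
    attention_mask:   list of 0/1 ints, 0 marking padding tokens.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                sums[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in sums]


# Two real tokens and one padding token (masked out).
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
```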
Yes, fully reproducible with open weights, open training code, and open data under Apache 2.0.
Use the gigarouter OpenAI-compatible endpoint with your API key and include the appropriate task prefix.
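A hedged sketch of what such a request could look like, following the general OpenAI embeddings convention (`POST /v1/embeddings` with a bearer token and a JSON body containing `model` and `input`). The base URL below is a placeholder, not a documented gigarouter address, and using `nomic-ai/nomic-embed-text-v1` as the hosted model id is an assumption. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Placeholder values: replace with the real endpoint and key.
BASE_URL = "https://api.example.invalid"   # hypothetical base URL
MODEL_ID = "nomic-ai/nomic-embed-text-v1"  # assumed hosted model id


def build_embedding_request(texts, prefix, api_key, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {
        "model": MODEL_ID,
        # Each input carries its task prefix, e.g. "search_query: ".
        "input": [prefix + text for text in texts],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(["What is TSNE?"], "search_query: ", "YOUR_API_KEY")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` once a real endpoint and key are in place.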
We're benchmarking and onboarding Nomic Embed Text V1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.