nomic embed text v1 unsupervised
nomic-ai/nomic-embed-text-v1-unsupervised
published Jan 2024 · updated Aug 2024
An open embeddings model with 609 downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
nomic-embed-text-v1-unsupervised is a text embedding model that maps inputs of up to 8192 tokens to dense vectors. It is an intermediate checkpoint from the multi-stage training of nomic-embed-text-v1. It was taken after weakly-supervised contrastive pretraining and before supervised finetuning, and it is released as a reproducible training artifact under the Apache 2.0 license.
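Dense vectors like these are commonly compared with cosine similarity, for example to rank documents against a query. The following is a minimal sketch in plain Python. It assumes you already have two embedding vectors from the model. The toy 3-dimensional values are made up for illustration, and real vectors from this model are much longer.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real model output.
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.6, 1.0]  # same direction as query_vec, twice the length
print(round(cosine_similarity(query_vec, doc_vec), 6))  # prints 1.0
```

Scores close to 1.0 mean the two texts point in nearly the same direction in embedding space. Scores near 0 mean they are unrelated.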
Key Strengths
- Long-context support: full 8192 token input length.
- Fully reproducible: training code, curated data, and model weights are available at github.com/nomic-ai/contrastors.
- Compact size: 137 million parameters, which is small for a model with this context length.
- Task instruction prefixes: every input must start with a task prefix (e.g., `search_document`, `search_query`, `clustering`, `classification`).
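Because the prefixes are mandatory, it helps to apply them in one place before sending text to the model. This minimal sketch assumes the `task: text` format (prefix, colon, space) used in the Nomic model card examples. The `add_task_prefix` helper and the `TASK_PREFIXES` set are hypothetical names introduced for illustration.

```python
# Task prefixes listed for this model. The "prefix: text" format is an
# assumption based on the Nomic model card examples.
TASK_PREFIXES = {"search_document", "search_query", "clustering", "classification"}


def add_task_prefix(task: str, text: str) -> str:
    """Prepend the mandatory task instruction prefix to one input text (hypothetical helper)."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    return f"{task}: {text}"


# Documents being indexed and queries searching them use different prefixes.
docs = [add_task_prefix("search_document", d) for d in ["TSNE is a dimensionality reduction algorithm"]]
query = add_task_prefix("search_query", "What is TSNE?")
print(query)  # prints search_query: What is TSNE?
```

Rejecting unknown task names early catches typos that would otherwise silently produce unprefixed or mislabeled inputs.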
Benchmarks (Final Supervised Model)
The final supervised model, nomic-embed-text-v1, achieves the following scores. The unsupervised checkpoint shares the same architecture and context length, but these scores do not apply to it directly and its results may differ.
| Benchmark | nomic-embed-text-v1 (final) | text-embedding-ada-002 | text-embedding-3-small |
|---|---|---|---|
| MTEB (short-context) | 62.39 | 60.99 | 62.26 |
| LoCo (long-context) | 85.53 | 52.70 | 82.40 |
| Jina Long Context | 54.16 | 55.25 | 58.20 |
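The comparison in the table reduces to simple subtraction. This sketch recomputes the point margin of the final nomic-embed-text-v1 model over each baseline, using only the numbers in the table. The result shows the model leading on MTEB and LoCo but trailing both OpenAI models on the Jina Long Context benchmark. As with the table itself, these margins describe the final supervised model, not the unsupervised checkpoint.

```python
# Scores copied from the benchmark table (final supervised nomic-embed-text-v1).
SCORES = {
    "MTEB (short-context)": {"nomic": 62.39, "ada-002": 60.99, "3-small": 62.26},
    "LoCo (long-context)": {"nomic": 85.53, "ada-002": 52.70, "3-small": 82.40},
    "Jina Long Context": {"nomic": 54.16, "ada-002": 55.25, "3-small": 58.20},
}


def margins(scores: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Points by which nomic-embed-text-v1 beats (positive) or trails (negative) each baseline."""
    out: dict[str, dict[str, float]] = {}
    for bench, row in scores.items():
        out[bench] = {
            name: round(row["nomic"] - score, 2)
            for name, score in row.items()
            if name != "nomic"
        }
    return out


for bench, diff in margins(SCORES).items():
    print(bench, diff)
```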
Usage Notes
This model is hosted by gigarouter as a managed, OpenAI-compatible API. No local installation or model loading is required. All facts are drawn from the model card, the Nomic Embed technical report (arXiv:2402.01613), and verified benchmarks.
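Because the API is described as OpenAI-compatible, a request should follow the OpenAI embeddings format. That format is a POST to `/embeddings` with a JSON body containing `model` and `input`, authenticated with a bearer token. This sketch uses only the Python standard library. The base URL, model ID, and key are hypothetical placeholders, since gigarouter's actual values are not given here. Only `build_embeddings_request` is exercised locally. `fetch_embeddings` would make a real network call.

```python
import json
import urllib.request

# Hypothetical placeholders: gigarouter's real base URL and model ID may differ.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "nomic-ai/nomic-embed-text-v1-unsupervised"


def build_embeddings_request(texts: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI /embeddings format."""
    body = json.dumps({"model": MODEL_ID, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def fetch_embeddings(texts: list[str], api_key: str) -> list[list[float]]:
    """Send the request and return one vector per input, in input order."""
    req = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # OpenAI-style responses list results under "data", each tagged with its input "index".
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Inputs carry the mandatory task prefix; "YOUR_KEY" is a placeholder.
req = build_embeddings_request(["search_query: What is TSNE?"], api_key="YOUR_KEY")
print(req.full_url, req.get_method())
```

Sorting by `index` guards against results arriving in a different order from the inputs. Any OpenAI-compatible client library could replace the hand-built request.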
We're benchmarking and onboarding nomic-embed-text-v1-unsupervised as a hosted, OpenAI-compatible API. Sign in for free credit so you're ready when it lands, or tell us you want it and we'll prioritize it.