
nomic embed text v1 unsupervised

nomic-ai/nomic-embed-text-v1-unsupervised

published Jan 2024 · updated Aug 2024

An open embeddings model with 609 downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 609
license: apache-2.0

about this model

nomic-embed-text-v1-unsupervised is a text embedding model that maps inputs of up to 8192 tokens to dense vectors. It is the intermediate checkpoint from the multi-stage training of the nomic-embed-text-v1 model, taken after weakly-supervised contrastive pretraining and before supervised finetuning. This checkpoint is released as a reproducible training artifact under the Apache 2.0 license.

Key Strengths

  • Long-context support: full 8192 token input length.
  • Fully reproducible: training code, curated data, and model weights are available at github.com/nomic-ai/contrastors.
  • 137 million parameters, efficient for a context length of this size.
  • Task instruction prefixes are mandatory (e.g., search_document, search_query, clustering, classification).
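
The prefix requirement in the last item can be illustrated with a minimal sketch. It assumes the prefix is joined to the text as "task: text" (task name, colon, space), which is the convention shown on the model card. The helper name add_task_prefix is hypothetical and not part of any library.

```python
# Task names listed on this page; the "task: text" join format is an assumption
# based on the model card examples.
VALID_TASKS = ("search_document", "search_query", "clustering", "classification")


def add_task_prefix(text: str, task: str) -> str:
    """Return the text with its task instruction prefix prepended."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {VALID_TASKS}")
    return f"{task}: {text}"


# Documents and queries get different prefixes in a retrieval setup.
doc = add_task_prefix("Contrastive pretraining pairs related texts.", "search_document")
query = add_task_prefix("what is contrastive pretraining?", "search_query")
print(doc)
print(query)
```

Rejecting unknown task names early is a design choice in this sketch: a misspelled prefix would otherwise be embedded silently as ordinary text.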

Benchmarks (Final Supervised Model)

The final supervised model, nomic-embed-text-v1, achieves the following scores. The unsupervised checkpoint shares the same architecture and context length but may yield different results.

Benchmark            | nomic-embed-text-v1 (final) | text-embedding-ada-002 | text-embedding-3-small
MTEB (short-context) | 62.39                       | 60.99                  | 62.26
LoCo (long-context)  | 85.53                       | 52.70                  | 82.40
Jina Long Context    | 54.16                       | 55.25                  | 58.20

Usage Notes

Once live, gigarouter will host this model as a managed, OpenAI-compatible API, so no local installation or model loading will be required. The facts on this page are drawn from the model card, the Nomic Embed technical report (arXiv:2402.01613), and verified benchmarks.
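
As a rough sketch of what "OpenAI-compatible" implies, the snippet below builds (but does not send) a request in the OpenAI embeddings format: a POST to an /embeddings path with a JSON body containing "model" and "input". The base URL is a placeholder, and the model identifier and the helper name build_embeddings_request are assumptions. This page does not state the real endpoint or the ID gigarouter will use.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model ID matches the repository name.
MODEL_ID = "nomic-ai/nomic-embed-text-v1-unsupervised"


def build_embeddings_request(texts, task, api_key, base_url=BASE_URL):
    """Build, without sending, a POST request in the OpenAI embeddings format."""
    # Prepend the task prefix the model expects, e.g. "search_query: ...".
    prefixed = [f"{task}: {t}" for t in texts]
    payload = {"model": MODEL_ID, "input": prefixed}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embeddings_request(
    ["what is contrastive pretraining?"], "search_query", api_key="YOUR_API_KEY"
)
print(req.full_url)
print(json.loads(req.data)["input"][0])
```

With a real endpoint, passing the request to urllib.request.urlopen would send it. In the OpenAI format, the vectors come back under data[i].embedding in the JSON response.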

not yet live

We're benchmarking and onboarding nomic embed text v1 unsupervised as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
