Nomic Embed Text V1 Ablated
nomic-ai/nomic-embed-text-v1-ablated
published Jan 2024 · updated Aug 2024
Nomic Embed Text V1 Ablated is a text embedding model with an 8192-token context length, trained on a modified dataset to study the impact of training data on model outcomes.
specs
| Spec | Value |
|---|---|
| Task | Text Embedding |
| Context Length | 8192 tokens |
| License | Apache 2.0 |
about this model
nomic-embed-text-v1-ablated is an English text encoder with an 8192-token context length, designed for reproducibility research. This checkpoint was trained on a modified dataset so that researchers can analyze how individual data subsets affect model outcomes. It is released under the Apache 2.0 license as part of the Nomic Embed project, with full training code and curated data available for replication (see arXiv:2402.01613).
As a variant in the nomic-embed-text-v1 family, this model is suited to studying how training data composition affects embedding quality, not to extracting embeddings in production. For production use, the project authors recommend the final nomic-embed-text-v1 model.
Key attributes:
- Context length: 8192 tokens.
- Open-source: weights, training code, and data are publicly available under Apache 2.0.
- Purpose: facilitates transparency and reproducibility in long-context text embedding research.
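Inputs longer than the 8192-token context have to be truncated or split before embedding. The sketch below splits an already-tokenized sequence into overlapping windows. It assumes you already have the model tokenizer's output as a list of tokens (whitespace splitting stands in for it here), and the `chunk_tokens` helper and its `overlap` parameter are hypothetical names, not part of any Nomic API. Special tokens added by the tokenizer also count against the limit, so a slightly smaller `max_tokens` may be needed in practice.

```python
from typing import List, TypeVar

T = TypeVar("T")

MAX_TOKENS = 8192  # context length stated in the specs above


def chunk_tokens(tokens: List[T], max_tokens: int = MAX_TOKENS, overlap: int = 0) -> List[List[T]]:
    """Split a token sequence into windows of at most `max_tokens` tokens.

    Hypothetical helper: consecutive windows share `overlap` tokens so that
    text near a boundary appears in both neighbouring chunks.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    if not tokens:
        return []
    step = max_tokens - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once this window reaches the end of the sequence.
        if start + max_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Whitespace splitting is only a stand-in for the model's real tokenizer.
    words = ("lorem ipsum " * 10000).split()
    pieces = chunk_tokens(words, overlap=256)
    print(len(words), [len(p) for p in pieces])  # prints: 20000 [8192, 8192, 4128]
```

Overlap trades a little redundant computation for context continuity across chunk boundaries; setting `overlap=0` gives disjoint windows.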
No benchmark scores are reported for this ablated checkpoint; its primary value lies in enabling controlled comparisons of training data effects.
best for
- Reproducing ablation studies from the Nomic Embed tech report
- Analyzing the effect of training data subsets on embedding quality
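One simple way to compare checkpoints trained on different data subsets is to embed the same queries and documents with each checkpoint and score retrieval with a metric such as recall@1. The plain-Python sketch below assumes the embeddings have already been computed. The vectors in the example are made-up two-dimensional toys, not outputs of nomic-embed-text-v1-ablated, and this is not presented as the evaluation protocol used in the Nomic Embed tech report.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_1(queries: List[Vector], docs: List[Vector], gold: List[int]) -> float:
    """Fraction of queries whose most similar document is the gold document."""
    hits = 0
    for q, g in zip(queries, gold):
        scores = [cosine(q, d) for d in docs]
        best = max(range(len(docs)), key=scores.__getitem__)  # first index wins ties
        hits += int(best == g)
    return hits / len(queries)


if __name__ == "__main__":
    gold = [0, 1]  # query i should retrieve document gold[i]
    # Hypothetical toy embeddings standing in for two checkpoints' outputs.
    docs_a, queries_a = [[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]
    docs_b, queries_b = [[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.7, 0.3]]
    print("checkpoint A recall@1:", recall_at_1(queries_a, docs_a, gold))  # 1.0
    print("checkpoint B recall@1:", recall_at_1(queries_b, docs_b, gold))  # 0.5
```

Keeping the texts, gold labels, and metric fixed while swapping only the checkpoint is what isolates the effect of the training data.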
FAQ
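What is the context length?
8192 tokens.
What license is it released under?
Apache 2.0.
How does it differ from nomic-embed-text-v1?
It was trained on a modified dataset to study the impact of training data on model outcomes.
Should I use it to extract embeddings in production?
No; the model card recommends nomic-embed-text-v1 for production embedding extraction.
How do I access it through an API?
Use the gigarouter OpenAI-compatible endpoint with an API key.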
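Access is described as going through an OpenAI-compatible endpoint. Below is a minimal standard-library sketch of such a request. It assumes the endpoint follows the OpenAI embeddings format (`POST {base_url}/embeddings` with `model` and `input` fields, vectors returned under `data[i].embedding`). The base URL and the `GIGAROUTER_BASE_URL` / `GIGAROUTER_API_KEY` environment variable names are hypothetical placeholders, since this page gives neither. The model ID is taken from the page header. The call will only succeed once the hosted model is live.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List

# Placeholder: the real gigarouter base URL is not given on this page.
BASE_URL = os.environ.get("GIGAROUTER_BASE_URL", "https://example.invalid/v1")
MODEL_ID = "nomic-ai/nomic-embed-text-v1-ablated"


def build_request(texts: List[str], api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request in the OpenAI embeddings format (assumed compatible)."""
    body = json.dumps({"model": MODEL_ID, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(payload: Dict[str, Any]) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by their 'index' field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: List[str]) -> List[List[float]]:
    """Send the request and return one vector per input text."""
    api_key = os.environ["GIGAROUTER_API_KEY"]  # hypothetical variable name
    with urllib.request.urlopen(build_request(texts, api_key), timeout=30) as resp:
        return parse_embeddings(json.load(resp))
```

With a valid key and base URL set, `embed(["some text"])` would return a list containing one embedding vector. Per the model card, production workloads should target nomic-embed-text-v1 instead.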
We're benchmarking and onboarding Nomic Embed Text V1 Ablated as a hosted, OpenAI-compatible API.