Linq-Embed-Mistral
Linq-AI-Research/Linq-Embed-Mistral
published May 2024 · updated Jun 2024
Linq-Embed-Mistral is a text embedding model built on Mistral-7B and fine-tuned for retrieval. As of May 29, 2024, it ranked first on the MTEB retrieval task with a score of 60.2.
specs
| Task | Text embedding & retrieval |
| Architecture | Mistral-7B transformer decoder with grouped-query attention and sliding window attention |
| Parameters | 7 billion |
about this model
Linq-Embed-Mistral is an embedding model that generates high-quality text representations optimized for retrieval tasks, built upon the E5-mistral-7b-instruct and Mistral-7B-v0.1 foundations. The model improves text retrieval through advanced data refinement methods, including sophisticated data crafting, data filtering, and negative mining guided by teacher models, applied to both existing benchmark datasets and highly tailored synthetic datasets generated via LLMs. These techniques produce high-quality triplet datasets (query, positive example, negative example) that significantly enhance retrieval performance.
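To make the triplet idea concrete, here is a minimal sketch of teacher-guided negative mining. It is not the authors' pipeline: the `Triplet` type, the `mine_triplets` helper, the `margin` and `max_negatives` parameters, and the word-overlap `toy_teacher` are all hypothetical stand-ins chosen for illustration. The assumption is that a teacher model scores each (query, passage) pair, and that a candidate is kept as a negative only when the teacher rates it clearly below the positive, which filters likely false negatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One training example: a query, a relevant passage, an irrelevant passage."""
    query: str
    positive: str
    negative: str


def mine_triplets(query, positive, candidates, teacher_score,
                  margin=0.05, max_negatives=2):
    """Build triplets from teacher-scored candidates (illustrative only).

    teacher_score(query, passage) -> float is a hypothetical teacher model.
    Candidates scoring within `margin` of the positive are dropped as likely
    false negatives; the hardest remaining candidates are used first.
    """
    pos_score = teacher_score(query, positive)
    scored = [(teacher_score(query, c), c) for c in candidates if c != positive]
    kept = [(s, c) for s, c in scored if s <= pos_score - margin]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # hardest negatives first
    return [Triplet(query, positive, c) for _, c in kept[:max_negatives]]


def toy_teacher(query, passage):
    """Toy stand-in for a teacher model: fraction of query words in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words)


triplets = mine_triplets(
    query="capital of france",
    positive="paris is the capital of france",
    candidates=[
        "lyon is a city in france",
        "france has paris as its capital",
        "the capital of france",        # scores as high as the positive: filtered out
        "bananas are yellow",
    ],
    teacher_score=toy_teacher,
)
```

In this toy run the third candidate is discarded because the teacher rates it as highly as the positive, and the two hardest remaining passages become negatives. A real teacher would be a stronger embedding or reranking model rather than word overlap.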
Benchmark Performance
On the Massive Text Embedding Benchmark (MTEB) as of May 29, 2024, Linq-Embed-Mistral achieves a retrieval score of 60.2, ranking first among all models on the MTEB leaderboard for the retrieval task. The model attains an average score of 68.2 across 56 datasets, making it the highest-ranking publicly accessible model and third overall. (NV-Embed-v1 and voyage-large-2-instruct, ranked 1st and 2nd overall, reported their performance without releasing their models.)
| Model | Retrieval (15 datasets) | Average (56 datasets) |
|---|---|---|
| Linq-Embed-Mistral | 60.2 | 68.2 |
| NV-Embed-v1 | 59.4 | 69.3 |
| SFR-Embedding-Mistral | 59.0 | 67.6 |
| voyage-large-2-instruct | 58.3 | 68.3 |
| GritLM-7B | 57.4 | 66.8 |
| e5-mistral-7b-instruct | 56.9 | 66.6 |
| text-embedding-3-large | 55.4 | 64.6 |
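The ranking claims can be checked directly against the table. The short sketch below copies the seven rows shown here (not the full leaderboard) and computes the rank of Linq-Embed-Mistral in each column; the `scores` dictionary and `rank` helper are names introduced for this example only.

```python
# Scores copied from the table above: (retrieval, average).
scores = {
    "Linq-Embed-Mistral": (60.2, 68.2),
    "NV-Embed-v1": (59.4, 69.3),
    "SFR-Embedding-Mistral": (59.0, 67.6),
    "voyage-large-2-instruct": (58.3, 68.3),
    "GritLM-7B": (57.4, 66.8),
    "e5-mistral-7b-instruct": (56.9, 66.6),
    "text-embedding-3-large": (55.4, 64.6),
}


def rank(model, column):
    """Return the 1-based rank of `model` in the given column, highest score first."""
    ordered = sorted(scores, key=lambda m: scores[m][column], reverse=True)
    return ordered.index(model) + 1


retrieval_rank = rank("Linq-Embed-Mistral", 0)  # column 0: retrieval
average_rank = rank("Linq-Embed-Mistral", 1)    # column 1: average
```

Among the listed models, this places Linq-Embed-Mistral first on retrieval and third on the 56-dataset average, behind NV-Embed-v1 and voyage-large-2-instruct.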
The model uses a 7-billion-parameter architecture with grouped-query attention and sliding window attention, supporting a maximum sequence length of 4096 tokens. It requires a task-specific instruction prepended to each query at inference time. For detailed per-task MTEB results, refer to the model page on Hugging Face.
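As an illustration of the instruction requirement, the sketch below prefixes each query with `Instruct: {task}\nQuery: ` and leaves passages unchanged. The helper names `format_query` and `format_inputs`, and the sample task description, are assumptions made for this example rather than part of any official client.

```python
def format_query(task: str, query: str) -> str:
    """Prefix a query with its one-sentence task instruction."""
    return f"Instruct: {task}\nQuery: {query}"


def format_inputs(task, queries, passages):
    """Return model inputs: instructed queries first, then passages as-is."""
    return [format_query(task, q) for q in queries] + list(passages)


# Illustrative task description; choose one that matches your retrieval task.
task = "Given a question, retrieve passages that answer the question"
inputs = format_inputs(
    task,
    queries=["What is grouped-query attention?"],
    passages=["Grouped-query attention shares key and value heads across query heads."],
)
```

Only the query side carries the instruction, so a passage index built once can be reused across different task descriptions.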
best for
- Improving search precision in retrieval-augmented generation (RAG) systems
- Multilingual text retrieval across 100+ languages
- Enterprise semantic search and document ranking
FAQ
Linq-Embed-Mistral excels at text retrieval: as of May 29, 2024, it ranked first on MTEB retrieval (60.2) and averaged 68.2 across 56 datasets.
Compared with text-embedding-3-large, Linq-Embed-Mistral achieves higher MTEB retrieval (60.2 vs 55.4) and average (68.2 vs 64.6) scores.
Queries need a one-sentence task instruction prefixed with "Instruct: {task}\nQuery: ". Passages do not require an instruction.
Use the OpenAI-compatible endpoint with your API key; send a POST request with input texts following the required instruction format.
The model supports up to 4096 tokens per input.
We're benchmarking and onboarding Linq-Embed-Mistral as a hosted, OpenAI-compatible API.
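For readers planning an integration, here is a hedged sketch of what a request to an OpenAI-compatible embeddings endpoint could look like. The base URL `https://api.example.com/v1` is a placeholder, the model identifier is assumed to match the repository name, and the helpers `build_embedding_request` and `embed` are hypothetical. The body follows the common OpenAI embeddings shape (`model` and `input` in the request, `data[i].embedding` in the response); the hosted service may differ.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"           # placeholder, not a real endpoint
MODEL_ID = "Linq-AI-Research/Linq-Embed-Mistral"  # assumed model identifier


def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, api_key):
    """Send the request and return one embedding vector per input text."""
    request = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]


# Queries carry the instruction prefix; passages are sent unchanged.
example_request = build_embedding_request(
    ["Instruct: Given a question, retrieve passages that answer the question\n"
     "Query: What is sliding window attention?"],
    api_key="YOUR_API_KEY",
)
```

Calling `embed(...)` only makes sense against a live endpoint; `build_embedding_request` is kept separate so the payload can be inspected without network access.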