E5 Mistral 7B Instruct
intfloat/e5-mistral-7b-instruct
published Dec 2023 · updated Apr 2026
E5 Mistral 7B Instruct is an embedding model that generates high-quality text embeddings using instruction-based queries, fine-tuned from Mistral-7B.
specs
| Spec | Value |
|---|---|
| Task | Text Embeddings |
| Architecture | Decoder-only LLM (Mistral-7B-v0.1) |
| Parameters | 7B |
| Embedding Size | 4096 |
| Max Sequence Length | 4096 tokens |
| BEIR Score | 56.9 |
about this model
intfloat/e5-mistral-7b-instruct is a text embedding model built on a decoder-only LLM architecture. It is initialized from Mistral-7B-v0.1 and fine-tuned on synthetic data with a standard contrastive loss. The model has 32 layers and produces 4096-dimensional embeddings.
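Because the model is a decoder-only LLM, a single vector per input has to be read off the per-token hidden states. The example code on the Hugging Face model card uses last-token pooling: the embedding is the final-layer hidden state of the last non-padding token. The pure-Python sketch below illustrates the idea on nested lists; `last_token_pool` is an illustrative name here, and in practice the same operation runs on the model's 4096-dimensional output tensors.

```python
from typing import List


def last_token_pool(hidden_states: List[List[List[float]]],
                    attention_mask: List[List[int]]) -> List[List[float]]:
    """Return the hidden state of the last non-padding token of each sequence.

    hidden_states:  [batch][seq_len][hidden_dim]
    attention_mask: [batch][seq_len], 1 for real tokens and 0 for padding.
    Works for both left-padded and right-padded batches.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Position of the last real token (raises ValueError on an all-padding row).
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled
```

With right padding the last real token sits before the padding; with left padding it is the final position. Scanning the mask handles both cases.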
Key Capabilities
The model accepts a task instruction prepended to each query, which tailors the embedding to the task (e.g., web search, summarization, STS). It supports a maximum input length of 4096 tokens. It has some multilingual capability from its multilingual fine-tuning data, but it is recommended primarily for English.
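The model card's example code formats each query as `Instruct: {task}\nQuery: {query}` and embeds documents unchanged. A minimal sketch of that formatting, using the model card's web-search task description; the document text is a made-up placeholder.

```python
from typing import List


def get_detailed_instruct(task_description: str, query: str) -> str:
    """Prefix a query with a one-sentence task instruction."""
    return f"Instruct: {task_description}\nQuery: {query}"


task = "Given a web search query, retrieve relevant passages that answer the query"

# Queries carry the instruction prefix.
queries: List[str] = [get_detailed_instruct(task, "how much protein should a female eat")]

# Documents are embedded as-is, with no instruction.
documents: List[str] = ["Protein needs vary with age, body weight, and activity level."]
```

Changing only `task` (for example, to an STS or summarization description) is how the same model is steered toward a different task.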
Performance
The model achieves a BEIR score of 56.9. When fine-tuned on a mixture of synthetic and labeled data, it achieved state-of-the-art results on the MTEB benchmark at the time of publication. The paper introducing the model was accepted at ACL 2024.
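BEIR results are conventionally reported as nDCG@10, computed per query, averaged per dataset, and then averaged across datasets. The sketch below computes nDCG@k for a single query with linear gains and a log2 position discount; it is meant to show what the metric rewards, and the official evaluation tooling may differ in details such as tie handling.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked_doc_ids: List[str], relevance: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query. Documents missing from `relevance` count as gain 0."""

    def dcg(gains: List[int]) -> float:
        # Rank 1 is discounted by log2(2), rank 2 by log2(3), and so on.
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking scores 1.0; pushing the only relevant document from rank 1 to rank 2 drops the score to 1/log2(3), about 0.63.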
Training Approach
The model was fine-tuned on synthetic data generated by proprietary LLMs and covering 93 languages, using a standard contrastive loss and fewer than 1,000 training steps. Labeled data is not required for strong performance; combining the synthetic data with labeled data yields state-of-the-art results on the BEIR and MTEB benchmarks.
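A standard contrastive objective for embedding models is the InfoNCE loss: each query should score its paired document higher than the other documents in the batch. The sketch below uses cosine similarity and in-batch negatives only; it omits hard negatives, and the temperature default of 0.02 is a placeholder assumption rather than a confirmed training setting.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(query_embs: List[List[float]],
                  doc_embs: List[List[float]],
                  temperature: float = 0.02) -> float:  # placeholder temperature
    """Mean InfoNCE loss where doc_embs[i] is the positive for query_embs[i]
    and every other document in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [cosine(q, d) / temperature for d in doc_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the positive document.
        total += log_denominator - logits[i]
    return total / len(query_embs)
```

Lowering the temperature sharpens the softmax, so small similarity differences between the positive and the negatives produce larger gradients.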
Architecture
32 layers, embedding size 4096. Initialized from Mistral-7B-v0.1. Inputs longer than 4096 tokens are not recommended.
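One way to stay within the 4096-token limit while keeping a consistent pooling position is to truncate and then append an end-of-sequence (EOS) token, so the last token of every input is EOS. A minimal sketch, assuming token ids come from the model's tokenizer and that the EOS token counts toward the limit; `prepare_input_ids` is a hypothetical helper, not part of any library.

```python
from typing import List

MAX_LENGTH = 4096  # the model's maximum input length in tokens


def prepare_input_ids(token_ids: List[int], eos_token_id: int,
                      max_length: int = MAX_LENGTH) -> List[int]:
    """Truncate to leave room for one EOS token, then append it.

    Keeping EOS as the final token means last-token pooling always reads
    the same kind of position, even for inputs that were cut off.
    """
    return token_ids[: max_length - 1] + [eos_token_id]
```

Truncation silently drops the tail of long documents, so splitting long texts into chunks before embedding may preserve more content.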
Supported Languages
Primarily English. Some multilingual capability exists, but for multilingual use cases, multilingual-e5-large is recommended.
best for
- Information retrieval and passage ranking with instruction-tuned queries
- Semantic textual similarity and clustering
- Zero-shot retrieval on BEIR and MTEB benchmarks
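For the retrieval, similarity, and clustering uses listed above, embeddings are typically L2-normalized and compared by cosine similarity; the model card's example normalizes embeddings and takes dot products. A dependency-free sketch, with toy 2-D vectors standing in for the real 4096-dimensional outputs:

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(queries: List[List[float]], docs: List[List[float]]) -> List[List[float]]:
    """Cosine similarity between every query and every document."""
    qn = [l2_normalize(q) for q in queries]
    dn = [l2_normalize(d) for d in docs]
    return [[sum(a * b for a, b in zip(q, d)) for d in dn] for q in qn]


def rank(query: List[float], docs: List[List[float]]) -> List[int]:
    """Document indices sorted from most to least similar to the query."""
    scores = similarity_matrix([query], docs)[0]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

For clustering, the same normalized vectors can be passed to a standard algorithm such as k-means.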
FAQ
Do queries need an instruction?
Yes. For best performance, prepend a one-sentence task instruction to each query; documents do not need instructions.
What is the maximum input length?
4096 tokens. Longer inputs are not recommended and may degrade performance.
Does the model support languages other than English?
It has some multilingual capability from its mixed-language training data, but it is primarily optimized for English. For multilingual use, consider multilingual-e5-large.
How do I call the model through the API?
Use the OpenAI-compatible endpoint with your API key and specify the model name "e5-mistral-7b-instruct" in the request.
How was the model trained?
It was fine-tuned with a contrastive loss on synthetic data generated by LLMs, which gives strong results without any labeled data; adding labeled data improves it further, to state-of-the-art results on BEIR and MTEB.
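Once the hosted endpoint is available, an OpenAI-compatible embeddings call is a POST to `/v1/embeddings` with `model` and `input` fields. The sketch below only builds that JSON body. The base URL and authentication details are not given on this page, and applying the instruction prefix client-side before sending queries is an assumption.

```python
import json
from typing import List


def build_embeddings_request(texts: List[str],
                             model: str = "e5-mistral-7b-instruct") -> str:
    """JSON body for an OpenAI-style POST /v1/embeddings request."""
    return json.dumps({"model": model, "input": texts})


# Assumption: the caller adds the task instruction to queries (not documents).
task = "Given a web search query, retrieve relevant passages that answer the query"
query = f"Instruct: {task}\nQuery: how much protein should a female eat"
body = build_embeddings_request([query])
```

The body would be sent with the provider's API key (commonly as an `Authorization: Bearer` header) to its base URL, using any HTTP client or the official `openai` SDK's `embeddings.create` method.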
We're benchmarking and onboarding E5 Mistral 7B Instruct as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.