Arctic Embed M V2.0
Snowflake/snowflake-arctic-embed-m-v2.0
published Nov 2024 · updated Apr 2025
Arctic Embed M V2.0 is a multilingual embedding model that delivers high-quality retrieval across English and other languages without compromise, supporting long context and efficient compression.
specs
| Spec | Value |
|---|---|
| Task | Text embedding and retrieval |
| Architecture | Transformer (based on GTE-multilingual-base) |
| Parameters | 305M total (113M non-embedding) |
| Dimensions | 768 (reducible via MRL to 256) |
| License | Apache 2.0 |
about this model
Snowflake/snowflake-arctic-embed-m-v2.0 is a multilingual text embedding model optimized for retrieval, supporting Matryoshka Representation Learning (MRL) and a context window of up to 8192 tokens via RoPE. It is designed for enterprise-grade multilingual search and retrieval at scale, delivering competitive performance on both English and non-English benchmarks without compromising on either.
Key Strengths
- Multilingual without compromise: Excels across English and non-English retrieval, outperforming leading open-source and proprietary models on MTEB Retrieval, CLEF, and MIRACL benchmarks.
- Inference efficiency: With 113 million non-embedding parameters, the model is fast and efficient at any scale.
- Compression-friendly: Supports Matryoshka Representation Learning (MRL) to reduce vector dimensions from 768 to 256 with minimal quality degradation, and 4-bit quantization for high-quality retrieval at 128 bytes per vector (e.g., using a pq256x4fs fast-scan FAISS index).
- Long context support: Built on GTE-multilingual-base, supporting up to 8192 tokens via RoPE.
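As an illustration of the MRL compression mentioned above, the sketch below truncates a 768-dimensional vector to its first 256 components and rescales it to unit length so that a dot product can be used as cosine similarity. It assumes the usual MRL convention of keeping the leading dimensions. The helper names and the toy vectors are hypothetical stand-ins, not real model outputs.

```python
import math

def truncate_and_normalize(vec, dims=256):
    """Keep the first `dims` components, then rescale to unit length."""
    if dims > len(vec):
        raise ValueError("dims exceeds embedding length")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy 768-dimensional stand-ins for real embeddings (hypothetical values).
doc = [math.sin(i) for i in range(768)]
query = [math.sin(i + 0.1) for i in range(768)]

doc_256 = truncate_and_normalize(doc)
query_256 = truncate_and_normalize(query)
score = cosine(query_256, doc_256)
```

Renormalizing after truncation matters because the first 256 components of a unit-length 768-dimensional vector generally no longer have unit length.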
Benchmark Performance
Average NDCG@10 across key retrieval benchmarks:
| Benchmark | Score |
|---|---|
| BEIR (15 datasets) | 55.4 |
| MIRACL (4 languages) | 55.2 |
| CLEF (Focused) | 51.7 |
| CLEF (Full) | 53.9 |
These results place arctic-embed-m-v2.0 ahead of comparable models such as bge-m3, me5-base, and gte-multilingual-base on BEIR and CLEF, while remaining competitive on MIRACL.
Compression Efficiency
Using MRL to truncate embeddings from 768 to 256 dimensions reduces vector size by 3x with approximately 2-3% degradation in NDCG@10. Combining MRL with 4-bit quantization enables high-quality retrieval at 128 bytes per vector.
| Dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|---|---|---|---|
| 768 | 55.4 | 55.2 | 51.7 | 53.9 |
| 256 | 54.4 | 54.0 | 50.6 | 52.3 |
Relative performance drop at 256 dimensions is approximately 2-3% across benchmarks.
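The storage and degradation figures can be checked with a few lines of arithmetic. The sketch below assumes uncompressed vectors are stored as 32-bit floats (an assumption; the storage format is not stated here) and takes the scores directly from the table above. The function names are illustrative only.

```python
FLOAT32_BYTES = 4  # assumption: uncompressed vectors are stored as float32

def float32_bytes(dims):
    """Bytes needed for one uncompressed float32 vector."""
    return dims * FLOAT32_BYTES

def quantized_bytes(dims, bits_per_dim):
    """Bytes needed when each dimension is stored in `bits_per_dim` bits."""
    return dims * bits_per_dim // 8

def relative_drop_pct(full, truncated):
    """Relative NDCG@10 loss, in percent, when moving from `full` to `truncated`."""
    return 100.0 * (full - truncated) / full

# (768-dim score, 256-dim score), copied from the table above.
scores = {
    "BEIR (15)": (55.4, 54.4),
    "MIRACL (4)": (55.2, 54.0),
    "CLEF (Focused)": (51.7, 50.6),
    "CLEF (Full)": (53.9, 52.3),
}

drops = {name: round(relative_drop_pct(f, t), 1) for name, (f, t) in scores.items()}

print(float32_bytes(768), float32_bytes(256))  # 3072 1024, a 3x reduction
print(quantized_bytes(256, 4))                 # 128 bytes per vector
print(drops)  # BEIR 1.8, MIRACL 2.2, CLEF (Focused) 2.1, CLEF (Full) 3.0
```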
Comparison with Alternatives
| Model | Non-emb params | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|---|---|---|---|---|
| snowflake-arctic-m-v2.0 | 113M | 55.4 | 55.2 | 51.7 | 53.9 |
| snowflake-arctic-m | 86M | 54.9 | 24.9 | 34.4 | 29.1 |
| me5 base | 303M | 51.4 | 54.0 | 43.0 | 34.6 |
| bge-m3 (BAAI) | 303M | 48.8 | 56.8 | 40.8 | 41.3 |
| gte (Alibaba) | 113M | 51.1 | 52.3 | 47.7 | 53.1 |
Released under the Apache 2.0 license. A technical report detailing the training methodology is available at arXiv:2412.04506.
best for
- Multilingual document search and retrieval across languages
- Efficient embedding storage using Matryoshka Representation Learning and quantization
- Long-context retrieval tasks (up to 8192 tokens)
FAQ
Arctic Embed M V2.0 is designed for high-quality multilingual text retrieval, excelling on benchmarks like MTEB Retrieval, MIRACL, and CLEF while maintaining strong English performance.
With 113M non-embedding parameters, it is faster and more efficient than larger models like BGE-M3 or me5 base (both ~303M non-embedding parameters), and supports a context window of 8192 tokens.
The model is released under Apache 2.0, which permits commercial use subject to the license's attribution and notice conditions.
Input: text strings. Output: 768-dimensional embeddings (can be reduced to 256 via MRL). For queries, prepend "query: "; for documents, no prefix.
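A minimal sketch of the input convention described above: queries get the "query: " prefix and documents are passed through unchanged. The helper names are hypothetical, and the guard against double-prefixing is an added assumption rather than documented behavior.

```python
QUERY_PREFIX = "query: "

def prepare_query(text):
    """Prepend the query prefix unless the text already carries it (assumed guard)."""
    if text.startswith(QUERY_PREFIX):
        return text
    return QUERY_PREFIX + text

def prepare_document(text):
    """Documents are embedded as-is, with no prefix."""
    return text

queries = [prepare_query(q) for q in ["what is Matryoshka Representation Learning?"]]
documents = [prepare_document(d) for d in ["MRL trains embeddings that can be truncated."]]
```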
Once the hosted API is available, call the OpenAI-compatible embeddings endpoint with your API key and the model ID "snowflake-arctic-embed-m-v2.0", sending a POST request with the input text.
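The request could be assembled roughly as follows. This is a sketch under assumptions: the base URL is a placeholder (the real host is not given here), and the path, JSON body, and bearer-token header follow the general OpenAI embeddings convention rather than anything confirmed for this service. The code only builds the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical placeholder host
MODEL_ID = "snowflake-arctic-embed-m-v2.0"

def build_embedding_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI embeddings style."""
    payload = json.dumps({"model": MODEL_ID, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_embedding_request(["query: how does MRL work?"], api_key="YOUR_API_KEY")
# To send it: urllib.request.urlopen(req), then JSON-decode the response body.
```

In OpenAI-compatible APIs the response typically carries one embedding per input under `data[i].embedding`; treat that shape as an assumption until the endpoint is documented.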
We're benchmarking and onboarding Arctic Embed M V2.0 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.