GTE Qwen2 7B Instruct
Alibaba-NLP/gte-Qwen2-7B-instruct
published Jun 2024 · updated Mar 2025
GTE Qwen2 7B Instruct is a multilingual text embedding model that ranked No. 1 on the MTEB benchmark for both English and Chinese as of June 16, 2024.
specs
| Spec | Value |
|---|---|
| Task | Text Embedding |
| Architecture | Decoder-only (Qwen2-7B) |
| Parameters | 7B |
| Embedding Dimension | 3584 |
| Max Input Tokens | 32k |
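The embedding dimension determines how much space a vector index needs. The sketch below is a rough estimate only. It assumes vectors are stored as 32-bit floats (4 bytes per value) and ignores any index overhead; the storage format is an assumption, not something this page specifies, and `index_size_bytes` is a hypothetical helper.

```python
EMBEDDING_DIM = 3584  # from the specs table


def index_size_bytes(num_vectors: int, bytes_per_value: int = 4) -> int:
    """Raw storage for num_vectors embeddings (assumes float32 by default)."""
    return num_vectors * EMBEDDING_DIM * bytes_per_value


per_vector = index_size_bytes(1)                # 3584 * 4 = 14,336 bytes
per_million_gb = index_size_bytes(1_000_000) / 1e9  # 14.336 GB (decimal)

print(f"{per_vector} bytes per vector, {per_million_gb:.3f} GB per million vectors")
```

Halving `bytes_per_value` (for example, storing float16) halves the estimate, at the cost of some precision.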
about this model
gte-Qwen2-7B-instruct is a multilingual text embedding model that produces dense vector representations for retrieval, classification, clustering, and semantic similarity tasks. It is the latest model in the gte (General Text Embedding) family, built on the Qwen2-7B large language model with bidirectional attention and instruction tuning applied only on the query side.
The model ranks No.1 on the Massive Text Embedding Benchmark (MTEB) for both English and Chinese evaluations as of June 16, 2024. It achieves an average score of 70.24 on MTEB (56 English tasks) and 72.05 on C-MTEB (35 Chinese tasks), outperforming all prior models including NV-Embed-v1 (69.32) and gte-Qwen1.5-7B-instruct (67.34). It also scores 68.25 on MTEB-fr (26 French tasks) and 67.86 on MTEB-pl (26 Polish tasks).
Architecture and capabilities
The model is built on Qwen2-7B with bidirectional attention and instruction tuning applied only to queries. It supports a maximum input length of 32,000 tokens and produces 3,584-dimensional embeddings. Training uses multi-stage contrastive learning across a large multilingual corpus combining weakly supervised and supervised data, as described in the GTE paper (arXiv:2308.03281).
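Because instructions are applied only to queries, a client typically adds a short task description to each query and embeds documents unchanged. The sketch below illustrates that split. The `Instruct: ...` / `Query: ...` template and the task wording are assumptions for illustration, so check the model's own usage notes for the exact prompt format. `format_query` and `prepare_inputs` are hypothetical helper names.

```python
def format_query(task_description: str, query: str) -> str:
    """Prefix a query with a one-line task instruction (assumed template)."""
    return f"Instruct: {task_description}\nQuery: {query}"


def prepare_inputs(task_description: str, queries: list[str], documents: list[str]) -> list[str]:
    """Return the texts to embed: instructed queries first, then raw documents."""
    instructed = [format_query(task_description, q) for q in queries]
    # Documents receive no instruction, so a corpus can be embedded once
    # and reused across different query-side tasks.
    return instructed + list(documents)


# Illustrative task description and texts (not taken from this page).
task = "Given a web search query, retrieve relevant passages that answer the query"
texts = prepare_inputs(
    task,
    queries=["what is the capital of France"],
    documents=["Paris is the capital of France."],
)
```

Keeping documents free of instructions means the same document vectors can serve several retrieval tasks; only the query side changes.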
Benchmark comparison
| Model | MTEB (56) | C-MTEB (35) | MTEB-fr (26) | MTEB-pl (26) |
|---|---|---|---|---|
| gte-Qwen2-7B-instruct | 70.24 | 72.05 | 68.25 | 67.86 |
| NV-Embed-v1 | 69.32 | — | — | — |
| gte-Qwen1.5-7B-instruct | 67.34 | 69.52 | — | — |
| e5-mistral-7b-instruct | 66.63 | 60.81 | — | — |
Because training spans diverse domains and languages, the model performs strongly on retrieval, classification, clustering, and semantic similarity tasks without per-language fine-tuning.
best for
- Retrieving relevant documents for web search queries
- Multilingual text similarity and clustering
- Semantic search across diverse domains
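For the search and similarity use cases above, embeddings are usually compared with cosine similarity and documents are ranked by score. The sketch below uses tiny 3-dimensional toy vectors in place of real 3,584-dimensional embeddings. Cosine similarity is a common choice for dense embeddings rather than something this page mandates, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
ranking = rank_documents(query_vec, doc_vecs)
print([index for index, _ in ranking])  # [1, 2, 0]
```

For large corpora, a vector index would replace the linear scan; the scoring idea stays the same.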
FAQ
The maximum input length is 32,000 tokens.
Compared with gte-Qwen1.5-7B-instruct, it uses the same training data and strategy but upgrades the base model to Qwen2-7B, leading to improved performance on MTEB.
Each embedding has 3,584 dimensions.
Once the hosted API is available, use the OpenAI-compatible endpoint with your API key and send the text to embed as the request input.
Instruction tuning is applied only on the query side, so documents are embedded without an instruction prefix.
We're benchmarking and onboarding GTE Qwen2 7B Instruct as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
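Once the model is hosted, a call to an OpenAI-compatible embeddings endpoint might look like the sketch below. It follows the general OpenAI embeddings request shape (`POST /embeddings` with `model` and `input`, bearer-token auth). The base URL is a placeholder, and the hosted model identifier is an assumption taken from the repository name above; substitute the values from your account once the API is live.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"          # placeholder, not a real endpoint
MODEL_ID = "Alibaba-NLP/gte-Qwen2-7B-instruct"   # assumed hosted model identifier


def build_embedding_request(api_key: str, texts: list[str],
                            base_url: str = BASE_URL,
                            model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_body: bytes) -> list[list[float]]:
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    body = json.loads(response_body)
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request("YOUR_API_KEY", ["what is the capital of France"])

# To send the request once the endpoint exists:
# with urllib.request.urlopen(request) as response:
#     vectors = parse_embeddings(response.read())
```

Building the request separately from sending it makes the payload easy to inspect and test before any network call is made.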