GTE Qwen2 7B Instruct

Alibaba-NLP/gte-Qwen2-7B-instruct

published Jun 2024 · updated Mar 2025

GTE Qwen2 7B Instruct is a multilingual text embedding model that ranked No. 1 on the MTEB benchmark for both English and Chinese as of June 16, 2024.

est. price

~$0.008

/ 1M tokens · estimated, set at launch

API providers

downloads / mo

79.3K

license

apache-2.0

specs

Task	Text Embedding
Architecture	Decoder-only (Qwen2-7B)
Parameters	7B
Embedding Dimension	3584
Max Input Tokens	32k

about this model

gte-Qwen2-7B-instruct is a multilingual text embedding model that produces dense vector representations for retrieval, classification, clustering, and semantic similarity tasks. It is the latest model in the gte (General Text Embedding) family, built on the Qwen2-7B large language model with bidirectional attention and instruction tuning applied only on the query side.

The model ranked No. 1 on the Massive Text Embedding Benchmark (MTEB) for both English and Chinese evaluations as of June 16, 2024. It achieves average scores of 70.24 on MTEB (56 English tasks) and 72.05 on C-MTEB (35 Chinese tasks), ahead of earlier models such as NV-Embed-v1 (69.32 on MTEB) and gte-Qwen1.5-7B-instruct (67.34 on MTEB). It also scores 68.25 on MTEB-fr (26 French tasks) and 67.86 on MTEB-pl (26 Polish tasks).

Architecture and capabilities

The model is built on Qwen2-7B with bidirectional attention and instruction tuning applied only to queries. It supports a maximum input length of 32,000 tokens and produces 3,584-dimensional embeddings. Training uses multi-stage contrastive learning across a large multilingual corpus combining weakly supervised and supervised data, as described in the GTE paper (arXiv:2308.03281).
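Because instructions apply only to queries, a retrieval pipeline would typically wrap each query in a short task prompt and embed documents unchanged. The sketch below assumes the `Instruct: ... / Query: ...` template published on the model's Hugging Face card; the helper name `format_query`, the task string, and the sample texts are illustrative, not part of any official API.

```python
def format_query(task_description: str, query: str) -> str:
    """Wrap a query in a one-line task instruction (query side only)."""
    return f"Instruct: {task_description}\nQuery: {query}"


# Illustrative task description; pick one that matches your use case.
TASK = "Given a web search query, retrieve relevant passages that answer the query"

# Queries carry the instruction prefix.
queries = [format_query(TASK, "how much protein should a female eat")]

# Documents are embedded as-is, with no instruction prefix.
documents = ["Protein needs vary with age, weight, and activity level."]

print(queries[0])
```

Keeping documents prompt-free means a corpus can be embedded once and reused across tasks, with only the query prompt changing.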

Benchmark comparison

Model	MTEB (56)	C-MTEB (35)	MTEB-fr (26)	MTEB-pl (26)
gte-Qwen2-7B-instruct	70.24	72.05	68.25	67.86
NV-Embed-v1	69.32	—	—	—
gte-Qwen1.5-7B-instruct	67.34	69.52	—	—
e5-mistral-7b-instruct	66.63	60.81	—	—
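To read the table as margins rather than raw scores, the differences can be computed directly. This is a small sketch using only the MTEB and C-MTEB columns above; the `margin` helper is a hypothetical name, and `None` stands for a score the table does not report.

```python
# Average scores copied from the table above; None marks an unreported score.
scores = {
    "gte-Qwen2-7B-instruct": {"MTEB": 70.24, "C-MTEB": 72.05},
    "NV-Embed-v1": {"MTEB": 69.32, "C-MTEB": None},
    "gte-Qwen1.5-7B-instruct": {"MTEB": 67.34, "C-MTEB": 69.52},
    "e5-mistral-7b-instruct": {"MTEB": 66.63, "C-MTEB": 60.81},
}


def margin(benchmark, baseline, model="gte-Qwen2-7B-instruct"):
    """Return model score minus baseline score, or None if either is unreported."""
    a = scores[model][benchmark]
    b = scores[baseline][benchmark]
    if a is None or b is None:
        return None
    return round(a - b, 2)


for name in scores:
    if name != "gte-Qwen2-7B-instruct":
        print(name, margin("MTEB", name), margin("C-MTEB", name))
```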

Training across diverse domains and languages lets a single model handle retrieval, classification, clustering, and semantic similarity tasks without per-language fine-tuning.

best for

·Retrieving relevant documents for web search queries
·Multilingual text similarity and clustering
·Semantic search across diverse domains
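For the search and similarity use cases above, embeddings are usually compared with cosine similarity. The following is a minimal sketch of that ranking step; the toy 3-dimensional vectors stand in for real 3,584-dimensional embeddings, and the `cosine` and `rank` helpers are illustrative names rather than anything the model ships with.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity to the query."""
    sims = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)


# Toy vectors; real embeddings would come from the model.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # partially aligned
]

order = rank(query_vec, doc_vecs)
print(order)
```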

FAQ

What is the maximum input length supported?

32,000 tokens.

How does this model compare to gte-Qwen1.5-7B-instruct?

It uses the same training data and strategy but upgrades the base model to Qwen2-7B, which raises the average score from 67.34 to 70.24 on MTEB and from 69.52 to 72.05 on C-MTEB.

What is the output embedding dimension?

3584 dimensions.
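The dimension matters mostly for index sizing. As a rough worked example, assuming vectors are stored as uncompressed float32 values (an assumption; the page does not specify a storage format), the raw footprint can be estimated like this:

```python
EMBEDDING_DIM = 3584      # from the spec table
BYTES_PER_FLOAT32 = 4     # assumption: uncompressed float32 storage


def index_size_bytes(num_vectors, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw vector storage in bytes, ignoring index overhead and metadata."""
    return num_vectors * dim * bytes_per_value


per_vector = index_size_bytes(1)
one_million = index_size_bytes(1_000_000)

print(f"{per_vector} bytes per vector")
print(f"{one_million / 1e9:.1f} GB per million vectors")
```

Under these assumptions a single vector takes 14,336 bytes, so a million vectors need roughly 14.3 GB before any index overhead.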

How can I call this model via the gigarouter API?

Once the model is live, use the OpenAI-compatible endpoint with your API key and send the text you want embedded as the request input.
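A request to an OpenAI-compatible embeddings endpoint generally takes the shape sketched below. The base URL and the hosted model ID are placeholders, since this page does not state them, and `build_embedding_request` is a hypothetical helper; the snippet only builds the request and does not send it.

```python
import json
import urllib.request

# Placeholders: the real base URL and hosted model ID are not given on this page.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "Alibaba-NLP/gte-Qwen2-7B-instruct"


def build_embedding_request(texts, api_key):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = json.dumps({"model": MODEL_ID, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embedding_request(["What is the capital of China?"], api_key="YOUR_API_KEY")

# To perform the call, pass the request to urllib.request.urlopen(req)
# and parse the JSON response body.
print(req.get_method(), req.full_url)
```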

What is the purpose of instruction tuning in this model?

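Instruction tuning lets a short task description steer how a query is embedded. It is applied only on the query side, so documents are embedded without any instruction prefix, which keeps indexing simple and efficient.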

not yet live

We're benchmarking and onboarding GTE Qwen2 7B Instruct as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embeddings models

nomic-embed-text-v1.5