Qwen3 Embedding 4B
Qwen/Qwen3-Embedding-4B
published Jun 2025 · updated Jun 2025
Qwen3 Embedding 4B is a text embedding model built on the Qwen3 dense foundation model. It supports over 100 languages, a 32K-token context length, and flexible output dimensions up to 2560.
specs
| Spec | Value |
|---|---|
| Task | Text Embedding |
| Architecture | Qwen3 dense transformer |
| Parameters | 4B |
| Context Length | 32K tokens |
| Embedding Dimension | Up to 2560 (MRL support, user-defined from 32 to 2560) |
| License | Apache 2.0 |
about this model
Qwen3-Embedding-4B converts text into dense vector representations for retrieval, classification, clustering, and bitext mining. Built on the Qwen3 foundation model, it accepts inputs of up to 32K tokens and supports user-defined output dimensions from 32 to 2560 via Matryoshka Representation Learning (MRL), so you can trade vector size against accuracy.
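If you already hold full-size vectors, MRL embeddings are commonly shortened on the client side: keep the first k components, then re-normalize to unit length so dot products still equal cosine similarity. The sketch below illustrates that idea and the storage saving, assuming float32 storage. `truncate_embedding` and `bytes_per_vector` are hypothetical helper names, and the sine-wave vector is a stand-in, not real model output.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of an MRL embedding and L2-normalize them."""
    # Enforce the 32..len(vec) range quoted for this model's output dimensions.
    if not 32 <= dim <= len(vec):
        raise ValueError(f"dim must be between 32 and {len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Re-normalizing keeps dot product equal to cosine similarity after truncation.
    return [x / norm for x in head] if norm > 0 else head


def bytes_per_vector(dim: int, bytes_per_value: int = 4) -> int:
    """Storage for one vector; defaults to float32 (4 bytes per component)."""
    return dim * bytes_per_value


# Toy stand-in for a full 2560-dimensional embedding (not real model output).
full = [math.sin(i) for i in range(2560)]
short = truncate_embedding(full, 256)

print(len(short))                                     # 256
print(round(sum(x * x for x in short), 6))            # 1.0
print(bytes_per_vector(2560), bytes_per_vector(256))  # 10240 1024
```

At 256 dimensions each stored vector is one tenth the size of the full 2560-dimensional one; how much retrieval quality that costs depends on the task and should be measured on your own data.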
The model is instruction-aware: developers can prepend a task-specific instruction to each query (e.g., "Given a web search query, retrieve relevant passages"). Compared with using no instruction, this typically improves results by 1% to 5% on most downstream tasks. Instructions go on queries only; documents are embedded as-is. The model covers over 100 languages, including programming languages, which enables multilingual, cross-lingual, and code retrieval.
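As a sketch of the preprocessing step, the Hugging Face examples for the Qwen3 Embedding series format an instructed query as `Instruct: {task}\nQuery:{query}` and pass documents through unchanged. The helper names below are illustrative, and the template should be checked against the official model card before you rely on it.

```python
def format_query(task: str, query: str) -> str:
    """Prepend a one-sentence task description to a query (template as in the HF examples)."""
    return f"Instruct: {task}\nQuery:{query}"


def build_inputs(task: str, queries: list[str], documents: list[str]) -> list[str]:
    """Queries get the instruction; documents are embedded without one."""
    return [format_query(task, q) for q in queries] + list(documents)


task = "Given a web search query, retrieve relevant passages that answer the query"
inputs = build_inputs(
    task,
    queries=["What is the capital of China?"],
    documents=["The capital of China is Beijing."],
)
for text in inputs:
    print(repr(text))
```

Keeping the instruction off the documents means a document index can be built once and reused across tasks; only the query side changes when the task description changes.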
Training and Performance
Training uses a multi-stage pipeline: unsupervised pre-training followed by supervised fine-tuning on high-quality data, part of which is synthesized by the Qwen3 LLM backbone. The series reports state-of-the-art results: the larger 8B variant ranks No. 1 on the MTEB multilingual leaderboard (score 70.58, as of June 5, 2025). The 4B model shares the same architecture and training methodology and performs strongly on text and code retrieval, classification, clustering, and semantic textual similarity (STS) tasks.
Released under the Apache 2.0 license, the model is part of a series that includes 0.6B and 8B embedding variants, as well as dedicated reranker models (0.6B, 4B, 8B) for scoring and ranking.
best for
- Multilingual document retrieval
- Code search across programming languages
- Semantic clustering of text data
- Cross-lingual sentence similarity
FAQ
What context length does the model support?
The model supports a context length of 32K tokens.
Can I choose the embedding dimension?
Yes. It supports Matryoshka Representation Learning (MRL), allowing user-defined output dimensions from 32 to 2560.
What license is the model released under?
The model is released under the Apache 2.0 license.
How do I call the model through the API?
Send requests to the OpenAI-compatible endpoint with your API key, setting the model name to Qwen/Qwen3-Embedding-4B and passing your input text.
How does the 4B model compare with the 8B model?
The 4B model offers a balanced trade-off between performance and efficiency: it requires less memory and runs inference faster than the 8B model while still achieving strong results across embedding tasks.
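```python
# OpenAI client: just change base_url
import os
from openai import OpenAI

client = OpenAI(base_url="https://gigarouter.ai/v1", api_key=os.environ["GIGAROUTER_API_KEY"])  # env var name is an assumption
v = client.embeddings.create(model="Qwen/Qwen3-Embedding-4B", input=["hello world"])
print(v.data[0].embedding[:4])  # first four components of the embedding
```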
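A sketch of a small retrieval round trip through the same OpenAI-compatible endpoint: an instructed query, plain documents, a reduced output size, and cosine ranking. It assumes that the endpoint honors the OpenAI `dimensions` parameter for this model and that your key is in a `GIGAROUTER_API_KEY` environment variable; both are assumptions. `cosine`, `rank_documents`, and `embed` are hypothetical helpers. The live call only runs when that variable is set.

```python
import math
import os


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_documents(
    query_vec: list[float], doc_vecs: list[list[float]], docs: list[str]
) -> list[tuple[float, str]]:
    """Return (score, document) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def embed(texts: list[str], dimensions: int = 512) -> list[list[float]]:
    """Embed texts via the hosted model (requires the `openai` package)."""
    from openai import OpenAI  # imported lazily so the helpers above work without the SDK

    client = OpenAI(
        base_url="https://gigarouter.ai/v1",
        api_key=os.environ["GIGAROUTER_API_KEY"],  # hypothetical variable name
    )
    # `dimensions` is part of the OpenAI embeddings API; support for it here is assumed.
    resp = client.embeddings.create(
        model="Qwen/Qwen3-Embedding-4B", input=texts, dimensions=dimensions
    )
    return [item.embedding for item in sorted(resp.data, key=lambda item: item.index)]


if __name__ == "__main__":
    if "GIGAROUTER_API_KEY" not in os.environ:
        print("Set GIGAROUTER_API_KEY to run the live example.")
    else:
        task = "Given a web search query, retrieve relevant passages that answer the query"
        query = f"Instruct: {task}\nQuery:How do I reverse a list in Python?"
        docs = [
            "Use list.reverse() to reverse in place, or slicing with [::-1] for a copy.",
            "Die Hauptstadt von Deutschland ist Berlin.",
        ]
        vectors = embed([query] + docs)
        for score, doc in rank_documents(vectors[0], vectors[1:], docs):
            print(f"{score:.3f}  {doc}")
```

Sorting `resp.data` by `index` guards against the response order differing from the input order. For large collections you would normally store normalized vectors in a vector index rather than scoring them in a Python loop.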