Qwen3 Embedding 4B

Qwen/Qwen3-Embedding-4B

published Jun 2025 · updated Jun 2025

Qwen3 Embedding 4B is a text embedding model that supports over 100 languages, a context length of 32k, and flexible output dimensions up to 2560, built upon the Qwen3 dense foundation model.

price: $0.008 / 1M tokens
API providers: 0
downloads / mo: 2.6M
license: apache-2.0

specs

Task: Text Embedding
Architecture: Qwen3 dense transformer
Parameters: 4B
Context Length: 32K
Embedding Dimension: Up to 2560 (MRL support, user-defined from 32 to 2560)
License: Apache 2.0

about this model

Qwen3-Embedding-4B is a text embedding model that converts text into dense vector representations, designed for retrieval, classification, clustering, and bitext mining tasks. Built on the Qwen3 foundation, it processes up to 32,000 tokens and supports user-defined output dimensions from 32 to 2560 via Matryoshka Representation Learning (MRL), allowing flexible trade-offs between vector size and performance.
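A common way to use MRL-trained embeddings at a smaller size is to keep the leading components of the full vector and rescale the result to unit length. The sketch below shows that step on the client side. It is an illustration under assumptions: `truncate_embedding` is a hypothetical helper, the input vector is synthetic, and this page does not say whether the hosted API can return reduced dimensions directly.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Hypothetical helper: the 32..2560 bounds mirror the range stated
    for this model, not a documented API constraint.
    """
    if not 32 <= dim <= 2560:
        raise ValueError("dim must be between 32 and 2560")
    if dim > len(vec):
        raise ValueError("dim exceeds the length of the input vector")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]

# Synthetic stand-in for a real 2560-dimensional embedding
full = [math.sin(i + 1) for i in range(2560)]
small = truncate_embedding(full, 256)
print(len(small))
```

Smaller vectors reduce index size and search cost, usually at some loss in retrieval quality, so the target dimension is worth validating on your own data.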

The model is instruction-aware: developers can prepend task-specific instructions to queries (e.g., "Given a web search query, retrieve relevant passages"), a practice that typically improves downstream results by 1–5%. It covers over 100 languages, including programming languages, enabling robust multilingual, cross-lingual, and code retrieval.
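Prepending an instruction amounts to simple string formatting on the query side. The sketch below assumes the `Instruct: ...` / `Query:` template used in the upstream Qwen3 Embedding examples; that template is not stated on this page, and `format_query` is a hypothetical helper, so check the upstream model card before relying on the exact wording.

```python
def format_query(task: str, query: str) -> str:
    # Assumed template: instruction line, newline, then the query text
    return f"Instruct: {task}\nQuery:{query}"

task = "Given a web search query, retrieve relevant passages"
queries = [format_query(task, "how do vaccines work")]

# Only queries get the instruction; documents are embedded as-is
documents = ["Vaccines train the immune system to recognize a pathogen."]

print(queries[0])
```

The formatted queries and the plain documents can then be passed as `input` to the embeddings endpoint in separate or combined requests.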

Training and Performance

The series is trained with a multi-stage pipeline that combines unsupervised pre-training with supervised fine-tuning on high-quality data, part of which is synthesized by the Qwen3 LLM backbone. The larger 8B variant ranks No. 1 on the MTEB multilingual leaderboard (score 70.58, as of June 5, 2025). The 4B model shares the same architecture and training methodology and delivers strong performance across text and code retrieval, classification, clustering, and STS tasks.

Released under the Apache 2.0 license, the model is part of a series that includes 0.6B and 8B embedding variants, as well as dedicated reranker models (0.6B, 4B, 8B) for scoring and ranking.

FAQ

What is the maximum context length for Qwen3 Embedding 4B?

The model supports a context length of 32K tokens.

Does the model support custom embedding dimensions?

Yes, it supports Matryoshka Representation Learning (MRL), allowing user-defined dimensions from 32 to 2560.

What is the license for Qwen3 Embedding 4B?

The model is released under the Apache 2.0 license.

How can I use this model via the gigarouter API?

Point an OpenAI-compatible client at the base URL https://gigarouter.ai/v1, authenticate with your API key, and call the embeddings endpoint with the model name Qwen/Qwen3-Embedding-4B and your input text.

How does the 4B model compare to the 8B version?

The 4B model offers a balanced trade-off between performance and efficiency, requiring less memory and providing faster inference than the 8B model while still achieving strong results across embedding tasks.

call it
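# OpenAI client - just change base_url
import os
from openai import OpenAI

# GIGAROUTER_API_KEY is a placeholder variable name; load your key however you prefer
KEY = os.environ["GIGAROUTER_API_KEY"]
client = OpenAI(base_url="https://gigarouter.ai/v1", api_key=KEY)
v = client.embeddings.create(model="Qwen/Qwen3-Embedding-4B", input=["hello world"])
print(v.data[0].embedding[:4])  # first four components of the vector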
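
Once vectors come back, retrieval typically reduces to ranking documents by cosine similarity to the query vector. The sketch below shows that step with toy 3-dimensional vectors standing in for real embeddings; `cosine` and `rank` are hypothetical helpers, not part of any client library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for real embeddings
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
order = rank(query, docs)
print(order)  # [1, 2, 0]
```

For larger collections, a vector index is usually a better fit than the linear scan shown here.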
