Qwen3 Embedding 4B

Qwen/Qwen3-Embedding-4B

published Jun 2025 · updated Jun 2025

Qwen3 Embedding 4B is a text embedding model built on the Qwen3 dense foundation model. It supports over 100 languages, a context length of 32K tokens, and flexible output dimensions up to 2560.

price	$0.008 / 1M tokens
API providers
downloads / mo	2.6M
license	apache-2.0

specs

Task	Text Embedding
Architecture	Qwen3 dense transformer
Parameters	4B
Context Length	32K
Embedding Dimension	Up to 2560 (MRL support, user-defined from 32 to 2560)
License	Apache 2.0

about this model

Qwen3-Embedding-4B is a text embedding model that converts text into dense vector representations, designed for retrieval, classification, clustering, and bitext mining tasks. Built on the Qwen3 foundation, it processes inputs of up to 32K tokens and supports user-defined output dimensions from 32 to 2560 via Matryoshka Representation Learning (MRL), so smaller vectors can be traded against retrieval quality.
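As a rough illustration of the MRL trade-off, the sketch below shortens an embedding on the client side by keeping its leading components and rescaling to unit length. It assumes the full vector has already been retrieved. `truncate_embedding` is a hypothetical helper name, not part of any library, and the toy vector stands in for a real 2560-dimensional one.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Hypothetical helper for illustration only. A production version would
    also enforce the 32..2560 range; it is omitted here so the toy example
    can use a short vector.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # a zero vector cannot be normalized
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a full-size embedding.
full = [0.5, -0.5, 0.5, -0.5, 0.1, 0.2, -0.1, 0.0]
short = truncate_embedding(full, 4)
print(len(short), round(math.sqrt(sum(x * x for x in short)), 6))
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors.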

The model is instruction-aware: developers can prepend task-specific instructions to queries (e.g., "Given a web search query, retrieve relevant passages"), a practice that typically improves downstream results by 1–5% compared with omitting the instruction. It covers over 100 languages, including programming languages, which enables multilingual, cross-lingual, and code retrieval.
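A minimal sketch of instruction prefixing follows. The `Instruct: ... Query:...` template is an assumption based on the format commonly shown for this model family, so check the model card before relying on it. `format_query` is a hypothetical helper, and documents are left without a prefix in this sketch.

```python
def format_query(task, query):
    """Prefix a query with a one-sentence task instruction.

    The template below is an assumption, not a confirmed specification.
    """
    return f"Instruct: {task}\nQuery:{query}"

task = "Given a web search query, retrieve relevant passages"
queries = [format_query(task, q) for q in ["what is MRL?", "qwen3 context length"]]

# Documents are embedded as-is; only queries carry the instruction here.
documents = ["Matryoshka Representation Learning trains nested embeddings."]

print(queries[0])
```

The formatted strings would then be passed as `input` to the embeddings endpoint in place of the raw queries.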

Training and Performance

A multi-stage pipeline combines unsupervised pre-training with supervised fine-tuning on high-quality data, part of which is synthesized by the Qwen3 LLM backbone, and yields state-of-the-art results. The larger 8B variant ranks No. 1 on the MTEB multilingual leaderboard (score 70.58, as of June 5, 2025). The 4B model shares the same architecture and training methodology and performs strongly across text and code retrieval, classification, clustering, and STS tasks.

Released under the Apache 2.0 license, the model is part of a series that includes 0.6B and 8B embedding variants, as well as dedicated reranker models (0.6B, 4B, 8B) for scoring and ranking.

best for

·Multilingual document retrieval
·Code search across programming languages
·Semantic clustering of text data
·Cross-lingual sentence similarity
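
Most of the use cases above reduce to comparing vectors. The sketch below ranks documents against a query by cosine similarity. The three-dimensional vectors are toy stand-ins for real embeddings, and `cosine` and `rank` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for embeddings returned by the model.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
order = rank(query, docs)
print(order)
```

For large collections a vector index would normally replace this linear scan, but the scoring idea is the same.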

FAQ

What is the maximum context length for Qwen3 Embedding 4B?

The model supports a context length of 32K tokens.

Does the model support custom embedding dimensions?

Yes, it supports Matryoshka Representation Learning (MRL), allowing user-defined dimensions from 32 to 2560.
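
To make the size side of that choice concrete, the arithmetic below estimates raw storage for one million vectors at several dimensions. It assumes 4-byte float32 values and ignores index overhead. `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for dense vectors, excluding any index overhead."""
    return num_vectors * dim * bytes_per_value

for dim in (32, 256, 1024, 2560):
    gb = index_size_bytes(1_000_000, dim) / 1e9
    print(f"{dim:>4} dims: {gb:.3f} GB per million float32 vectors")
```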

What is the license for Qwen3 Embedding 4B?

The model is released under the Apache 2.0 license.

How can I use this model via the gigarouter API?

Send requests to the OpenAI-compatible endpoint (base URL https://gigarouter.ai/v1) with your API key, specifying the model name Qwen/Qwen3-Embedding-4B and the input text.

How does the 4B model compare to the 8B version?

The 4B model offers a balanced trade-off between performance and efficiency, requiring less memory and providing faster inference than the 8B model while still achieving strong results across embedding tasks.
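
A back-of-the-envelope estimate of the memory difference, assuming nominal parameter counts (4B and 8B) and 16-bit weights. It covers weights only, so activations and runtime overhead are excluded. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate weight-only memory in GB (10^9 bytes).

    Assumes nominal parameter counts and 16-bit weights by default.
    """
    return params_billions * bytes_per_param

for name, size in (("4B", 4), ("8B", 8)):
    print(f"{name}: about {weight_memory_gb(size)} GB of weights at 16-bit precision")
```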

call it

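# OpenAI client - just change base_url
import os
from openai import OpenAI

# Read the API key from the environment (the variable name is an assumption)
KEY = os.environ["GIGAROUTER_API_KEY"]
client = OpenAI(base_url="https://gigarouter.ai/v1", api_key=KEY)
v = client.embeddings.create(model="Qwen/Qwen3-Embedding-4B", input=["hello world"])
print(v.data[0].embedding[:4])  # first four components of the embedding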
