Qwen3 Reranker 0.6B
ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
published Oct 2025 · updated Oct 2025
Qwen3 Reranker 0.6B is a multilingual reranker model that scores document-query relevance with instruction awareness, based on the Qwen3-0.6B-Base architecture.
specs
| Spec | Value |
| --- | --- |
| Task | Text ranking (reranking) |
| Architecture | Transformer with 28 layers |
| Parameters | 0.6B |
| License | Apache-2.0 |
| Context Length | 32K tokens |
about this model
Qwen3-Reranker-0.6B is a cross-encoder reranking model that scores the relevance of query-document pairs, so the candidates returned by a first-stage retriever can be reordered more precisely in a second stage. It is built on the Qwen3-0.6B-Base architecture, supports a context length of 32K tokens, and covers 100+ languages.
Key Capabilities
- Instruction Aware: The model accepts custom instructions to guide relevance scoring. Using instructions typically yields a 1%–5% improvement in reranking accuracy.
- Multilingual: Supports over 100 languages, making it suitable for global search and retrieval pipelines.
- Long-context: Handles up to 32K tokens per query-document pair.
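To make the instruction-aware input concrete, here is a minimal sketch of how an instruction, a query, and a document might be combined into one text for the reranker. The `<Instruct>` / `<Query>` / `<Document>` layout and the default instruction string are assumptions based on the convention used in upstream Qwen3-Reranker examples, and `format_pair` is a hypothetical helper name; check the exact template against the runtime you use.

```python
from typing import Optional

# Assumed default task description; replace it with one that fits your domain.
DEFAULT_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def format_pair(query: str, document: str, instruction: Optional[str] = None) -> str:
    """Combine instruction, query, and document into a single reranker input.

    The tag layout is an assumption, not a confirmed specification.
    """
    task = instruction if instruction else DEFAULT_INSTRUCTION
    return f"<Instruct>: {task}\n<Query>: {query}\n<Document>: {document}"


if __name__ == "__main__":
    print(format_pair(
        query="notice period for contract termination",
        document="Either party may terminate with 30 days written notice.",
        instruction="Given a legal question, retrieve clauses that answer it",
    ))
```

A domain-specific instruction is where the reported accuracy gain would come from; omitting it falls back to the generic default.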
Architecture
The model is a 28-layer cross-encoder: it reads the query and the document together and outputs a relevance score rather than an embedding vector, so it has no embedding dimension. It is the 0.6B-parameter variant in the Qwen3 reranker series, which also includes 4B and 8B versions.
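As a rough illustration of how a cross-encoder of this kind can reduce its output to one score: upstream Qwen3-Reranker examples, to our understanding, compare the next-token logits for "yes" and "no" and normalize them into a probability. The sketch below covers only that normalization step. The function name and the example logits are hypothetical, and a given runtime may expose the score differently.

```python
import math


def relevance_from_logits(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the 'yes' and 'no' logits, returning P(yes)."""
    # Subtracting the larger logit keeps exp() from overflowing;
    # the result equals sigmoid(yes_logit - no_logit).
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    print(relevance_from_logits(2.0, -1.0))
```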
Benchmark Context
The Qwen3-Embedding 8B model ranks No.1 on the MTEB multilingual leaderboard (score 70.58 as of June 5, 2025). The reranker series is designed to complement these embedding models in a two-stage retrieval pipeline.
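The two-stage pipeline mentioned above can be sketched as follows: a first-stage retriever returns candidates, and a scoring function reorders them. This is a minimal sketch, not the authors' pipeline. `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for a call to the actual reranker.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the top_k best."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Highest score first; Python's sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Stand-in scorer: fraction of query words found in the document."""
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


if __name__ == "__main__":
    first_stage = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
    for doc, score in rerank("cat dog", first_stage, toy_overlap_score, top_k=2):
        print(f"{score:.2f}  {doc}")
```

Passing the scorer as a parameter keeps the pipeline independent of how the model is served, whether locally or behind an API.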
License
Apache-2.0
best for
- Multilingual document retrieval reranking
- Question-answering relevance scoring
- Search result reordering with custom instructions
FAQ
Qwen3 Reranker 0.6B is designed to rerank a set of candidate documents for a given query, improving retrieval accuracy. It supports 100+ languages and accepts custom instructions, which typically improve reranking accuracy by 1%–5%.
The 0.6B variant is faster and more lightweight, suitable for latency-sensitive applications, while larger versions offer higher accuracy at the cost of speed.
The model is released under the Apache-2.0 license, which permits commercial and personal use provided the license text and copyright notices are retained.
The model accepts query-document pairs as strings. With the SentenceTransformers CrossEncoder class, you pass a (query, document) pair and receive a relevance score; a list of pairs can be scored in a single batched call.
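A minimal sketch of the batched pair scoring described here, using the SentenceTransformers `CrossEncoder` class. The `model_name` argument is a placeholder you must supply: it has to point to a checkpoint that `CrossEncoder` can load, and the GGUF build named at the top of this page is intended for GGUF runtimes and is likely not loadable this way. `build_pairs` and `score_pairs` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def build_pairs(query: str, documents: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair one query with each candidate document."""
    return [(query, doc) for doc in documents]


def score_pairs(model_name: str, pairs: List[Tuple[str, str]], batch_size: int = 8):
    """Score all pairs in batches; returns one relevance score per pair."""
    # Imported lazily so build_pairs works without the package installed.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return model.predict(pairs, batch_size=batch_size)


if __name__ == "__main__":
    pairs = build_pairs(
        "What is the capital of France?",
        ["Paris is the capital of France.", "Berlin is in Germany."],
    )
    print(pairs)
    # scores = score_pairs("<a CrossEncoder-compatible checkpoint>", pairs)
```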
Use the gigarouter OpenAI-compatible endpoint with your API key. Refer to the gigarouter documentation for the exact endpoint and payload format.
Qwen3 Reranker 0.6B is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.