
Qwen3 Reranker 0.6B

ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF

published Oct 2025 · updated Oct 2025

Qwen3 Reranker 0.6B is a multilingual, instruction-aware reranker that scores how relevant a document is to a query. It is based on the Qwen3-0.6B-Base architecture.

status: coming soon
API providers: 0
downloads / mo: 32.7K
license: apache-2.0

specs

Task: Text ranking (reranking)
Architecture: Transformer with 28 layers
Parameters: 0.6B
License: Apache-2.0
Context Length: 32K tokens

about this model

Qwen3-Reranker-0.6B is a cross-encoder reranking model that scores the relevance of query-document pairs, enabling more precise second-stage retrieval over a first-stage retriever. It is built on the Qwen3-0.6B-Base architecture and supports a context length of 32K tokens across 100+ languages.

Key Capabilities

  • Instruction Aware: The model accepts custom instructions to guide relevance scoring. Using instructions typically yields a 1%–5% improvement in reranking accuracy.
  • Multilingual: Supports over 100 languages, making it suitable for global search and retrieval pipelines.
  • Long-context: Handles up to 32K tokens per query-document pair.
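As a rough illustration of the instruction-aware input, the sketch below joins an instruction, a query, and a document into one string. This page does not specify the prompt format: the `<Instruct>`/`<Query>`/`<Document>` tag layout and the default instruction are assumptions modeled on the upstream Qwen3-Reranker usage notes, and `format_pair` is a hypothetical helper name. Verify the template against the upstream model card before relying on it.

```python
from typing import Optional

# Assumed default instruction; swap in a task-specific one where possible.
DEFAULT_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def format_pair(query: str, document: str, instruction: Optional[str] = None) -> str:
    """Build one reranker input string (assumed tag layout, not confirmed by this page)."""
    task = instruction if instruction else DEFAULT_INSTRUCTION
    return f"<Instruct>: {task}\n<Query>: {query}\n<Document>: {document}"


example = format_pair(
    "What is the capital of China?",
    "The capital of China is Beijing.",
    instruction="Given a geography question, retrieve passages that state the answer",
)
print(example)
```

Passing `instruction=None` falls back to the generic default, which makes it easy to compare scores with and without a task-specific instruction.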

Architecture

The model is a 28-layer Transformer cross-encoder: it reads each query-document pair jointly and outputs a relevance score rather than an embedding, so it has no embedding dimension. It is the 0.6B-parameter variant in the Qwen3 reranker series, which also includes 4B and 8B versions. Because the model is instruction-aware, a custom reranking instruction can be supplied with each pair, which typically improves reranking accuracy by 1%–5%.

Benchmark Context

The Qwen3-Embedding 8B model ranks No. 1 on the MTEB multilingual leaderboard (score 70.58 as of June 5, 2025). The reranker series is designed to complement these embedding models in a two-stage retrieval pipeline: the embedding model retrieves candidates, and the reranker reorders them.
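The two-stage pipeline can be sketched without any model at all. In the minimal example below, a cheap token-overlap retriever stands in for the embedding model and `toy_score` stands in for the reranker. Both are hypothetical placeholders chosen so the example runs offline; they are not how either Qwen3 model computes relevance.

```python
from typing import Callable, List, Tuple


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap candidate retrieval: rank documents by shared lowercase tokens."""
    q_tokens = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_tokens & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def second_stage(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Rerank the candidates with a pairwise scorer and keep the best top_n."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def toy_score(query: str, document: str) -> float:
    """Stand-in for the reranker: Jaccard overlap of lowercase tokens."""
    q, d = set(query.lower().split()), set(document.lower().split())
    return len(q & d) / len(q | d) if (q | d) else 0.0


corpus = [
    "Beijing is the capital of China",
    "Paris is the capital of France and a major European city known for art",
    "Bananas are rich in potassium",
]
candidates = first_stage("capital of China", corpus, k=2)
results = second_stage("capital of China", candidates, toy_score, top_n=1)
print(results)
```

The design point is that the expensive pairwise scorer only sees the `k` candidates from the first stage, not the whole corpus, which is what keeps a cross-encoder affordable at query time.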

License

Apache-2.0

FAQ

What is this model best used for?

It is designed to rerank a set of candidate documents for a given query, improving retrieval accuracy. It supports 100+ languages, and supplying a custom instruction typically improves reranking accuracy by 1%–5%.

How does the 0.6B size compare to the 4B or 8B versions?

The 0.6B variant is faster and more lightweight, suitable for latency-sensitive applications, while larger versions offer higher accuracy at the cost of speed.

What is the license for this model?

The model is released under the Apache-2.0 license, which permits commercial and personal use provided the license and copyright notices are retained.

What input format does the model expect?

It accepts query-document pairs, typically as strings. Using SentenceTransformers CrossEncoder, you pass (query, document) and receive a relevance score. It can also process batched inputs.
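A minimal sketch of the pair format and batching follows. The `predict` argument is any callable that takes a list of `(query, document)` tuples and returns one score per pair, which is the shape of a SentenceTransformers `CrossEncoder.predict` call. Here `fake_predict` is a hypothetical stand-in so the example runs without downloading a model, and whether this particular GGUF checkpoint loads through `CrossEncoder` is not verified here.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def score_in_batches(
    pairs: Sequence[Pair],
    predict: Callable[[List[Pair]], Sequence[float]],
    batch_size: int = 2,
) -> List[float]:
    """Score (query, document) pairs in fixed-size batches, preserving order."""
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        batch = list(pairs[start:start + batch_size])
        scores.extend(float(s) for s in predict(batch))
    return scores


def fake_predict(batch: List[Pair]) -> List[float]:
    """Hypothetical stand-in model: 1.0 if the query text occurs in the document."""
    return [1.0 if q.lower() in d.lower() else 0.0 for q, d in batch]


query = "reranker"
docs = [
    "A reranker scores pairs",
    "Unrelated text",
    "Reranker models refine search",
]
pairs = [(query, doc) for doc in docs]
scores = score_in_batches(pairs, fake_predict)

# Sort documents by descending score to obtain the reranked order.
ranked = sorted(zip(docs, scores), key=lambda item: item[1], reverse=True)
print(ranked)
```

Scores come back in the same order as the input pairs, so they can be zipped with the original documents before sorting.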

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key; refer to the gigarouter documentation for the exact endpoint and payload format.

not yet live

We're benchmarking and onboarding Qwen3 Reranker 0.6B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
