Qwen3 VL Reranker 8B
Qwen/Qwen3-VL-Reranker-8B
published Jan 2026 · updated Apr 2026
Qwen3 VL Reranker 8B is a multimodal rerank model that refines retrieval results by scoring query-document pairs, supporting text, images, screenshots, and video inputs.
specs
| Task | Multimodal Reranking |
| Architecture | Cross-encoder based on Qwen3-VL |
| Parameters | 8B |
| License | Apache 2.0 |
about this model
Qwen3-VL-Reranker-8B is a multimodal reranking model that scores the relevance of query-document pairs, where both query and document may contain text, images, screenshots, video, or any combination thereof. Built on the Qwen3-VL foundation, it uses a cross-encoder architecture with cross-attention mechanisms to produce a precise relevance score, enabling fine-grained ranking in a two-stage retrieval pipeline (the embedding model performs initial recall; the reranker refines results).
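The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not the model's own code: `two_stage_search`, `toy_recall`, and `toy_rerank` are hypothetical names, and the two toy scorers are word-overlap stand-ins for a real embedding model and for Qwen3-VL-Reranker-8B, using text only for simplicity.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_search(
    query: str,
    corpus: Sequence[str],
    recall_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    recall_k: int = 100,
    final_k: int = 10,
) -> List[Tuple[str, float]]:
    """Recall a candidate set cheaply, then rerank only those candidates."""
    # Stage 1: cheap scoring over the whole corpus (embedding similarity in practice).
    recalled = sorted(corpus, key=lambda doc: recall_score(query, doc), reverse=True)[:recall_k]
    # Stage 2: expensive pairwise scoring over the short list only.
    reranked = [(doc, rerank_score(query, doc)) for doc in recalled]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:final_k]


def toy_recall(query: str, doc: str) -> float:
    """Stand-in for embedding similarity: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_rerank(query: str, doc: str) -> float:
    """Stand-in for the reranker: fraction of query words found in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


if __name__ == "__main__":
    corpus = ["red panda eating bamboo", "panda express menu", "blue whale song", "red car"]
    for doc, score in two_stage_search("red panda", corpus, toy_recall, toy_rerank, recall_k=3, final_k=2):
        print(f"{score:.2f}  {doc}")
```

The point of the split is cost: the recall scorer runs over every document, while the more expensive pairwise scorer only sees the `recall_k` survivors.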
Key Strengths
- High-precision reranking: Delivers state-of-the-art performance across image-text, video-text, visual document, and mixed-modal retrieval tasks.
- Instruction-aware: Supports custom prompts; using tailored instructions typically improves scores by 1–5%. English instructions are recommended for best results.
- Multilingual support: Explicitly supports 33 languages (including English, Chinese, Arabic, French, German, Japanese, Spanish, and others), inherited from Qwen3-VL.
- Long context and permissive licensing: 32K-token context length, released under Apache 2.0.
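To make the instruction-aware behaviour concrete, the sketch below bundles a task instruction with one query-document pair. The dictionary layout, the `build_pair` helper, and the `DEFAULT_INSTRUCTION` wording are all assumptions made for illustration; the model's actual prompt template is not described on this page.

```python
from typing import Any, Dict, Optional

# Assumed fallback instruction; the source does not state the model's default wording.
DEFAULT_INSTRUCTION = "Retrieve documents relevant to the user's query."


def build_pair(
    query: Dict[str, Any],
    document: Dict[str, Any],
    instruction: Optional[str] = None,
) -> Dict[str, Any]:
    """Bundle one query-document pair with a task instruction (hypothetical layout)."""
    if not query or not document:
        raise ValueError("query and document must each carry at least one modality")
    return {
        "instruction": instruction or DEFAULT_INSTRUCTION,
        "query": dict(query),        # e.g. {"text": ...} or {"text": ..., "image": ...}
        "document": dict(document),  # e.g. {"image": ...} or {"video": ...}
    }


if __name__ == "__main__":
    pair = build_pair(
        {"text": "invoice total for March"},
        {"image": "scan_0042.png"},  # hypothetical file name
        instruction="Find the scanned invoice page that answers the query.",
    )
    print(pair["instruction"])
```

Keeping the instruction as a separate field makes it easy to compare a tailored English instruction against the default on the same candidate set.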
Benchmark Performance
The 8B variant is reported to outperform the base embedding model and baseline rerankers across multiple benchmarks. Its scores:
| Model | Size | MMEB-v2 (Retrieval) Avg | MMEB-v2 Image | MMEB-v2 Video | MMEB-v2 VisDoc | MMTEB (Retrieval) | JinaVDR | ViDoRe v3 |
|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-Reranker-8B | 8B | 79.2 | 80.7 | 55.8 | 86.3 | 74.9 | 83.6 | 66.7 |
Results are from the MMEB-v2, MMTEB, JinaVDR, and ViDoRe v3 benchmarks. The model scores 79.2 on the MMEB-v2 retrieval average, ranking among the top-performing multimodal rerankers as of January 2026.
best for
- Refining image-text retrieval results in a two-stage search pipeline
- Re-ranking video-text matching candidates for higher accuracy
- Scoring mixed-modal document relevance (e.g., text + image) for enterprise search
FAQ
The model supports text, images, screenshots, videos, and arbitrary combinations of these modalities, such as text + image or text + video.
In a two-stage pipeline, the embedding model performs efficient initial recall, while the reranker refines those results with precise cross-encoder scoring, significantly boosting final retrieval accuracy.
The model has 8 billion parameters and supports a context length of 32K tokens.
To call the model, use the gigarouter OpenAI-compatible endpoint with your API key, sending a query and documents as input to receive relevance scores.
The model is instruction-aware: you can provide a custom prompt to tailor scoring for specific tasks, which typically improves performance by 1% to 5%.
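The FAQ describes sending a query and documents and getting relevance scores back. Below is a rough sketch of that exchange with no network call. The request and response field names (`query`, `documents`, `top_n`, `results`, `index`, `relevance_score`) are assumptions modeled on common rerank APIs, not a documented gigarouter schema. Only the model ID comes from this page.

```python
import json
from typing import Any, Dict, List, Tuple


def build_rerank_request(query: str, documents: List[str], top_n: int = 3) -> str:
    """Serialize a rerank request body (assumed field names)."""
    body = {
        "model": "Qwen/Qwen3-VL-Reranker-8B",  # model ID from this page
        "query": query,
        "documents": documents,
        "top_n": min(top_n, len(documents)),  # never ask for more results than documents
    }
    return json.dumps(body)


def rank_documents(documents: List[str], response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Map an assumed response shape back to documents, best score first."""
    results = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


if __name__ == "__main__":
    docs = ["quarterly revenue chart", "office floor plan", "revenue table for Q3"]
    print(build_rerank_request("Q3 revenue", docs, top_n=2))
    # Hand-written example response in the assumed shape, not real model output.
    fake_response = {"results": [{"index": 0, "relevance_score": 0.41},
                                 {"index": 2, "relevance_score": 0.87}]}
    print(rank_documents(docs, fake_response))
```

Returning document indices alongside scores, as assumed here, lets the caller re-attach scores to its own records without echoing large image or video payloads back in the response.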
We're benchmarking and onboarding Qwen3 VL Reranker 8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.