Qwen3 VL Reranker 8B

Qwen/Qwen3-VL-Reranker-8B

published Jan 2026 · updated Apr 2026

Qwen3 VL Reranker 8B is a multimodal rerank model that refines retrieval results by scoring query-document pairs, supporting text, images, screenshots, and video inputs.

est. price

~$0.008

/ 1k docs · estimated, set at launch


downloads / mo

431K

license

apache-2.0

specs

Task	Multimodal Reranking
Architecture	Cross-encoder based on Qwen3-VL
Parameters	8B
License	Apache 2.0

about this model

Qwen3-VL-Reranker-8B is a multimodal reranking model that scores the relevance of query-document pairs, where both the query and the document may contain text, images, screenshots, video, or any combination of these. Built on Qwen3-VL, it uses a cross-encoder architecture: the query and document are processed together, with attention operating across both inputs, to produce a single precise relevance score per pair. This enables fine-grained ranking in a two-stage retrieval pipeline, where an embedding model performs fast initial recall and the reranker refines the top candidates.
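A minimal sketch of the two-stage pattern described above, in plain Python. The `embed` and `rerank_score` callables are hypothetical stand-ins (toy keyword counts and word overlap here); in a real pipeline they would wrap an embedding model and this reranker. What the sketch shows is the shape of the pipeline: cheap vector similarity narrows the corpus, then the costlier pairwise scorer orders the survivors.

```python
import math
from typing import Callable, Sequence

Embed = Callable[[str], Sequence[float]]     # text -> vector (hypothetical)
PairScorer = Callable[[str, str], float]     # (query, doc) -> relevance (hypothetical)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_search(query: str, corpus: list[str], embed: Embed,
                     rerank_score: PairScorer, recall_k: int = 100,
                     top_n: int = 10) -> list[str]:
    # Stage 1 (recall): documents are embedded independently of the query,
    # so in production their vectors would be precomputed and indexed.
    q_vec = embed(query)
    candidates = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)),
                        reverse=True)[:recall_k]
    # Stage 2 (rerank): the cross-encoder sees query and document together,
    # so it runs once per candidate pair; recall_k bounds that cost.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d),
                      reverse=True)
    return reranked[:top_n]


# Toy stand-ins for illustration only.
def toy_embed(text: str) -> list[float]:
    return [float(text.count("cat")), float(text.count("dog"))]


def toy_rerank(query: str, doc: str) -> float:
    return float(len(set(query.split()) & set(doc.split())))


corpus = ["cat and dog", "a dog chasing a cat", "cat photos", "weather report"]
result = two_stage_search("dog chasing a cat", corpus, toy_embed, toy_rerank,
                          recall_k=3, top_n=2)
print(result)  # → ['a dog chasing a cat', 'cat and dog']
```

In the toy run, first-stage similarity ties "cat and dog" with the better match and keeps corpus order; the pairwise scorer then promotes "a dog chasing a cat" to the top.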

Qwen3-VL-Reranker-8B model architecture overview

Key Strengths

High-precision reranking: Reported to achieve state-of-the-art results across image-text, video-text, visual document, and mixed-modal retrieval tasks.
Instruction-aware: Supports custom prompts; using tailored instructions typically improves scores by 1–5%. English instructions are recommended for best results.
Multilingual support: Supports 33 languages, including English, Chinese, Arabic, French, German, Japanese, and Spanish, inherited from Qwen3-VL.
Long context and permissive license: 32K-token context length, released under the Apache 2.0 license.
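To make "instruction-aware" concrete, the sketch below builds an instruction-prefixed input for a text-only pair and converts model logits into a relevance score. It assumes the prompt layout and scoring used by the text-only Qwen3-Reranker family (an `<Instruct>`/`<Query>`/`<Document>` template, scored as the probability of "yes" versus "no" as the next token). Whether Qwen3-VL-Reranker uses exactly this template, especially for image or video documents, is an assumption; check the official model card before relying on it.

```python
import math
from typing import Optional

# Default instruction from the text-only Qwen3-Reranker examples (assumed here).
DEFAULT_INSTRUCTION = "Given a web search query, retrieve relevant passages that answer the query"


def format_pair(query: str, doc: str, instruction: Optional[str] = None) -> str:
    """Build the instruction-aware input for one (query, document) pair."""
    instruction = instruction or DEFAULT_INSTRUCTION
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"


def relevance_from_logits(yes_logit: float, no_logit: float) -> float:
    """P("yes") from a two-way softmax over the "yes"/"no" next-token logits."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


# A task-specific English instruction, as the model card recommends.
prompt = format_pair(
    "Which screenshot shows the login page?",
    "A screenshot of a username and password form.",
    instruction="Retrieve UI screenshots that match the user's description",
)
print(prompt)
print(round(relevance_from_logits(2.0, -1.0), 4))  # → 0.9526
```

Swapping the default instruction for a task-specific one is the knob the 1 to 5% improvement refers to; the scoring step is unchanged.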

Benchmark Performance

The published results show the 8B variant outperforming the base embedding model and baseline rerankers across multiple benchmarks; only its own scores are reproduced here:

Model	Size	MMEB-v2 (Retrieval) Avg	MMEB-v2 Image	MMEB-v2 Video	MMEB-v2 VisDoc	MMTEB (Retrieval)	JinaVDR	ViDoRe v3
Qwen3-VL-Reranker-8B	8B	79.2	80.7	55.8	86.3	74.9	83.6	66.7

Results are from the MMEB-v2, MMTEB, JinaVDR, and ViDoRe v3 benchmarks. With a score of 79.2 on the MMEB-v2 retrieval average, the model ranks among the top-performing multimodal rerankers as of January 2026.

best for

·Refining image-text retrieval results in a two-stage search pipeline
·Re-ranking video-text matching candidates for higher accuracy
·Scoring mixed-modal document relevance (e.g., text + image) for enterprise search

FAQ

What input modalities does Qwen3 VL Reranker 8B support?

It supports text, images, screenshots, videos, and arbitrary combinations of these modalities, such as text + image or text + video.
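One way to represent "any combination of modalities" in client code is a record with optional fields, where at least one must be set. The `MultimodalInput` class and its field names below are hypothetical and are not the model's official input format; the actual processor may expect a different structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MultimodalInput:
    """Hypothetical container for a query or document; any non-empty mix is allowed."""
    text: Optional[str] = None
    image: Optional[str] = None   # path or URL to an image or screenshot
    video: Optional[str] = None   # path or URL to a video file

    def __post_init__(self) -> None:
        # Reject an item that carries no content at all.
        if not any([self.text, self.image, self.video]):
            raise ValueError("at least one modality must be provided")

    def modalities(self) -> list[str]:
        """Names of the modalities present, in a fixed order."""
        return [name for name in ("text", "image", "video") if getattr(self, name)]


query = MultimodalInput(text="A woman playing with her dog on a beach at sunset")
docs = [
    MultimodalInput(text="Beach photo caption"),
    MultimodalInput(image="beach.jpg"),
    MultimodalInput(text="Clip description", video="beach.mp4"),
]
print([d.modalities() for d in docs])  # → [['text'], ['image'], ['text', 'video']]
```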

How does the reranker compare to the embedding model in a retrieval pipeline?

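The embedding model encodes queries and documents independently, which makes initial recall over a large corpus fast. The reranker then scores each query-document pair jointly with a cross-encoder, which is slower per pair but more precise, improving the accuracy of the final ranking.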

What is the context length and parameter count?

The model has 8 billion parameters and supports a context length of 32K tokens.

How can I call this model via the gigarouter API?

Once the hosted API is live, send a query and a list of documents to the gigarouter OpenAI-compatible endpoint, authenticated with your API key, to receive a relevance score for each document.
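Because the hosted endpoint is not live yet, the sketch below is entirely hypothetical: the URL, path, and JSON fields are assumptions modeled on the request/response shape several hosted rerank APIs use (a `query`, a `documents` list, `top_n`, and `results` entries carrying an `index` and a `relevance_score`). The OpenAI API itself has no rerank endpoint, so the final schema may differ. Only text documents are shown; how images or video would be passed is left open.

```python
import json
import urllib.request

# Hypothetical values: the endpoint path and field names are assumptions,
# not a published schema.
API_URL = "https://api.gigarouter.example/v1/rerank"
MODEL_ID = "Qwen/Qwen3-VL-Reranker-8B"


def build_rerank_request(query: str, documents: list[str], api_key: str,
                         top_n: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the query and documents."""
    body = {"model": MODEL_ID, "query": query, "documents": documents, "top_n": top_n}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_rerank_response(raw: bytes, documents: list[str]) -> list[tuple[str, float]]:
    """Map an assumed {"results": [{"index", "relevance_score"}]} reply to ranked docs."""
    results = json.loads(raw)["results"]
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


def rerank(query: str, documents: list[str], api_key: str,
           top_n: int = 3) -> list[tuple[str, float]]:
    """Send the request and return (document, score) pairs, best first."""
    req = build_rerank_request(query, documents, api_key, top_n)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_rerank_response(resp.read(), documents)


# Usage without network access: inspect the request that would be sent.
req = build_rerank_request("login page screenshot", ["doc one", "doc two"], api_key="YOUR_KEY")
print(req.full_url, json.loads(req.data)["model"])
```

Keeping request building and response parsing separate from the network call makes both easy to adapt once the real schema is published.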

Does the model support custom instructions for different tasks?

Yes, it is instruction-aware; you can provide a custom prompt to tailor scoring for specific tasks, which typically improves performance by 1% to 5%.

not yet live

We're benchmarking and onboarding Qwen3 VL Reranker 8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related reranker models


ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

ms-marco-MiniLM-L4-v2

4.8M dl/mo

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo