Qwen3 VL Reranker 2B

Qwen/Qwen3-VL-Reranker-2B

published Jan 2026 · updated Apr 2026

Qwen3 VL Reranker 2B is a cross-encoder reranker that accepts query-document pairs comprising text, images, screenshots, video, or mixed modalities and outputs a precise relevance score.

est. price: ~$0.008 / 1k docs (estimated, set at launch)
API providers: 0
downloads / mo: 300.3K
license: apache-2.0

specs

Task: Multimodal Reranking
Architecture: Cross-encoder with cross-attention
Parameters: 2B
License: Apache 2.0
Context Length: 32K tokens

about this model

Qwen3-VL-Reranker-2B is a multimodal reranking model that refines retrieval results by computing precise relevance scores for query-document pairs, supporting text, images, screenshots, video, and mixed-modality inputs.

Built on the Qwen3-VL foundation, the model uses a cross-encoder architecture with cross-attention mechanisms to estimate fine-grained relevance. It supports over 30 languages and handles inputs up to 32K tokens. The model is instruction-aware: tailoring the instruction to the specific task typically improves performance by 1% to 5% compared with using no instruction. It is commonly paired with an embedding model in a two-stage retrieval pipeline (embedding-based recall first, then reranking), but it can also be used standalone.

| Benchmark | Metric | Score |
|---|---|---|
| MMEB-v2 (Retrieval) – Avg | Average | 75.1 |
| MMEB-v2 (Retrieval) – Image | Image retrieval | 73.8 |
| MMEB-v2 (Retrieval) – Video | Video retrieval | 52.1 |
| MMEB-v2 (Retrieval) – VisDoc | Visual document retrieval | 83.4 |
| MMTEB (Retrieval) | Text retrieval | 70.0 |
| JinaVDR | Visual document retrieval | 80.9 |
| ViDoRe v3 | Visual document retrieval | 60.8 |

Across all evaluated benchmarks, Qwen3-VL-Reranker-2B consistently outperforms the base embedding model and baseline rerankers of comparable size. The larger 8B variant achieves the best overall results (79.2 on MMEB-v2 Avg, 86.3 on VisDoc).

[Figure: Architecture diagram of the Qwen3-VL-Embedding and Reranker models, showing multimodal input processing and the two-stage retrieval pipeline.]

FAQ

What is Qwen3 VL Reranker 2B best for?

It is best for reranking first-stage retrieval results. It scores the relevance between a query (text, image, or video) and each candidate document (text, image, or video), which improves retrieval accuracy in multimodal pipelines.

How does it compare to the Qwen3 VL Embedding model?

The Embedding model generates vector embeddings for efficient first-stage recall, while the Reranker performs fine-grained cross-encoder scoring on the retrieved candidates. Used together, they form a high-accuracy two-stage retrieval pipeline.
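As a rough illustration of the two-stage idea, here is a minimal sketch in Python. It is not the Qwen3-VL code path: `two_stage_search`, `toy_recall_score`, and `toy_rerank_score` are hypothetical names, and the two toy scorers are simple word-overlap stand-ins for the embedding model and the reranker so that the example runs without any model.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def two_stage_search(
    query: str,
    corpus: Sequence[str],
    recall_score: Scorer,
    rerank_score: Scorer,
    recall_k: int = 100,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Recall a shortlist with a cheap scorer, then reorder it with a costly one."""
    # Stage 1: score the whole corpus cheaply and keep the recall_k best.
    shortlist = sorted(
        corpus, key=lambda doc: recall_score(query, doc), reverse=True
    )[:recall_k]
    # Stage 2: run the expensive scorer on the shortlist only.
    scored = [(doc, rerank_score(query, doc)) for doc in shortlist]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_recall_score(query: str, doc: str) -> float:
    """Stand-in for embedding similarity: number of shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_rerank_score(query: str, doc: str) -> float:
    """Stand-in for the cross-encoder: Jaccard overlap of the word sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


corpus = [
    "quarterly revenue chart",
    "revenue chart for the third quarter with notes",
    "cat video",
]
results = two_stage_search(
    "revenue chart", corpus, toy_recall_score, toy_rerank_score,
    recall_k=2, top_k=2,
)
```

The point of the split is cost: the first stage touches every document, so it has to be cheap, while the second stage only sees `recall_k` candidates and can afford a per-pair model call.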

What input formats does the model support?

The model accepts pairs of queries and documents, where each can be plain text, an image URL, a video, or a mix of text and image. It supports over 30 languages and a context length of 32K tokens.
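One way to represent such mixed-modality inputs on the client side is sketched below. This is only an assumption about how an application might model them: the `RerankInput` class and its field names (`text`, `image_url`, `video_url`) are hypothetical and are not taken from the model or from any published gigarouter schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RerankInput:
    """One side of a query-document pair (hypothetical client-side type)."""

    text: Optional[str] = None
    image_url: Optional[str] = None
    video_url: Optional[str] = None  # assumed: video passed by URL

    def __post_init__(self) -> None:
        # Reject empty inputs early instead of sending them to a model.
        if not (self.text or self.image_url or self.video_url):
            raise ValueError("a RerankInput needs text, an image, or a video")

    @property
    def modalities(self) -> Tuple[str, ...]:
        """Names of the modalities that are actually present."""
        present = []
        if self.text:
            present.append("text")
        if self.image_url:
            present.append("image")
        if self.video_url:
            present.append("video")
        return tuple(present)


# A text query paired with a document that mixes text and an image.
query = RerankInput(text="slide showing Q3 revenue by region")
document = RerankInput(
    text="Regional revenue, Q3",
    image_url="https://example.com/slide-12.png",  # placeholder URL
)
```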

How do I call this model via the gigarouter API?

Once the model is live, call the gigarouter OpenAI-compatible endpoint with your API key, sending the query and the candidate documents as inputs. The response includes a relevance score for each document.
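Because the hosted endpoint is not live yet, no request schema is given on this page. The sketch below is therefore purely illustrative: the JSON field names (`model`, `query`, `documents`, `instruction`) and both helper functions are assumptions, not the gigarouter API. It only shows the general shape of the client-side work, building a request body and ordering documents by the returned scores, and it makes no network call.

```python
import json
from typing import List, Sequence, Tuple

MODEL_ID = "Qwen/Qwen3-VL-Reranker-2B"  # model ID as listed on this page


def build_rerank_request(
    query: str, documents: Sequence[str], instruction: str = ""
) -> str:
    """Serialize a rerank request body; all field names are hypothetical."""
    body = {"model": MODEL_ID, "query": query, "documents": list(documents)}
    if instruction:
        # The model is instruction-aware, so a task instruction is optional.
        body["instruction"] = instruction
    return json.dumps(body)


def rank_documents(
    documents: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort by descending relevance."""
    if len(documents) != len(scores):
        raise ValueError("expected exactly one score per document")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


docs = ["invoice scan", "holiday photo", "tax form"]
payload = build_rerank_request(
    "find tax paperwork", docs, instruction="Retrieve financial documents."
)
# Scores invented for illustration; a real response would supply them.
ranked = rank_documents(docs, [0.41, 0.03, 0.92])
```

Check the provider documentation for the actual endpoint path, field names, and response format once the model launches.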

What license is the model released under?

It is released under the Apache 2.0 license, allowing commercial use and modification.

not yet live

We're benchmarking and onboarding Qwen3 VL Reranker 2B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
