gigarouter

MS Marco MiniLM-L4 V2

cross-encoder/ms-marco-MiniLM-L4-v2

published Mar 2022 · updated Aug 2025

MS Marco MiniLM-L4 V2 is a cross-encoder reranking model that scores query-passage pairs for relevance. It was trained on the MS MARCO Passage Ranking dataset.

est. price: ~$0.008 / 1k docs (estimated, set at launch)
API providers: 0
downloads / mo: 4.8M
license: apache-2.0
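You can turn the estimated price into a rough cost for a given workload. This is a minimal sketch with two assumptions: billing is per document (one query-passage pair scored), and the rate is the listed ~$0.008 per 1k docs. The constant `EST_PRICE_PER_1K_DOCS` and the helper `estimate_rerank_cost` are hypothetical names. The final price will be set at launch.

```python
# Rough cost estimate for reranking with MS Marco MiniLM-L4 V2.
# ASSUMPTION: billing is per scored (query, passage) pair at the listed
# estimate of ~$0.008 per 1k docs; the real price is set at launch.
EST_PRICE_PER_1K_DOCS = 0.008  # USD, hypothetical constant from the estimate


def estimate_rerank_cost(queries: int, docs_per_query: int,
                         price_per_1k: float = EST_PRICE_PER_1K_DOCS) -> float:
    """Return the estimated USD cost of reranking docs_per_query candidates per query."""
    total_docs = queries * docs_per_query
    return total_docs / 1000 * price_per_1k


if __name__ == "__main__":
    # 10,000 queries, each reranking 100 candidates -> 1,000,000 scored pairs
    print(f"${estimate_rerank_cost(10_000, 100):.2f}")  # $8.00
```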

specs

Task: Reranking (Cross-Encoder)
Architecture: MiniLM (4 layers)
Training Dataset: MS MARCO Passage Ranking
NDCG@10 (TREC DL 2019): 73.04
Inference Speed: 2500 docs/sec (V100 GPU)

about this model

cross-encoder/ms-marco-MiniLM-L4-v2 is a cross-encoder reranking model that scores query-passage pairs for information retrieval. It is trained on the MS MARCO Passage Ranking dataset, which contains over one million real, anonymized queries from Bing search logs and human-generated relevance judgments.

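The model is designed for a retrieve-and-rerank pipeline. First, a fast retriever (e.g., Elasticsearch) returns a candidate set of passages. Then this cross-encoder re-scores each query-passage pair to produce a refined ranking. The model is a 4-layer MiniLM Transformer that reads the query and passage together as one input sequence, so self-attention operates across both texts. This joint encoding generally gives higher ranking accuracy than bi-encoders, which embed the query and passage independently into single vectors. The cost is that the model must run once per pair. The small 4-layer network keeps that per-pair cost practical when reranking a candidate set.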
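The rerank step of such a pipeline can be sketched as follows. The scorer is passed in as a function that maps (query, passage) pairs to one score per pair. With the `sentence-transformers` package installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L4-v2").predict` has that shape. To keep the sketch self-contained, a toy lexical-overlap scorer stands in for the model. The scorer and the `rerank` helper are hypothetical, illustrative names, and the toy scorer is no substitute for the cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passage) pairs to one relevance score per pair.
# The real model would be CrossEncoder("cross-encoder/ms-marco-MiniLM-L4-v2").predict
# from sentence-transformers; the toy scorer below is a HYPOTHETICAL stand-in.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer: fraction of query terms that also appear in the passage."""
    scores = []
    for query, passage in pairs:
        q_terms = set(query.lower().split())
        p_terms = set(passage.lower().split())
        scores.append(len(q_terms & p_terms) / max(len(q_terms), 1))
    return scores


def rerank(query: str, candidates: List[str], scorer: Scorer,
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k by descending score."""
    pairs = [(query, passage) for passage in candidates]
    scores = scorer(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # In practice these candidates come from a first-stage retriever (e.g., BM25).
    candidates = [
        "Berlin is the capital of Germany.",
        "The capital of France is Paris.",
        "Paris is known for the Eiffel Tower.",
    ]
    for passage, score in rerank("capital of france", candidates, toy_overlap_scorer, top_k=2):
        print(f"{score:.2f}  {passage}")
```

Swapping in the real cross-encoder changes only the `scorer` argument. Candidate generation and top-k selection stay the same.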

Performance

The table below reports NDCG@10 on the TREC Deep Learning 2019 benchmark and MRR@10 on the MS MARCO Passage Reranking development set. It also gives throughput on a V100 GPU.

| Model | NDCG@10 (TREC DL 19) | MRR@10 (MS MARCO Dev) | Docs / sec (V100) |
|---|---|---|---|
| cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
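For reference, here is how the two quality metrics in the table can be computed per query. The per-query values are then averaged over all queries. The table reports them ×100, so 73.04 means a mean NDCG@10 of 0.7304. This is an illustrative sketch, not the official evaluation script. NDCG uses linear gains (the relevance grade itself), whereas some implementations use 2^rel - 1. `mrr_at_k` takes the set of passage ids judged relevant.

```python
import math
from typing import Dict, List, Set


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain with linear gains: sum of gain / log2(rank + 1) over the top k."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one query. qrels maps passage id -> graded relevance (0 = not relevant)."""
    gains = [qrels.get(pid, 0) for pid in ranked_ids]
    # Ideal DCG: all judged passages sorted by relevance, best first.
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr_at_k(ranked_ids: List[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage within the top k, or 0.0 if none."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    ranking = ["p3", "p1", "p2"]  # model's ranking for one query
    qrels = {"p1": 3, "p2": 1}    # graded judgments (unlisted ids count as 0)
    print(round(ndcg_at_k(ranking, qrels), 3))  # 0.659
    print(mrr_at_k(ranking, {"p1", "p2"}))      # 0.5
```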

MiniLM-L4-v2 balances accuracy and speed. Its NDCG@10 of 73.04 on TREC DL 2019 is within about 1.3 points of the 12-layer MiniLM-L12-v2 (74.31). Its benchmark throughput of 2,500 documents per second on a V100 makes it a reasonable choice when reranking must fit a tight latency budget.
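The throughput figure translates into a rough latency budget. This sketch assumes the reported 2,500 docs/sec is sustained for your batch of candidates. Real latency also depends on hardware, passage length, batch size, and network overhead. The helper names are hypothetical.

```python
# Back-of-the-envelope latency math from a throughput figure.
# ASSUMPTION: the benchmark rate (2,500 docs/sec on a V100) is sustained;
# real latency also includes network, tokenization, and batching effects.
BENCH_DOCS_PER_SEC = 2500.0


def rerank_time_ms(num_candidates: int, docs_per_sec: float = BENCH_DOCS_PER_SEC) -> float:
    """Approximate milliseconds needed to score num_candidates pairs."""
    return num_candidates * 1000.0 / docs_per_sec


def max_candidates(budget_ms: float, docs_per_sec: float = BENCH_DOCS_PER_SEC) -> int:
    """Largest candidate-set size that fits within a latency budget in milliseconds."""
    return int(budget_ms * docs_per_sec / 1000.0)


if __name__ == "__main__":
    print(rerank_time_ms(100))  # 40.0 ms to rerank 100 candidates
    print(max_candidates(50))   # 125 candidates fit in a 50 ms budget
```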


FAQ

What task is this model designed for?

It is a cross-encoder reranking model for information retrieval that scores query-passage pairs.

How does this model compare to the larger MiniLM-L12-v2?

It is faster (2,500 docs/sec on a V100, vs. 960 for L12) but slightly less accurate, with an NDCG@10 of 73.04 vs. 74.31 for L12.
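You can turn this comparison into a simple selection rule: pick the most accurate model that still meets a required throughput. This is a hypothetical sketch; `RerankerProfile` and `pick_reranker` are illustrative names. The L4 figures and the L12 NDCG@10 come from this page. The L12 throughput of 960 docs/sec on a V100 is taken from the sentence-transformers MS MARCO cross-encoder benchmark, not from this page. Treat all of them as benchmark numbers, not guarantees.

```python
from typing import List, NamedTuple, Optional, Sequence


class RerankerProfile(NamedTuple):
    name: str
    ndcg10: float        # NDCG@10 on TREC DL 2019
    docs_per_sec: float  # benchmark throughput on a V100 GPU


# Benchmark figures (see lead-in for sources); HYPOTHETICAL selection table.
PROFILES: List[RerankerProfile] = [
    RerankerProfile("cross-encoder/ms-marco-MiniLM-L4-v2", 73.04, 2500.0),
    RerankerProfile("cross-encoder/ms-marco-MiniLM-L12-v2", 74.31, 960.0),
]


def pick_reranker(required_docs_per_sec: float,
                  profiles: Sequence[RerankerProfile] = PROFILES) -> Optional[RerankerProfile]:
    """Return the most accurate profile that meets the throughput requirement, or None."""
    eligible = [p for p in profiles if p.docs_per_sec >= required_docs_per_sec]
    return max(eligible, key=lambda p: p.ndcg10, default=None)


if __name__ == "__main__":
    print(pick_reranker(2000))  # only L4-v2 is fast enough
    print(pick_reranker(500))   # both qualify, so the more accurate L12-v2 wins
```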

What is the input format?

Pairs of (query, passage) text strings.

What output does it produce?

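A single relevance score for each pair. A higher score means the passage is more relevant to the query, so sorting a query's candidates by score gives the reranked list.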

How can I use this model via gigarouter?

Once the model is live, call gigarouter's OpenAI-compatible endpoint with your API key. Send the query together with the candidate passages to be scored.

not yet live

We're benchmarking and onboarding MS Marco MiniLM-L4 V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
