MS Marco MiniLM-L4 V2
cross-encoder/ms-marco-MiniLM-L4-v2
published Mar 2022 · updated Aug 2025
MS Marco MiniLM-L4 V2 is a cross-encoder rerank model that scores query-passage pairs for relevance, trained on the MS MARCO Passage Ranking dataset.
specs
| Spec | Value |
|---|---|
| Task | Reranking (Cross-Encoder) |
| Architecture | MiniLM (4 layers) |
| Training Dataset | MS MARCO Passage Ranking |
| NDCG@10 (TREC DL 2019) | 73.04 |
| Inference Speed | 2500 docs/sec (V100 GPU) |
about this model
cross-encoder/ms-marco-MiniLM-L4-v2 is a cross-encoder reranking model that scores query-passage pairs for information retrieval. It is trained on the MS MARCO Passage Ranking dataset, which contains over one million real, anonymized queries from Bing search logs and human-generated relevance judgments.
The model is designed to be used in a retrieve-and-rerank pipeline: a first-stage retriever (e.g., Elasticsearch) returns a candidate set of passages, and this cross-encoder re-scores each query-passage pair to produce a refined ranking. The architecture is a 4-layer MiniLM Transformer that takes the query and the passage together as a single input, so attention operates across both texts at once. This joint encoding gives higher accuracy than single-vector bi-encoders, which embed the query and the passage separately, while the small layer count keeps throughput practical.
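The rerank step described above can be sketched in a few lines. This is a minimal illustration, not the hosted API: `rerank`, `toy_overlap_scorer`, and the sample passages are hypothetical names and data introduced here, and the toy scorer is only a stand-in for the model so the example runs without downloading anything.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a batch of (query, passage) pairs to one relevance score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (NOT the real model): fraction of query tokens found in the passage."""
    scores = []
    for query, passage in pairs:
        q_tokens = set(query.lower().split())
        p_tokens = set(passage.lower().split())
        scores.append(len(q_tokens & p_tokens) / max(len(q_tokens), 1))
    return scores


def rerank(
    query: str, candidates: Sequence[str], scorer: Scorer, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k candidates, best first."""
    pairs = [(query, passage) for passage in candidates]
    scores = scorer(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


# Candidates as they might come back from a first-stage retriever (hypothetical data).
candidates = [
    "Berlin is known for its museums and nightlife.",
    "The population of Berlin is about 3.7 million people.",
    "Munich hosts Oktoberfest every autumn.",
]
results = rerank("population of berlin", candidates, toy_overlap_scorer, top_k=2)
for passage, score in results:
    print(f"{score:.2f}  {passage}")
```

To score with the actual model locally, one option is the `sentence-transformers` package: `CrossEncoder("cross-encoder/ms-marco-MiniLM-L4-v2").predict` accepts a list of (query, passage) pairs and returns one score per pair, so it should be usable as the `scorer` argument here. Keeping the scorer as a parameter means the sorting logic stays the same whichever backend produces the scores.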
Performance
The table below reports standard retrieval metrics on the TREC Deep Learning 2019 benchmark and the MS MARCO Passage Reranking development set, along with throughput on a V100 GPU.
| Model | NDCG@10 (TREC DL 19) | MRR@10 (MS MARCO Dev) | Docs / Sec |
|---|---|---|---|
| cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
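For readers unfamiliar with the two metrics in the table, here is a small sketch of how NDCG@10 and the reciprocal rank behind MRR@10 are commonly computed for a single query. It assumes linear gain with a log2 discount and assumes the relevance list covers every judged passage for the query; the exact evaluation setup behind the published numbers is not stated on this page. The table values appear to be on a 0-100 scale, whereas these functions return values in [0, 1].

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank_at_k(is_relevant: Sequence[bool], k: int = 10) -> float:
    """1 / rank of the first relevant result within the top k, or 0 if there is none."""
    for rank, hit in enumerate(is_relevant[:k], start=1):
        if hit:
            return 1.0 / rank
    return 0.0


# Graded relevance labels in ranked order for one query (hypothetical judgments).
example_ndcg = ndcg_at_k([3, 0, 2])
# Binary relevance in ranked order; the first relevant passage is at rank 2.
example_rr = reciprocal_rank_at_k([False, True, False])
print(round(example_ndcg, 4), example_rr)
```

MRR@10 is then the mean of `reciprocal_rank_at_k` over all evaluation queries, and a benchmark NDCG@10 is the mean of `ndcg_at_k` over queries.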
MiniLM-L4-v2 balances accuracy and speed. On the TREC DL 2019 benchmark, its NDCG@10 of 73.04 is within about 1.3 points of the 74.31 reported for the larger L12 variant, while its throughput of 2,500 documents per second on a V100 makes it suitable for latency-sensitive applications.
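To relate the throughput figure to a latency budget, the back-of-the-envelope arithmetic can be sketched as follows. `estimated_rerank_ms` is a hypothetical helper, and it assumes the quoted 2,500 docs/sec holds steadily; real latency also depends on batch size, passage length, and hardware, so treat the result as a rough lower bound rather than a measurement.

```python
def estimated_rerank_ms(num_candidates: int, docs_per_sec: float = 2500.0) -> float:
    """Rough time in milliseconds to score num_candidates pairs at a steady throughput."""
    return num_candidates * 1000.0 / docs_per_sec


# Estimated scoring time for a few typical candidate-set sizes.
for n in (10, 100, 1000):
    print(f"{n} candidates: ~{estimated_rerank_ms(n):.0f} ms")
```

Under this assumption, reranking the top 100 candidates from a first-stage retriever would take on the order of 40 ms of model time per query.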
best for
- Reranking search engine results for improved relevance
- Scoring passage relevance for question answering systems
FAQ
MS Marco MiniLM-L4 V2 is a cross-encoder reranking model for information retrieval that scores query-passage pairs.
Compared with the L12 variant, it is faster (2500 docs/sec on a V100) but slightly less accurate, with an NDCG@10 of 73.04 versus 74.31 for L12.
Input: pairs of (query, passage) text strings.
Output: a relevance score for each pair.
To call it, use the gigarouter OpenAI-compatible endpoint with your API key, sending query-passage pairs.
We're benchmarking and onboarding MS Marco MiniLM-L4 V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.