MS Marco MiniLM-L4 V2

cross-encoder/ms-marco-MiniLM-L4-v2

published Mar 2022 · updated Aug 2025

MS Marco MiniLM-L4 V2 is a cross-encoder reranking model that scores query-passage pairs for relevance. It was trained on the MS MARCO Passage Ranking dataset.

est. price

~$0.008

/ 1k docs · estimated, set at launch


downloads / mo

4.8M

license

apache-2.0

specs

Task	Reranking (Cross-Encoder)
Architecture	MiniLM (4 layers)
Training Dataset	MS MARCO Passage Ranking
NDCG@10 (TREC DL 2019)	73.04
Inference Speed	2500 docs/sec (V100 GPU)

about this model

cross-encoder/ms-marco-MiniLM-L4-v2 is a cross-encoder reranking model that scores query-passage pairs for information retrieval. It is trained on the MS MARCO Passage Ranking dataset, which contains over one million real, anonymized queries from Bing search logs paired with human-generated relevance judgments.

The model is designed for a retrieve-and-rerank pipeline: a first-stage retriever (e.g., Elasticsearch) returns a candidate set of passages, and this cross-encoder re-scores each query-passage pair to produce a refined ranking. The architecture is a 4-layer MiniLM Transformer that encodes the query and passage together as a single input, so attention runs across both texts. This joint encoding typically yields higher accuracy than bi-encoders, which embed query and passage separately into single vectors, while the shallow depth keeps throughput practical.
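The rerank step of that pipeline can be sketched as follows. This is a minimal illustration, not the model authors' code: `rerank`, `overlap_score`, and `cross_encoder_scorer` are hypothetical helper names. The scorer is passed in as a function so the sketch runs without downloading the model; `overlap_score` is a toy lexical stand-in, and `cross_encoder_scorer` assumes the `sentence-transformers` package is installed and used to load the checkpoint.

```python
import re
from typing import Callable, List, Sequence, Tuple

# A scorer takes a batch of (query, passage) pairs and returns one relevance score per pair.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: List[str], score_pairs: Scorer,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k candidates, best first."""
    pairs = [(query, passage) for passage in candidates]
    scores = [float(s) for s in score_pairs(pairs)]
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def overlap_score(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in scorer: share of query terms that also occur in the passage."""
    scores = []
    for query, passage in pairs:
        q_terms = set(re.findall(r"\w+", query.lower()))
        p_terms = set(re.findall(r"\w+", passage.lower()))
        scores.append(len(q_terms & p_terms) / max(len(q_terms), 1))
    return scores


def cross_encoder_scorer(model_name: str = "cross-encoder/ms-marco-MiniLM-L4-v2") -> Scorer:
    """Assumes sentence-transformers is installed; imported lazily so the rest runs without it."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    # CrossEncoder.predict accepts a list of (query, passage) pairs and returns one score each.
    return lambda pairs: model.predict(pairs)
```

In use, `candidates` would be the passages returned by the first-stage retriever, and `rerank(query, candidates, cross_encoder_scorer())` swaps the real model in for the toy scorer. Injecting the scorer keeps the ranking logic independent of how the model is loaded or hosted.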

Performance

The table below reports NDCG@10 on TREC Deep Learning 2019 and MRR@10 on the MS MARCO Passage Reranking development set, along with throughput on a V100 GPU.

Model	NDCG@10 (TREC DL 2019)	MRR@10 (MS MARCO Dev)	Docs/sec (V100)
cross-encoder/ms-marco-MiniLM-L4-v2	73.04	37.70	2500
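For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how they are computed from graded relevance labels of a ranked list. The function names are illustrative, and the linear gain `grade / log2(rank + 1)` is an assumption (it matches trec_eval's default; some toolkits use `2**grade - 1`, which changes absolute values). The reported figures come from the benchmarks' own evaluation tooling, not from this code.

```python
import math
from typing import Sequence


def reciprocal_rank_at_k(ranked_grades: Sequence[int], k: int = 10) -> float:
    """1 / rank of the first relevant result (grade > 0) within the top k; 0 if none."""
    for rank, grade in enumerate(ranked_grades[:k], start=1):
        if grade > 0:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: Sequence[Sequence[int]], k: int = 10) -> float:
    """MRR@k: mean of the per-query reciprocal ranks."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank_at_k(r, k) for r in rankings) / len(rankings)


def dcg_at_k(grades: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain with linear gain: sum of grade_i / log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades[:k], start=1))


def ndcg_at_k(ranked_grades: Sequence[float], judged_grades: Sequence[float],
              k: int = 10) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ordering of all judged grades."""
    ideal_dcg = dcg_at_k(sorted(judged_grades, reverse=True), k)
    return dcg_at_k(ranked_grades, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For example, a ranking whose grades are `[0, 2, 1]` against judged grades `[2, 1]` loses NDCG because the best passage sits at rank 2 instead of rank 1, and its reciprocal rank is 0.5.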

MiniLM-L4-v2 trades a small amount of accuracy for speed. Its NDCG@10 of 73.04 on TREC DL 2019 is 1.27 points below the 12-layer MiniLM-L12-v2 (74.31), while its throughput of 2,500 documents per second on a V100 GPU makes it suitable for latency-sensitive applications.
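A back-of-envelope way to turn the quoted throughput into a per-query budget is sketched below. It assumes the 2,500 docs/sec figure (measured with batched inference on a V100) carries over to your setup; real latency also depends on passage length, batch size, hardware, and serving overhead. The helper names are hypothetical.

```python
def rerank_seconds(num_candidates: int, docs_per_sec: float = 2500.0) -> float:
    """Approximate scoring time for one query, assuming the quoted throughput holds."""
    return num_candidates / docs_per_sec


def max_candidates(budget_ms: int, docs_per_sec: float = 2500.0) -> int:
    """Largest candidate set that fits a per-query latency budget (scaled before dividing)."""
    return int(budget_ms * docs_per_sec // 1000)


# NDCG@10 gap to MiniLM-L12-v2 on TREC DL 2019, using the figures on this page.
NDCG_GAP = round(74.31 - 73.04, 2)
```

Under these assumptions, reranking 100 candidates costs about 40 ms of scoring, and a 50 ms budget allows roughly 125 candidates per query.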

best for

·Reranking search engine results for improved relevance
·Scoring passage relevance for question answering systems

FAQ

What task is this model designed for?

It is a cross-encoder reranking model for information retrieval that scores query-passage pairs.

How does this model compare to the larger MiniLM-L12-v2?

It is faster (2,500 docs/sec on a V100) but slightly less accurate: its NDCG@10 on TREC DL 2019 is 73.04, versus 74.31 for MiniLM-L12-v2.

What is the input format?

Pairs of (query, passage) text strings.

What output does it produce?

A relevance score for each pair; higher scores indicate greater relevance.

How can I use this model via gigarouter?

Once the model is live, send query-passage pairs to the gigarouter OpenAI-compatible endpoint using your API key.

not yet live

We're benchmarking and onboarding MS Marco MiniLM-L4 V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related reranker models


ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo

mmarco-mMiniLMv2-L12-H384-v1

1.6M dl/mo