MS Marco MiniLM L6 V2

cross-encoder/ms-marco-MiniLM-L6-v2

published Mar 2022 · updated Aug 2025

MS Marco MiniLM L6 V2 is a cross-encoder rerank model that scores query-passage pairs for information retrieval, trained on the MS Marco Passage Ranking dataset.

price	$0.008 / 1k docs
downloads / mo	81.5M
throughput	2.7K docs/s
license	apache-2.0

specs

Task	Reranking / Cross-Encoder
Architecture	MiniLM-L6

about this model

cross-encoder/ms-marco-MiniLM-L6-v2 is a cross-encoder model trained on the MS Marco Passage Ranking dataset and designed for information retrieval reranking. Given a query and a set of candidate passages (e.g., retrieved by a first-stage retriever), the model outputs a relevance score for each query-passage pair, so the passages can be sorted in decreasing order of estimated relevance.
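The sorting step can be sketched in shell. This minimal example assumes the scores have already been obtained (for instance from the API call at the end of this page) and written as tab-separated "score<TAB>passage" lines; that input format and the function name rerank_sort are hypothetical conventions for illustration.

```shell
# Sort tab-separated "score<TAB>passage" lines by decreasing score.
# The input format is a hypothetical convention for this sketch.
#   -t TAB : split fields on tabs, so passages may contain spaces
#   -k1,1  : compare on the score field only
#   -g -r  : general-numeric compare (handles negatives/decimals), high to low
#   -s     : stable, so equal scores keep their original order
rerank_sort() {
  sort -t "$(printf '\t')" -k1,1 -g -r -s
}

# Example with illustrative (not real) scores; prints the Paris passage first.
printf '%s\t%s\n' -2.3 "Bananas are yellow." 8.1 "Paris is the capital of France." \
  | rerank_sort | cut -f2
```

Only the passage order matters for reranking, so the scores are dropped with cut once sorting is done.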

The model is a version 2 checkpoint using a MiniLM-L6 architecture, balancing accuracy and throughput. On the TREC Deep Learning 2019 benchmark it achieves an NDCG@10 of 74.30, and on the MS Marco Passage Reranking dev set it reaches an MRR@10 of 39.01. Runtime benchmarks on a V100 GPU indicate a throughput of approximately 1,800 documents per second.
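MRR@10 rewards placing the first relevant passage near the top of the ranking. As a sketch of how the metric is defined (not the official MS Marco evaluation script), the awk function below averages 1/rank over queries and scores a query as 0 when its first relevant passage falls outside the top 10. The one-rank-per-line input and the name mrr_at_10 are assumptions for illustration. The card's figure of 39.01 appears to be this average multiplied by 100.

```shell
# MRR@10: mean over queries of 1/rank of the first relevant passage,
# counting 0 when it does not appear in the top 10.
# Hypothetical input: one integer per line, the rank of the first relevant
# passage for that query (0 if no relevant passage was retrieved).
mrr_at_10() {
  awk '{ r = $1 + 0; s += ((r >= 1 && r <= 10) ? 1 / r : 0); n++ }
       END { if (n > 0) printf "%.4f\n", s / n }'
}

# Three queries: relevant passage at rank 1, at rank 2, and not found.
printf '%s\n' 1 2 0 | mrr_at_10   # prints 0.5000
```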

Performance comparison (as reported in the original model card)

Model	NDCG@10 (TREC DL 19)	MRR@10 (MS Marco Dev)	Docs/sec (V100)
cross-encoder/ms-marco-TinyBERT-L2-v2	69.84	32.56	9000
cross-encoder/ms-marco-MiniLM-L2-v2	71.01	34.85	4100
cross-encoder/ms-marco-MiniLM-L4-v2	73.04	37.70	2500
cross-encoder/ms-marco-MiniLM-L6-v2	74.30	39.01	1800
cross-encoder/ms-marco-MiniLM-L12-v2	74.31	39.02	960

Throughput figures in the table were measured on a V100 GPU. The model is also hosted as a managed API through gigarouter, which accepts a query and a list of documents and returns relevance scores, with no local installation or model loading required.

best for

·Reranking search results from a first-stage retriever (e.g., Elasticsearch) to improve relevance
·Scoring query-passage pairs for information retrieval and question answering

FAQ

What is this model best used for?

It is best for reranking passages in a retrieval pipeline: given a query and a set of candidate passages, it scores each pair to reorder results by relevance.

How does it compare in speed to larger models?

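It processes about 1,800 docs per second on a V100 GPU. That is faster than the larger MiniLM-L12-v2 (960 docs/sec), which scores almost identically (74.31 vs. 74.30 NDCG@10), and slower than the smaller TinyBERT-L2-v2 (9,000 docs/sec), which is less accurate (69.84 NDCG@10).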
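To turn these throughput figures into per-query latency, divide the number of candidates by docs/sec. This back-of-envelope sketch assumes the quoted V100 throughput holds for your batch size and hardware, which will not be exact; rerank_ms is a hypothetical helper name.

```shell
# Rough reranking time per query, in milliseconds:
#   time_ms = 1000 * candidates / throughput_docs_per_sec
# Assumes the quoted throughput applies to your batch size and hardware.
rerank_ms() {  # usage: rerank_ms <num_candidates> <docs_per_sec>
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 1000 * n / t }'
}

rerank_ms 100 1800   # MiniLM-L6-v2:   55.6
rerank_ms 100 960    # MiniLM-L12-v2:  104.2
rerank_ms 100 9000   # TinyBERT-L2-v2: 11.1
```

Under these assumptions, reranking 100 candidates takes roughly half as long with this model as with MiniLM-L12-v2, for a negligible difference in NDCG@10.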

What is the input and output format?

Input is a list of query-passage text pairs. Output is a relevance score (logit) for each pair; higher scores indicate greater relevance.
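If you need scores on a 0 to 1 scale (for example, to apply a fixed threshold), a common choice is to pass each logit through the logistic sigmoid. This is post-processing you apply yourself, not something the API is documented here to do; the helper name sigmoid is illustrative.

```shell
# Convert raw logits to (0, 1) with the logistic sigmoid 1 / (1 + e^(-x)).
sigmoid() {
  awk '{ printf "%.4f\n", 1 / (1 + exp(-$1)) }'
}

printf '%s\n' 0 8.1 -2.3 | sigmoid   # prints 0.5000, 0.9997, 0.0911
```

Because the sigmoid is monotonic, it changes only the scale of the scores, not the ranking they produce.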

How can I call this model via the gigarouter API?

Send a POST request to the rerank endpoint (https://gigarouter.ai/v1/rerank) with your API key as a Bearer token, passing the model name, a query, and a list of documents in a JSON body, as in the example below.

What license is this model released under?

The model is released under the Apache 2.0 license.

call it

# Rerank documents by relevance; billed per document.
# The JSON body needs an explicit Content-Type (curl -d defaults to form encoding).
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France",
       "documents":["Paris is the capital of France.","Bananas are yellow."]}'

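Because billing is per document, cost grows with the number of candidates sent per query. The sketch below applies the listed price of $0.008 per 1k docs; rerank_cost is a hypothetical helper, and it assumes every document in every request is billed once at that rate.

```shell
# Estimated USD cost at the listed price of $0.008 per 1,000 documents.
# Assumes each candidate document is billed once per query at that rate.
rerank_cost() {  # usage: rerank_cost <num_queries> <docs_per_query>
  awk -v q="$1" -v d="$2" 'BEGIN { printf "%.2f\n", q * d * 0.008 / 1000 }'
}

rerank_cost 10000 100   # 1,000,000 documents -> 8.00
```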