MS Marco MiniLM L6 V2
cross-encoder/ms-marco-MiniLM-L6-v2
published Mar 2022 · updated Aug 2025
MS Marco MiniLM L6 V2 is a cross-encoder rerank model that scores query-passage pairs for information retrieval, trained on the MS Marco Passage Ranking dataset.
specs
| Task | Reranking / Cross-Encoder |
| Architecture | MiniLM-L6 |
about this model
cross-encoder/ms-marco-MiniLM-L6-v2 is a cross-encoder model trained on the MS Marco Passage Ranking dataset and designed for information retrieval reranking. Given a query and a set of candidate passages (e.g., retrieved by a first-stage retriever), the model scores each query-passage pair, and the passages can then be sorted in decreasing order of estimated relevance.
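The score-then-sort step described above can be sketched in a few lines of Python. This is a minimal illustration, not the hosted implementation: `rerank`, `score_pairs`, and `toy_overlap_scorer` are hypothetical names, and the toy scorer (a count of shared tokens) only stands in for the model so the example runs without downloading anything.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], List[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort by decreasing score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    # Higher score means more relevant, so sort in descending order.
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for the model: counts shared lowercase tokens."""
    scores = []
    for query, passage in pairs:
        query_tokens = set(query.lower().split())
        passage_tokens = set(passage.lower().replace(".", "").split())
        scores.append(float(len(query_tokens & passage_tokens)))
    return scores


if __name__ == "__main__":
    candidates = ["Bananas are yellow.", "Paris is the capital of France."]
    for passage, score in rerank("capital of France", candidates, toy_overlap_scorer):
        print(f"{score:.1f}  {passage}")
```

In a real pipeline, `score_pairs` would be replaced by a call to the model. For local use, the `predict` method of the `CrossEncoder` class in sentence-transformers accepts a list of (query, passage) pairs and could be passed in that role.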
The model is a version 2 checkpoint using a MiniLM-L6 architecture, balancing accuracy and throughput. On the TREC Deep Learning 2019 benchmark it achieves an NDCG@10 of 74.30, and on the MS Marco Passage Reranking dev set it reaches an MRR@10 of 39.01. Both figures are within 0.01 points of the larger MiniLM-L12-v2 model, at nearly twice the throughput: runtime benchmarks on a V100 GPU indicate approximately 1,800 documents per second, against 960 for MiniLM-L12-v2.
Performance comparison (as reported in the original model card)
| Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec |
|---|---|---|---|
| cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 |
| cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 |
| cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
| cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 |
| cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 |
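To make the Docs / Sec column more concrete, the sketch below converts each figure into an approximate time to rerank a fixed number of candidates. It assumes throughput scales linearly with the number of documents and ignores batching effects, network latency, and hardware other than the V100 used for the table. `rerank_latency_ms` is a hypothetical helper, and the choice of 100 candidates is illustrative only.

```python
# Docs/sec figures copied from the performance table (measured on a V100 GPU).
DOCS_PER_SEC = {
    "cross-encoder/ms-marco-TinyBERT-L2-v2": 9000,
    "cross-encoder/ms-marco-MiniLM-L2-v2": 4100,
    "cross-encoder/ms-marco-MiniLM-L4-v2": 2500,
    "cross-encoder/ms-marco-MiniLM-L6-v2": 1800,
    "cross-encoder/ms-marco-MiniLM-L12-v2": 960,
}


def rerank_latency_ms(docs_per_sec: float, num_candidates: int) -> float:
    """Estimated milliseconds to score num_candidates documents.

    Assumes linear scaling: time = candidates / throughput.
    """
    return 1000.0 * num_candidates / docs_per_sec


if __name__ == "__main__":
    for name, rate in DOCS_PER_SEC.items():
        print(f"{name}: {rerank_latency_ms(rate, 100):.1f} ms per 100 candidates")
```

Under these assumptions, reranking 100 candidates with MiniLM-L6-v2 takes about 56 ms, compared with about 104 ms for MiniLM-L12-v2.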
Runtime figures in the table were measured on a V100 GPU. The model is hosted as a managed API through gigarouter: a request carries a query and a list of candidate documents and returns relevance scores, with no local installation or model loading required.
best for
- Reranking search results from a first-stage retriever (e.g., ElasticSearch) to improve relevance
- Scoring query-passage pairs for information retrieval and question answering
FAQ
It is best for reranking passages in a retrieval pipeline: given a query and a set of candidate passages, it scores each pair to reorder results by relevance.
It processes about 1,800 docs per second on a V100 GPU, faster than larger models like MiniLM-L12-v2 (960 docs/sec) but slower than TinyBERT-L2 (9,000 docs/sec).
Input is a list of query-passage text pairs. Output is a relevance score (logit) for each pair; higher scores indicate greater relevance.
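Because the scores are logits, they are unbounded and can be negative. If a value between 0 and 1 is more convenient for display or thresholding, one common option is to pass each logit through a sigmoid, which preserves the ranking order. The sketch below assumes nothing about the hosted API: `sigmoid` and `to_unit_interval` are hypothetical helpers, the example logits are made up, and the results should not be read as calibrated probabilities.

```python
import math
from typing import List


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a logit into (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def to_unit_interval(logits: List[float]) -> List[float]:
    """Apply the sigmoid to every logit; relative order is unchanged."""
    return [sigmoid(value) for value in logits]


if __name__ == "__main__":
    example_logits = [8.5, -4.2, 0.0]  # made-up values for illustration
    for logit, squashed in zip(example_logits, to_unit_interval(example_logits)):
        print(f"logit {logit:+.1f} -> {squashed:.4f}")
```

Since the sigmoid is monotonic, sorting by the transformed values gives the same order as sorting by the raw logits.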
Use the OpenAI-compatible API with your API key: send the query and the list of candidate documents to the rerank endpoint.
The model is released under the MIT license.
# rerank documents by relevance; billed per document
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France", "documents":["Paris is the capital of France.","Bananas are yellow."]}'
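The same request can be sketched in Python using only the standard library. The URL, model ID, bearer-token header, and payload fields are taken from the curl command; the `Content-Type` header is an assumption, as is the response being JSON. The response schema is not documented here, so the sketch returns the parsed body without interpreting it. `build_rerank_request` and `send_rerank_request` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, List

RERANK_URL = "https://gigarouter.ai/v1/rerank"
MODEL_ID = "cross-encoder/ms-marco-MiniLM-L6-v2"


def build_rerank_request(
    query: str, documents: List[str], api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"model": MODEL_ID, "query": query, "documents": documents}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_rerank_request(request: urllib.request.Request) -> Any:
    """Send the request and return the parsed body (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the key exported as `GR_KEY`, a call might look like `send_rerank_request(build_rerank_request("capital of France", docs, os.environ["GR_KEY"]))` after `import os`, where `docs` is the list of candidate passages.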