
MS Marco MiniLM L6 V2

cross-encoder/ms-marco-MiniLM-L6-v2

published Mar 2022 · updated Aug 2025

MS Marco MiniLM L6 V2 is a cross-encoder rerank model that scores query-passage pairs for information retrieval, trained on the MS Marco Passage Ranking dataset.

price: $0.008 / 1k docs
API providers: 0
downloads / mo: 81.5M
throughput: 2.7K docs/s
license: apache-2.0

specs

Task: Reranking / Cross-Encoder
Architecture: MiniLM-L6

about this model

cross-encoder/ms-marco-MiniLM-L6-v2 is a cross-encoder model trained on the MS Marco Passage Ranking dataset and designed for information retrieval reranking. Given a query and a set of candidate passages (e.g., retrieved by a first-stage retriever), the model outputs relevance scores that allow the passages to be sorted by decreasing order of estimated relevance.
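The reranking step described above can be sketched in a few lines. This is a minimal illustration, not the model itself: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for the cross-encoder so that the example runs without downloading anything.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    passages: List[str],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort by decreasing score."""
    scored = [(passage, score_pair(query, passage)) for passage in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in scorer: counts shared lowercase words.

    A real pipeline would call the cross-encoder here instead.
    """
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().replace(".", "").split())
    return float(len(query_words & passage_words))


candidates = ["Bananas are yellow.", "Paris is the capital of France."]
ranked = rerank("capital of France", candidates, toy_overlap_score)
for passage, score in ranked:
    print(f"{score:.1f}  {passage}")
```

Keeping the scorer as a parameter means the same sorting logic applies whether scores come from a local model or from a hosted rerank API.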

The model is a version 2 checkpoint using a MiniLM-L6 architecture, balancing accuracy and throughput. On the TREC Deep Learning 2019 benchmark it achieves an NDCG@10 of 74.30, and on the MS Marco Passage Reranking dev set it reaches an MRR@10 of 39.01. Runtime benchmarks on a V100 GPU indicate a throughput of approximately 1,800 documents per second.

Performance comparison (as reported in the original model card)

Model name                              NDCG@10 (TREC DL 19)  MRR@10 (MS Marco Dev)  Docs / sec
cross-encoder/ms-marco-TinyBERT-L2-v2   69.84                 32.56                  9000
cross-encoder/ms-marco-MiniLM-L2-v2     71.01                 34.85                  4100
cross-encoder/ms-marco-MiniLM-L4-v2     73.04                 37.70                  2500
cross-encoder/ms-marco-MiniLM-L6-v2     74.30                 39.01                  1800
cross-encoder/ms-marco-MiniLM-L12-v2    74.31                 39.02                  960

The Docs / sec figures in the table were measured on a V100 GPU. The model is also hosted as a managed API through gigarouter: you send a query and a list of candidate documents and receive relevance scores, with no local installation or model loading required.

FAQ

What is this model best used for?

It is best for reranking passages in a retrieval pipeline: given a query and a set of candidate passages, it scores each pair to reorder results by relevance.

How does it compare in speed to larger models?

It processes about 1,800 docs per second on a V100 GPU, faster than larger models like MiniLM-L12-v2 (960 docs/sec) but slower than TinyBERT-L2 (9,000 docs/sec).
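As a back-of-the-envelope sketch, the throughput and price figures on this page can be turned into rough per-query estimates. The helper names below are hypothetical, and the arithmetic assumes that scoring time and billing both scale linearly with the number of documents; it ignores network latency, batching effects, and hardware differences.

```python
def rerank_seconds(num_docs: int, docs_per_sec: float) -> float:
    """Approximate scoring time, assuming throughput is constant."""
    return num_docs / docs_per_sec


def rerank_cost_usd(num_docs: int, usd_per_1k_docs: float = 0.008) -> float:
    """Approximate cost, assuming linear per-document billing."""
    return num_docs / 1000 * usd_per_1k_docs


# Reranking 100 candidates per query, using the V100 figures from the table.
top_k = 100
l6_seconds = rerank_seconds(top_k, 1800)    # MiniLM-L6-v2
l12_seconds = rerank_seconds(top_k, 960)    # MiniLM-L12-v2
print(f"L6:  {l6_seconds * 1000:.1f} ms per query")
print(f"L12: {l12_seconds * 1000:.1f} ms per query")
print(f"cost: ${rerank_cost_usd(top_k):.4f} per query")
```

Under these assumptions the L12 model takes a little under twice as long per query for a 0.01-point gain in NDCG@10.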

What is the input and output format?

Input is a list of query-passage text pairs. Output is a relevance score (logit) for each pair; higher scores indicate greater relevance.
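Because the scores are raw logits, they are unbounded and can be negative. If a value between 0 and 1 is more convenient, for example for thresholding, one common option is to pass each logit through a sigmoid. This is an assumption about downstream usage, not something this page prescribes, and the logit values below are made up for illustration. The sigmoid is monotonic, so the ranking order does not change.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to (0, 1) without overflowing for large inputs."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


# Made-up example logits for three query-passage pairs.
example_logits = [8.2, -11.0, 0.0]
probabilities = [sigmoid(x) for x in example_logits]

# Sorting by logit and sorting by sigmoid(logit) give the same order.
order_by_logit = sorted(range(3), key=lambda i: example_logits[i], reverse=True)
order_by_prob = sorted(range(3), key=lambda i: probabilities[i], reverse=True)
print(order_by_logit == order_by_prob)
```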

How can I call this model via the gigarouter API?

Call the rerank endpoint (/v1/rerank) of gigarouter's OpenAI-compatible API with your API key, sending the model name, a query, and the list of candidate documents as a JSON body.

What license is this model released under?

The model is released under the Apache 2.0 license.

call it
# rerank documents by relevance; billed per document
# the body is JSON, so declare it explicitly (curl -d defaults to form encoding)
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France",
       "documents":["Paris is the capital of France.","Bananas are yellow."]}'
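The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl call above: the URL, model ID, and field names are taken from that example, while the function names and the explicit Content-Type header are assumptions. The response schema is not documented on this page, so `send_request` simply returns the decoded JSON without interpreting it.

```python
import json
import urllib.request

RERANK_URL = "https://gigarouter.ai/v1/rerank"
MODEL_ID = "cross-encoder/ms-marco-MiniLM-L6-v2"


def build_rerank_request(query, documents, api_key):
    """Build (but do not send) a POST request matching the curl example."""
    payload = {"model": MODEL_ID, "query": query, "documents": documents}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; body is JSON
        },
        method="POST",
    )


def send_request(request):
    """Send the request and return the parsed JSON body as-is."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a real key, e.g. from the GR_KEY environment variable):
#   import os
#   req = build_rerank_request(
#       "capital of France",
#       ["Paris is the capital of France.", "Bananas are yellow."],
#       os.environ["GR_KEY"],
#   )
#   print(send_request(req))
```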
