MS MARCO MiniLM-L12 v2
cross-encoder/ms-marco-MiniLM-L12-v2
published Mar 2022 · updated Aug 2025
MS MARCO MiniLM-L12 v2 is a cross-encoder reranker model that scores query-passage pairs for information retrieval.
specs
| Task | Passage Re-ranking (Cross-Encoder) |
| Architecture | MiniLM-L12 (12-layer transformer) |
| Training Data | MS MARCO Passage Ranking |
| License | Not specified |
about this model
cross-encoder/ms-marco-MiniLM-L12-v2 is a cross-encoder reranking model that scores the relevance of a query–passage pair, enabling a second-stage reordering of candidate documents retrieved by a first-stage retrieval system.
Model Description
This model is a 12-layer MiniLM cross-encoder fine-tuned on the MS MARCO Passage Ranking dataset. It takes a query and a passage as a single input pair and outputs one relevance score. In a typical retrieve-and-rerank pipeline, a fast first-stage retriever (e.g., Elasticsearch or a dense retriever) returns a set of candidate passages; the cross-encoder then re-scores a subset of those candidates to improve ranking precision.
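The retrieve-and-rerank flow described above can be sketched as follows. This is a minimal illustration, not the hosted service's implementation: `rerank`, `overlap_scorer`, and `load_cross_encoder_scorer` are hypothetical names chosen here, the word-overlap scorer is only a stand-in so the sketch runs without downloading weights, and the loader assumes the sentence-transformers package is installed locally.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, passage) pairs and returns one relevance score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: Sequence[str], score_pairs: Scorer,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Re-score first-stage candidates and return the top_k by descending score."""
    if not candidates:
        return []
    pairs = [(query, passage) for passage in candidates]
    scores = score_pairs(pairs)
    # sorted() is stable, so ties keep the first-stage order.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy stand-in scorer (assumption for illustration): counts shared words."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


def load_cross_encoder_scorer(
    name: str = "cross-encoder/ms-marco-MiniLM-L12-v2",
) -> Scorer:
    """Return the real model's scorer.

    Assumes sentence-transformers is installed; weights download on first use.
    """
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(name)
    return model.predict
```

To try the real model, pass `load_cross_encoder_scorer()` in place of `overlap_scorer`. Keeping the scorer pluggable means the sorting and truncation logic can be checked independently of the model.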
Key Strengths
- Strong ranking accuracy on standard benchmarks (NDCG@10 of 74.31 on TREC Deep Learning 2019; MRR@10 of 39.02 on MS MARCO Passage Dev)
- Balanced trade-off between latency and quality: roughly 960 documents per second on a V100 GPU
- Part of the v2 family of MS MARCO cross-encoders, which outperform their v1 counterparts
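As a rough guide to what the throughput figure means for a reranking budget, the arithmetic can be sketched as below. The function names are hypothetical. The sketch assumes the quoted 960 documents per second holds steadily on comparable hardware and ignores batching, tokenization, and network overhead, so treat the results as back-of-the-envelope estimates only.

```python
def estimated_rerank_seconds(num_candidates: int,
                             docs_per_second: float = 960.0) -> float:
    """Approximate time to score num_candidates query-passage pairs."""
    if num_candidates < 0 or docs_per_second <= 0:
        raise ValueError("num_candidates must be >= 0 and docs_per_second > 0")
    return num_candidates / docs_per_second


def max_candidates_within(budget_seconds: float,
                          docs_per_second: float = 960.0) -> int:
    """Largest candidate count that fits in the given time budget."""
    if budget_seconds < 0 or docs_per_second <= 0:
        raise ValueError("budget_seconds must be >= 0 and docs_per_second > 0")
    return int(budget_seconds * docs_per_second)
```

Under these assumptions, reranking 100 candidates would take on the order of a tenth of a second, which is one reason the cross-encoder is applied to a shortlist and not to the whole corpus.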
Benchmark Performance
| Dataset | Metric | Score |
|---|---|---|
| TREC Deep Learning 2019 | NDCG@10 | 74.31 |
| MS MARCO Passage Dev | MRR@10 | 39.02 |
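For readers unfamiliar with the two metrics, here is a minimal sketch of how MRR@10 and NDCG@10 are computed for a single query. It assumes linear gains for NDCG (some tools use 2^rel - 1 instead) and computes the ideal ordering only from the judgments passed in, which simplifies what official evaluation tools do. The scores in the table appear to be on a 0-100 scale, so 39.02 would correspond to a mean of 0.3902 over queries.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0.0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    """Average a per-query metric over all queries."""
    return sum(values) / len(values) if values else 0.0
```

Each input list holds the relevance labels of the returned passages in ranked order; the benchmark figure is the mean of the per-query value across the evaluation set.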
Additional Context
The model was trained on over one million real, anonymized Bing queries from the MS MARCO dataset. It is designed for English-language passage reranking and is hosted as a managed API on Gigarouter, requiring no local infrastructure or dependency installation.
best for
- Re-ranking top-k results from an initial retrieval system (e.g., BM25)
- Improving relevance scoring in enterprise search or document retrieval pipelines
- Building a question-answering system with passage selection from a candidate set
FAQ
It expects a pair of texts (query and passage) and returns a relevance score. Example: model.predict([("query", "passage")])
It is the largest MiniLM v2 variant (12 layers) and achieves the highest MRR@10 (39.02) on MS MARCO Dev, but it processes fewer documents per second (about 960 on a V100 GPU) than the smaller variants.
The model card does not specify a license. Please check the model repository for any license information.
Send a POST request to the Gigarouter OpenAI-compatible endpoint with your API key, including the model name "cross-encoder/ms-marco-MiniLM-L12-v2" and a list of query-passage pairs.
It was trained on the MS MARCO Passage Ranking dataset, which contains over 1 million real Bing queries and human-generated answers.