Contriever
facebook/contriever
published Mar 2022 · updated Jan 2022
Contriever is an unsupervised dense information retrieval model trained with contrastive learning for zero-shot passage and document search.
specs
| Task | Dense Retrieval |
| Architecture | BERT-based transformer encoder |
| Training Data | CC-net and English Wikipedia (unsupervised) |
about this model
facebook/contriever is an unsupervised dense information retrieval model trained with contrastive learning, as described in the paper "Towards Unsupervised Dense Information Retrieval with Contrastive Learning" (arXiv:2112.09118). It produces sentence embeddings that can be compared via dot product for retrieval tasks, without requiring any supervised training data.
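As a minimal sketch of the dot-product comparison described above, the snippet below ranks passages against a query. The vectors are tiny toy values and the helper names (`dot`, `rank_by_dot_product`) are illustrative; real Contriever embeddings have many more dimensions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot_product(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "embeddings" (illustrative values only).
query = [1.0, 0.0]
passages = [[0.5, 0.5], [2.0, 0.0], [-1.0, 1.0]]
ranking = rank_by_dot_product(query, passages)
```

Because the score is an unnormalized dot product rather than cosine similarity, vector length affects the ranking as well as direction.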
Key strengths
The unsupervised Contriever is competitive with BM25 on the BEIR benchmark: on Recall@100, it outperforms BM25 on 11 out of 15 datasets. Fine-tuning on MS MARCO (contriever-msmarco) improves retrieval recall substantially; on NaturalQuestions, for example, R@20 rises from 67.8 to 79.6. A multilingual version, mcontriever, is pre-trained on 29 languages using CC-net data and supports cross-lingual retrieval across different scripts.
Benchmark results
Performance on NaturalQuestions (R@k):
| Model | R@5 | R@20 | R@100 |
|---|---|---|---|
| Contriever | 47.8 | 67.8 | 82.1 |
| Contriever-msmarco | 65.7 | 79.6 | 88.0 |
Performance on TriviaQA (R@k):
| Model | R@5 | R@20 | R@100 |
|---|---|---|---|
| Contriever | 59.4 | 74.2 | 83.2 |
| Contriever-msmarco | 71.3 | 80.4 | 85.7 |
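The R@k columns can be read with the following sketch, which assumes the usual open-domain QA definition: the share of questions for which at least one of the top-k retrieved passages is marked relevant. The function name and the toy data are illustrative and are not taken from the Contriever evaluation code.

```python
def recall_at_k(rankings, relevant, k):
    """Fraction of queries with at least one relevant passage in the top k.

    rankings: one ranked list of passage ids per query (best first).
    relevant: one set of relevant passage ids per query, in the same order.
    """
    if not rankings:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for ranked, rel in zip(rankings, relevant)
        if any(pid in rel for pid in ranked[:k])
    )
    return hits / len(rankings)


# Two toy queries: the first is only answered at rank 3, the second at rank 1.
toy_rankings = [["a", "b", "c"], ["d", "e", "f"]]
toy_relevant = [{"c"}, {"d"}]
```

Multiplying the returned fraction by 100 gives values on the same scale as the tables.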
BEIR evaluation additionally uses nDCG@10 across datasets including MS MARCO, TREC-Covid, NFCorpus, and others. Pre-computed Wikipedia passage embeddings for both Contriever and Contriever-msmarco are available for download.
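For readers unfamiliar with nDCG@10, here is a minimal sketch of the metric for a single query. It assumes linear gain with a log2 rank discount, which is a common convention; the exact variant used by a given BEIR evaluation script may differ, and the function names are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_ids: retrieved document ids, best first.
    relevance: dict mapping judged document ids to graded relevance (0 if absent).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A benchmark score is then the mean of this value over all queries in a dataset.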
Model variants hosted by Gigarouter
Four pre-trained variants are available: the unsupervised contriever, contriever-msmarco (fine-tuned on MS MARCO), mcontriever (multilingual, 29 languages), and mcontriever-msmarco. All are accessible via the Gigarouter API as an OpenAI-compatible endpoint, requiring no local installation or pooling logic.
best for
- Zero-shot passage retrieval for open-domain question answering
- Cross-lingual retrieval when combined with multilingual variants (mContriever)
- Document similarity search in domains with no labeled training data
FAQ
Contriever performs dense retrieval – it maps queries and passages to dense vectors and retrieves the most relevant passages by dot-product similarity.
On the BEIR benchmark, unsupervised Contriever outperforms BM25 on 11 out of 15 datasets for Recall@100, especially in zero-shot settings.
The model accepts text strings (queries or passages). Use the HuggingFace tokenizer with padding and truncation, then apply mean pooling to obtain sentence embeddings.
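The mean-pooling step can be sketched without any dependencies as below. It assumes one sequence of per-token vectors plus a matching attention mask (1 for real tokens, 0 for padding), which is what a HuggingFace tokenizer and encoder would supply; the function name `mean_pool` is illustrative, and in practice the same operation is usually written with tensor operations over a whole batch.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding.

    token_embeddings: list of per-token vectors for one sequence.
    attention_mask: list of 0/1 flags, one per token.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    kept = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for j, value in enumerate(vec):
                total[j] += value
    if kept == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / kept for t in total]


# Two real tokens followed by one padding token (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
```

Masking matters because padded positions would otherwise pull the average toward whatever values the encoder emits for padding.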
Use the OpenAI-compatible endpoint with your Gigarouter API key. Send a request with the model name and input text; the API returns the embeddings.
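A request of this kind might look like the sketch below, which follows the general shape of OpenAI-style embeddings requests (a POST to an `/embeddings` path with `model` and `input` fields, and a `data` list in the response). The base URL is a placeholder, and the model identifier `facebook/contriever` is assumed from this page's title; check the Gigarouter documentation for the real values.

```python
import json
import urllib.request

# Placeholder base URL: an assumption for illustration, not a real endpoint.
BASE_URL = "https://api.gigarouter.example/v1"


def build_embedding_request(api_key, texts, model="facebook/contriever",
                            base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_body):
    """Extract the vectors from an OpenAI-style response, ordered by input index."""
    items = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda it: it["index"])]


def embed(api_key, texts):
    """Send the request and return one embedding per input text."""
    request = build_embedding_request(api_key, texts)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(response.read())
```

Calling `embed(api_key, ["what is dense retrieval?"])` would return a list containing one vector, provided the endpoint is live and follows this response shape.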
Fine-tuned and multilingual variants exist: Contriever-msmarco is fine-tuned on MS MARCO for better retrieval on that domain, and multilingual mContriever and mContriever-msmarco are also available.
We're benchmarking and onboarding Contriever as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.