CrossEncoder CamemBERT Base mMARCO FR
antoinelouis/crossencoder-camembert-base-mmarcoFR
published Sep 2023 · updated Apr 2025
CrossEncoder CamemBERT Base mMARCO FR is a cross-encoder reranker for French that computes relevance scores between query-passage pairs for semantic search.
specs
| Spec | Value |
|---|---|
| Task | Reranking (Cross-Encoder) |
| Architecture | CamemBERT base cross-encoder |
| Parameters | 110.6M |
| License | MIT |
about this model
crossencoder-camembert-base-mmarcoFR is a cross-encoder reranking model for French that performs cross-attention between a query-passage pair to output a relevance score between 0 and 1. It is designed to refine the results of a first-stage retrieval system such as BM25 or a dense bi-encoder by reordering passage candidates according to relevance.
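The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the model's own code: `rerank` and `toy_overlap_scorer` are hypothetical names, and the word-overlap scorer is only a stand-in for the cross-encoder so that the example runs without downloading any weights. In practice the scorer would be whatever function returns one relevance score per query-passage pair.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Reorder first-stage candidates by a pairwise relevance score (highest first)."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)  # one score per (query, passage) pair
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def toy_overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Hypothetical stand-in for the cross-encoder: fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        overlap = len(query_words & passage_words)
        scores.append(overlap / len(query_words) if query_words else 0.0)
    return scores


# Candidates as they might come back from BM25 or a bi-encoder (made-up data).
candidates = [
    "Le camembert est un fromage",
    "Lyon est une ville de France",
    "Paris est la capitale de la France",
]
for passage, score in rerank("capitale de la France", candidates, toy_overlap_scorer):
    print(f"{score:.2f}  {passage}")
```

Keeping the scorer as a plain callable means the reordering logic stays the same whether the scores come from a local model or a hosted endpoint.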
Architecture and Training
The model is initialized from almanach/camembert-base and fine-tuned on French training samples from the mMARCO dataset, a machine-translated version of MS MARCO containing 8.8M passages and 539K training queries. Hard negatives are mined from 12 distinct dense retrievers to create 2.6M training triplets with a balanced positive-to-negative ratio of 1 (one negative per positive). Training uses binary cross-entropy loss (monoBERT-style) for 20k steps with a batch size of 128 and a learning rate of 2e-5; the maximum sequence length for concatenated query-passage pairs is 256 tokens.
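To make the training objective concrete, here is a small worked sketch of binary cross-entropy over labeled pairs. It assumes each (query, positive, negative) triplet is expanded into one pair labeled 1 and one pair labeled 0, which matches the 1:1 ratio stated above; the helper names and the example logits are hypothetical, and the actual training code may differ.

```python
import math
from typing import List, Sequence, Tuple


def triplet_to_pairs(query: str, positive: str, negative: str) -> List[Tuple[str, str, int]]:
    """Expand one triplet into two labeled pairs: (query, passage, label)."""
    return [(query, positive, 1), (query, negative, 0)]


def sigmoid(logit: float) -> float:
    """Map a raw logit to a relevance score in (0, 1), avoiding overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def bce_with_logits(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy, using the numerically stable form
    max(z, 0) - z * y + log(1 + exp(-|z|))."""
    losses = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, labels)
    ]
    return sum(losses) / len(losses)


pairs = triplet_to_pairs(
    "capitale de la France",
    "Paris est la capitale de la France",
    "Le camembert est un fromage",
)
labels = [label for _, _, label in pairs]

# Made-up logits standing in for the model's output on the two pairs.
confident_logits = [3.0, -3.0]   # positive scored high, negative scored low
untrained_logits = [0.0, 0.0]    # no preference, both scores equal 0.5

print(bce_with_logits(confident_logits, labels))
print(bce_with_logits(untrained_logits, labels))
```

The loss falls as the positive pair's logit rises and the negative pair's logit drops, which is what pushes the sigmoid output toward 1 for relevant passages and toward 0 for irrelevant ones.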
Evaluation Results
The model is evaluated on the mMARCO-fr development set, which consists of 6,980 queries and 1,000 candidate passages per query (containing positives and ColBERTv2 hard negatives). Performance metrics are as follows:
| Metric | Score |
|---|---|
| MRR@10 | 33.4 |
| Recall@10 | 59.83 |
| Recall@100 | 85.34 |
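For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from ranked lists. The function names and the toy data are hypothetical, and the exact evaluation script behind the table is not given in the source; in particular, recall is computed here as the fraction of a query's relevant passages found in the top k, averaged over queries. The table presumably reports these values multiplied by 100.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k (0 if none)."""
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        for rank, passage_id in enumerate(ranked_ids[:k], start=1):
            if passage_id in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(rankings)


def recall_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Average fraction of each query's relevant passages retrieved in the top k."""
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        relevant_ids = relevant[query_id]
        hits = len(relevant_ids & set(ranked_ids[:k]))
        total += hits / len(relevant_ids)
    return total / len(rankings)


# Toy data: two queries, reranked passage ids, and their relevant passage ids.
rankings = {
    "q1": ["p3", "p1", "p2"],
    "q2": ["p5", "p4", "p6"],
}
relevant = {
    "q1": {"p1"},
    "q2": {"p6", "p9"},
}

print(mrr_at_k(rankings, relevant, k=10))
print(recall_at_k(rankings, relevant, k=2))
print(recall_at_k(rankings, relevant, k=3))
```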
The model has 110.6M parameters and is released under the MIT license. As a specialist reranker, it is hosted by gigarouter as a managed, OpenAI-compatible API, so no local installation or GPU is required.
best for
- French semantic search reranking
- Improving BM25 or dense retrieval results for French queries
- Question-answering passage re-ranking
FAQ
It is best used as a reranker for French semantic search, to reorder passages retrieved by a first-stage retriever.
It has 110.6M parameters, similar to other CamemBERT base models, and runs inference on query-passage pairs up to 256 tokens.
Input is a list of query-passage pairs; output is one relevance score between 0 and 1 per pair.
Use the gigarouter OpenAI-compatible endpoint with an API key; send POST requests with the model name and input pairs.
MIT license, free for both academic and commercial use.