CrossEncoder CamemBERT Base mMARCO FR

antoinelouis/crossencoder-camembert-base-mmarcoFR

published Sep 2023 · updated Apr 2025

CrossEncoder CamemBERT Base mMARCO FR is a cross-encoder reranker for French that computes relevance scores between query-passage pairs for semantic search.

est. price

~$0.008 / 1k docs · estimated, set at launch

downloads / mo

185K

license

mit

specs

Task	Reranking (Cross-Encoder)
Architecture	CamemBERT base cross-encoder
Parameters	110.6M
License	MIT

about this model

crossencoder-camembert-base-mmarcoFR is a cross-encoder reranking model for French. It applies cross-attention over a concatenated query-passage pair and outputs a relevance score between 0 and 1. It is designed to refine the results of a first-stage retrieval system, such as BM25 or a dense bi-encoder, by reordering candidate passages according to relevance.
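
The reordering step can be sketched as follows. This is a minimal illustration, not the model's own code: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for the cross-encoder so that the example runs without downloading anything.

```python
import re
from typing import Callable, List, Optional, Tuple


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort passages by descending score."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def toy_overlap_score(query: str, passage: str) -> float:
    """Stand-in scorer (NOT the real model): share of query words found in the passage."""
    query_words = set(re.findall(r"\w+", query.lower()))
    passage_words = set(re.findall(r"\w+", passage.lower()))
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


if __name__ == "__main__":
    # Candidates as they might come back from a first-stage retriever such as BM25.
    candidates = [
        "Le camembert est un fromage normand.",
        "La capitale de l'Italie est Rome.",
        "Paris est la capitale de la France.",
    ]
    for passage, score in rerank("capitale de la France", candidates, toy_overlap_score):
        print(f"{score:.2f}  {passage}")
```

In a real pipeline, `score_fn` would call the cross-encoder on each pair; only the sorting logic above carries over unchanged.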

Architecture and Training

The model is initialized from almanach/camembert-base and fine-tuned on French training samples from the mMARCO dataset — a machine-translated version of MS MARCO containing 8.8M passages and 539K training queries. Hard negatives are mined from 12 distinct dense retrievers to create 2.6M training triplets with a balanced positive-to-negative ratio of 1. Training uses binary cross-entropy loss (monoBERT-style) for 20k steps with a batch size of 128 and a learning rate of 2e-5; the maximum sequence length for concatenated query-passage pairs is 256 tokens.
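
The binary cross-entropy objective mentioned above can be sketched in plain Python. This is an illustration under assumptions, not the training code: it assumes the model emits a single logit per query-passage pair, that each triplet contributes one positive pair (label 1) and one negative pair (label 0), and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a logit to a score in (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logit(logit: float, label: float) -> float:
    """Binary cross-entropy -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit).

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def batch_loss(logits: Sequence[float], labels: Sequence[float]) -> float:
    """Mean BCE over a batch of (query, passage) pairs."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    return sum(bce_with_logit(z, y) for z, y in zip(logits, labels)) / len(logits)


if __name__ == "__main__":
    # One triplet (query, positive, negative) becomes two labeled pairs.
    logits = [2.0, -1.5]   # hypothetical model outputs for the positive and negative pair
    labels = [1.0, 0.0]
    print(f"loss = {batch_loss(logits, labels):.4f}")
    print(f"scores = {[round(sigmoid(z), 3) for z in logits]}")
```

The same sigmoid explains why inference scores fall between 0 and 1: a pair the model considers relevant gets a large positive logit and a score close to 1.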

Evaluation Results

The model is evaluated on the mMARCO-fr development set, which consists of 6,980 queries and 1,000 candidate passages per query (containing positives and ColBERTv2 hard negatives). Performance metrics are as follows:

Metric	Score
MRR@10	33.4
Recall@10	59.83
Recall@100	85.34
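
For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. It assumes the table reports values multiplied by 100 (so 33.4 would correspond to 0.334) and that Recall@k is the per-query fraction of relevant passages found in the top k, averaged over queries; the function names and the tiny data set are hypothetical.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, passage_id in enumerate(ranked_ids[:k], start=1):
        if passage_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear within the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def evaluate(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average the per-query metrics over all queries that have relevance labels."""
    n = len(qrels)
    return {
        "MRR@10": sum(mrr_at_k(run.get(q, []), rel, 10) for q, rel in qrels.items()) / n,
        "Recall@10": sum(recall_at_k(run.get(q, []), rel, 10) for q, rel in qrels.items()) / n,
        "Recall@100": sum(recall_at_k(run.get(q, []), rel, 100) for q, rel in qrels.items()) / n,
    }


if __name__ == "__main__":
    # Toy reranked lists (best first) and relevance labels for two queries.
    run = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
    qrels = {"q1": {"b"}, "q2": {"z"}}
    print(evaluate(run, qrels))
```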

The model has 110.6M parameters and is released under the MIT license. gigarouter is onboarding it as a managed, OpenAI-compatible API, so that using it will require no local installation or GPU.

best for

·French semantic search reranking
·Improving BM25 or dense retrieval results for French queries
·Question-answering passage re-ranking

FAQ

What is this model best used for?

It is best used as a reranker for French semantic search, to reorder passages retrieved by a first-stage retriever.

How does it compare in size and speed?

It has 110.6M parameters, in line with other CamemBERT base models, and scores concatenated query-passage pairs of up to 256 tokens.

What is the input/output format?

Input is a list of query-passage pairs; output is one relevance score between 0 and 1 per pair.
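
As an illustration of that format, the sketch below builds the list of pairs and, optionally, scores it locally. It rests on assumptions: that the identifier shown at the top of this page is the model's Hugging Face Hub repository id, and that the `sentence-transformers` package is installed for the scoring step. `build_pairs` and `score_pairs` are hypothetical helper names.

```python
from typing import List, Tuple

# Assumption: the identifier from this page is also the Hugging Face Hub repo id.
MODEL_ID = "antoinelouis/crossencoder-camembert-base-mmarcoFR"


def build_pairs(query: str, passages: List[str]) -> List[Tuple[str, str]]:
    """Input format: one (query, passage) pair per candidate passage."""
    return [(query, passage) for passage in passages]


def score_pairs(pairs: List[Tuple[str, str]]) -> List[float]:
    """Output format: one relevance score per pair, in the same order.

    Requires `pip install sentence-transformers`; the model weights are
    downloaded on first use. The import is deferred so that `build_pairs`
    works without the dependency.
    """
    from sentence_transformers import CrossEncoder

    # max_length=256 mirrors the maximum sequence length stated for training.
    model = CrossEncoder(MODEL_ID, max_length=256)
    return [float(score) for score in model.predict(pairs)]


if __name__ == "__main__":
    pairs = build_pairs(
        "Quelle est la capitale de la France ?",
        ["Paris est la capitale de la France.", "Le camembert est un fromage normand."],
    )
    print(pairs)
    # Uncomment to run the model locally (downloads the weights):
    # print(score_pairs(pairs))
```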

How to call it via the API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with an API key: send a POST request containing the model name and the input pairs.
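
The request could be assembled roughly as below. This is only a sketch: the endpoint URL is a placeholder, and the JSON field names (`model`, `input`) are assumptions inferred from the description above, since this page does not document the actual schema. The `Authorization: Bearer` header follows the usual OpenAI-style convention. The code builds the request without sending it.

```python
import json
import urllib.request
from typing import List

# Placeholder: the real base URL and path are not given on this page.
API_URL = "https://api.example.invalid/v1/rerank"
MODEL_NAME = "antoinelouis/crossencoder-camembert-base-mmarcoFR"


def build_rerank_request(api_key: str, query: str, passages: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the model name and input pairs."""
    payload = {
        "model": MODEL_NAME,
        # Hypothetical field name: one [query, passage] pair per candidate.
        "input": [[query, passage] for passage in passages],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_rerank_request(
        "YOUR_API_KEY",
        "Quelle est la capitale de la France ?",
        ["Paris est la capitale de la France.", "Le camembert est un fromage normand."],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending would be: urllib.request.urlopen(request), once a real endpoint is known.
```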

What license is it under?

It is released under the MIT license, which permits both academic and commercial use.

not yet live

We're benchmarking and onboarding CrossEncoder CamemBERT Base mMARCO FR as a hosted, OpenAI-compatible API.

related reranker models

ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

ms-marco-MiniLM-L4-v2

4.8M dl/mo

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo