
CrossEncoder CamemBERT Base mMARCO FR

antoinelouis/crossencoder-camembert-base-mmarcoFR

published Sep 2023 · updated Apr 2025

CrossEncoder CamemBERT Base mMARCO FR is a cross-encoder reranker for French that computes relevance scores between query-passage pairs for semantic search.

est. price: ~$0.008 / 1k docs (estimated, set at launch)
API providers: 0
downloads / mo: 185K
license: MIT

specs

Task: Reranking (Cross-Encoder)
Architecture: CamemBERT base cross-encoder
Parameters: 110.6M
License: MIT

about this model

crossencoder-camembert-base-mmarcoFR is a cross-encoder reranking model for French. It encodes each query-passage pair jointly, applying cross-attention across the two texts, and outputs a relevance score between 0 and 1. It is designed to refine the results of a first-stage retrieval system, such as BM25 or a dense bi-encoder, by reordering the candidate passages according to relevance.
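The retrieve-then-rerank flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's or gigarouter's code: `rerank`, `toy_overlap_scorer`, and `load_model_scorer` are hypothetical names, the toy scorer is a word-overlap stand-in so the example runs without downloading weights, and loading the checkpoint through the sentence-transformers `CrossEncoder` class is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passage) pairs to one relevance score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: Sequence[str], score_pairs: Scorer,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (NOT the model): fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(len(query_words & passage_words) / max(len(query_words), 1))
    return scores


def load_model_scorer(name: str = "antoinelouis/crossencoder-camembert-base-mmarcoFR") -> Scorer:
    """Assumption: sentence-transformers is installed and can load this checkpoint."""
    from sentence_transformers import CrossEncoder  # lazy import; downloads weights on first use

    model = CrossEncoder(name)

    def score_pairs(pairs: Sequence[Tuple[str, str]]) -> List[float]:
        return [float(s) for s in model.predict([list(p) for p in pairs])]

    return score_pairs


if __name__ == "__main__":
    candidates = [
        "Paris est la capitale de la France",
        "Le chat dort",
        "La capitale de l'Italie est Rome",
    ]
    for passage, score in rerank("capitale de la France", candidates, toy_overlap_scorer, top_k=2):
        print(f"{score:.2f}  {passage}")
```

Keeping the scorer behind a plain callable means the first-stage retriever and the ranking logic stay unchanged when the stand-in is swapped for `load_model_scorer()` or for a hosted endpoint.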

Architecture and Training

The model is initialized from almanach/camembert-base and fine-tuned on the French training samples of the mMARCO dataset, a machine-translated version of MS MARCO containing 8.8M passages and 539K training queries. Hard negatives are mined from 12 distinct dense retrievers to create 2.6M training triplets with a positive-to-negative ratio of 1. Training uses a binary cross-entropy loss (monoBERT-style) for 20k steps with a batch size of 128 and a learning rate of 2e-5. The maximum sequence length for a concatenated query-passage pair is 256 tokens.
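The binary cross-entropy objective mentioned above can be illustrated with a small, framework-free sketch. It is not the training code: `bce_loss` and `triplet_loss` are hypothetical names, and treating each triplet as one positive pair plus one negative pair is an assumption drawn from the stated ratio of 1.

```python
import math


def bce_loss(logit: float, label: int) -> float:
    """Binary cross-entropy on a raw logit; label is 1 (relevant) or 0 (hard negative).

    Uses the numerically stable form max(z, 0) - z * y + log(1 + exp(-|z|)),
    which equals -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def triplet_loss(pos_logit: float, neg_logit: float) -> float:
    """Assumed reading of a triplet: average the loss of its positive and negative pair."""
    return 0.5 * (bce_loss(pos_logit, 1) + bce_loss(neg_logit, 0))


if __name__ == "__main__":
    # An undecided model (logit 0) pays log(2) per pair.
    print(bce_loss(0.0, 1))
    # Scoring the positive high and the negative low gives a small loss.
    print(triplet_loss(4.0, -4.0))
```

At inference time, applying a sigmoid to the same logit yields a relevance score between 0 and 1.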

Evaluation Results

The model is evaluated on the mMARCO-fr development set, which consists of 6,980 queries and 1,000 candidate passages per query (containing positives and ColBERTv2 hard negatives). Performance metrics are as follows:

Metric      Score
MRR@10      33.4
Recall@10   59.83
Recall@100  85.34
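For readers unfamiliar with these metrics, the sketch below shows how MRR@k and Recall@k are commonly computed from ranked lists. It is an illustration with made-up passage IDs, not the evaluation script behind the numbers above; the function names are hypothetical, and multiplying by 100 to match the scale of the reported scores is an assumption.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, passage_id in enumerate(ranked_ids[:k], start=1):
        if passage_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear within the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


def evaluate(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average each metric over all queries and express it as a percentage."""
    n = len(runs)
    return {
        "MRR@10": 100 * sum(mrr_at_k(runs[q], qrels[q], 10) for q in runs) / n,
        "Recall@10": 100 * sum(recall_at_k(runs[q], qrels[q], 10) for q in runs) / n,
        "Recall@100": 100 * sum(recall_at_k(runs[q], qrels[q], 100) for q in runs) / n,
    }


if __name__ == "__main__":
    # Toy data: two queries with hypothetical passage IDs.
    runs = {"q1": ["p3", "p1", "p2"], "q2": ["p9", "p8", "p7"]}
    qrels = {"q1": {"p1"}, "q2": {"p7", "p5"}}
    print(evaluate(runs, qrels))
```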

The model has 110.6M parameters and is released under the MIT license. gigarouter plans to host this specialist reranker as a managed, OpenAI-compatible API, so that no local installation or GPU is required; the hosted endpoint is not yet live.

FAQ

What is this model best used for?

It is best used as a reranker for French semantic search, to reorder passages retrieved by a first-stage retriever.

How does it compare in size and speed?

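It has 110.6M parameters, the same scale as other CamemBERT base models, and accepts concatenated query-passage pairs of up to 256 tokens. Because it is a cross-encoder, every query-passage pair needs its own forward pass, so reranking cost grows linearly with the number of candidates.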

What is the input/output format?

The input is a list of query-passage pairs; the output is one relevance score between 0 and 1 per pair.

How do I call it via the API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with an API key: send a POST request containing the model name and the input pairs.
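Since the hosted endpoint is not yet live, the request shape can only be guessed at. The sketch below builds, but does not send, a POST request of the kind described above. Everything specific in it is a placeholder and an assumption: the URL, the `/v1/rerank` path, the `model` and `input` field names, and the bearer-token header are hypothetical and may differ from the real API.

```python
import json
import urllib.request
from typing import Sequence

# Placeholder only: the real base URL and path are not documented here.
HYPOTHETICAL_URL = "https://api.example.invalid/v1/rerank"
MODEL_ID = "antoinelouis/crossencoder-camembert-base-mmarcoFR"


def build_rerank_request(api_key: str, query: str, passages: Sequence[str],
                         url: str = HYPOTHETICAL_URL) -> urllib.request.Request:
    """Build (without sending) a POST request carrying the model name and input pairs."""
    body = {
        "model": MODEL_ID,                                # assumed field name
        "input": [[query, passage] for passage in passages],  # assumed field name and shape
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",         # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_rerank_request("YOUR_API_KEY", "capitale de la France",
                                   ["Paris est la capitale de la France", "Le chat dort"])
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the finished request to `urllib.request.urlopen` would send it; check the published API reference for the actual schema before doing so.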

What license is it under?

MIT license, free for both academic and commercial use.

not yet live

We're benchmarking and onboarding CrossEncoder CamemBERT Base mMARCO FR as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
