GTE Multilingual Reranker Base

Alibaba-NLP/gte-multilingual-reranker-base

published Jul 2024 · updated Jul 2025

GTE Multilingual Reranker Base is a reranking model that achieves state-of-the-art multilingual retrieval performance with a fast encoder-only transformer architecture, supporting inputs of up to 8192 tokens and over 70 languages.

est. price

~$0.008 / 1k docs · estimated, set at launch

API providers

downloads / mo

221.9K

license

apache-2.0

specs

Task	Reranking
Architecture	Encoder-only Transformer
Parameters	306M
Max Input Tokens	8192

about this model

The gte-multilingual-reranker-base is a cross-encoder reranking model from the GTE family, designed to reorder document lists by relevance to a given query. It is built on an encoder-only transformer architecture (306M parameters) and supports input sequences up to 8,192 tokens, enabling long-context retrieval tasks. The model covers over 70 languages.
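As a rough illustration of what "reordering by relevance" looks like in practice, the sketch below scores each (query, document) pair and sorts documents by score. It assumes the checkpoint can be loaded through Hugging Face `transformers` as a sequence-classification model with `trust_remote_code=True`; the helper names `rerank` and `load_cross_encoder_scorer` are hypothetical and not part of any published API.

```python
from typing import Callable, List, Sequence, Tuple

MODEL_ID = "Alibaba-NLP/gte-multilingual-reranker-base"
MAX_TOKENS = 8192  # maximum input length listed in the specs

# A scorer maps (query, documents) to one relevance score per document.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rerank(query: str, docs: Sequence[str], score_fn: ScoreFn) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted from most to least relevant."""
    scores = score_fn(query, docs)
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


def load_cross_encoder_scorer(model_id: str = MODEL_ID) -> ScoreFn:
    """Build a scorer backed by the model (downloads weights on first use)."""
    # Imported lazily so rerank() can be used without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint ships custom modeling code, hence trust_remote_code.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, trust_remote_code=True
    )
    model.eval()

    def score(query: str, docs: Sequence[str]) -> List[float]:
        # A cross-encoder reads the query and the document together as one input.
        with torch.no_grad():
            inputs = tokenizer(
                [query] * len(docs),
                list(docs),
                padding=True,
                truncation=True,
                max_length=MAX_TOKENS,
                return_tensors="pt",
            )
            logits = model(**inputs).logits.view(-1)
        return logits.float().tolist()

    return score
```

Calling `rerank(query, docs, load_cross_encoder_scorer())` would then return the documents in relevance order. Keeping the scorer as a parameter makes it easy to swap in a hosted endpoint or a stub for testing.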

Performance and Efficiency

The model achieves state-of-the-art results on multilingual retrieval benchmarks among rerankers of comparable size. According to the accompanying paper (EMNLP 2024 Industry Track), the reranker matches the performance of the larger BGE-M3 models and surpasses them on long-context retrieval benchmarks. Its encoder-only design gives it an approximately 10x inference speed advantage over models built on decoder-only LLMs (e.g., gte-qwen2-1.5b-instruct), and it needs fewer hardware resources.

Architecture and Training

The underlying text encoder is pre-trained with a native 8,192-token context (compared to 512 tokens for previous multilingual encoders like XLM-R) and enhanced with Rotary Position Embedding (RoPE) and unpadding optimization. The reranker is trained via contrastive learning on a hybrid of text representation and cross-encoder objectives.
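The page does not spell out the exact loss. Purely as an illustration of what a contrastive objective for a reranker can look like, the sketch below computes InfoNCE, one common choice, over the relevance scores of one positive document and several negatives. The function name `info_nce_loss` and the `temperature` parameter are assumptions for this example, not details confirmed by the source.

```python
import math
from typing import Sequence


def info_nce_loss(
    pos_score: float, neg_scores: Sequence[float], temperature: float = 1.0
) -> float:
    """InfoNCE for one query: -log softmax probability of the positive document.

    pos_score and neg_scores are relevance scores (e.g. cross-encoder logits)
    for the positive document and the negative documents, respectively.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]
```

The loss shrinks as the positive document's score rises above the negatives' scores, which is the behavior a reranker needs: relevant documents should outscore irrelevant ones for the same query.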

Benchmark Results

Evaluation on multiple text retrieval datasets demonstrates the model's effectiveness. Detailed experimental results are available in the paper.

[Figure: reranking evaluation results on multiple text retrieval datasets]

Once onboarded, this model will be served as a managed, OpenAI-compatible API on gigarouter, with no local infrastructure or model loading required.

best for

· Multilingual document retrieval reranking
· Long-context search (up to 8192 tokens)
· Cross-lingual question answering

FAQ

What is the maximum input length?

8192 tokens.

How many languages does it support?

Over 70 languages.

How does it compare to BGE-M3?

According to the accompanying paper, it matches the performance of the larger BGE-M3 models and achieves better results on long-context retrieval benchmarks.

How can I call this model via gigarouter API?

Once the model is live, use the OpenAI-compatible endpoint with your gigarouter API key.
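The page does not document the base URL, route, or request schema, so the following is only a sketch of how such a call might be assembled. It assumes a bearer-token header and a `/rerank` route with a `query`/`documents`/`top_n` payload, a shape used by several hosted rerank APIs. `BASE_URL`, the route, the payload fields, and `build_rerank_request` are all hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Sequence

# Hypothetical values: none of these are documented on this page.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "Alibaba-NLP/gte-multilingual-reranker-base"


def build_rerank_request(
    api_key: str, query: str, documents: Sequence[str], top_n: int = 3
) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for a rerank call."""
    payload = {
        "model": MODEL_ID,
        "query": query,
        "documents": list(documents),
        "top_n": top_n,  # assumed field: number of results to return
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/rerank",  # assumed route
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Check the provider's API reference for the actual route and field names before relying on this shape.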

Is it faster than decoder-only rerankers?

Yes. Its encoder-only architecture gives it a roughly 10x inference speedup over rerankers built on decoder-only LLMs.

not yet live

We're benchmarking and onboarding GTE Multilingual Reranker Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related reranker models

ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

ms-marco-MiniLM-L4-v2

4.8M dl/mo

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo