MMLW E5 Base

sdadas/mmlw-e5-base

published Nov 2023 · updated Feb 2026

MMLW E5 Base is a Polish text embedding model that transforms texts into 768-dimensional vectors for tasks like semantic similarity, clustering, and information retrieval.

est. price

~$0.008

/ 1M tokens · estimated, set at launch

API providers

downloads / mo

369

license

apache-2.0

specs

Task	Text Embedding
Architecture	Multilingual E5 base checkpoint, distilled with a BAAI/bge-base-en teacher
Parameters	Not specified
License	apache-2.0 (teacher model BAAI/bge-base-en: MIT)

about this model

MMLW-e5-base is a Polish neural text encoder that transforms texts into 768-dimensional embeddings for tasks such as semantic similarity, clustering, and information retrieval. It is a distilled model initialized from the multilingual E5 checkpoint and further trained using multilingual knowledge distillation on 60 million Polish-English text pairs, with BAAI/bge-base-en as the teacher model. The distillation method follows the approach described in Reimers & Gurevych (EMNLP 2020).
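As a rough illustration of the distillation objective in Reimers & Gurevych (EMNLP 2020), the student is trained so that its embeddings of both the English sentence and its Polish translation land close to the teacher's embedding of the English sentence, measured by mean squared error. The sketch below uses plain Python lists in place of real model outputs. The function names are hypothetical, and whether MMLW training used exactly this form of the loss is an assumption.

```python
def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_en, student_en, student_pl):
    """Average distillation loss over a batch of translation pairs.

    teacher_en[i]: teacher embedding of the English sentence in pair i
    student_en[i]: student embedding of that same English sentence
    student_pl[i]: student embedding of the Polish translation
    """
    total = 0.0
    for t, s_en, s_pl in zip(teacher_en, student_en, student_pl):
        # Both student outputs are pulled toward the same teacher vector,
        # which is what aligns the two languages in one vector space.
        total += mse(t, s_en) + mse(t, s_pl)
    return total / len(teacher_en)
```

In a real training loop the vectors would be 768-dimensional tensors and the loss would be backpropagated through the student only; the teacher stays frozen.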

The model requires specific prefixes: queries must be prefixed with "query: " and passages with "passage: ". It can also serve as a base for further fine-tuning.
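A minimal sketch of applying the prefixes before encoding, assuming the model is loaded locally through the sentence-transformers library under the id sdadas/mmlw-e5-base. The helper names (as_query, as_passage, rank_passages) are illustrative and not part of any library.

```python
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def as_query(text):
    """Prepend the prefix expected for search queries."""
    return QUERY_PREFIX + text


def as_passage(text):
    """Prepend the prefix expected for documents being searched."""
    return PASSAGE_PREFIX + text


def rank_passages(query, passages, model_name="sdadas/mmlw-e5-base"):
    """Return (passage, cosine similarity) pairs, best match first."""
    # Imported here so the prefix helpers work without the library installed.
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim

    model = SentenceTransformer(model_name)
    query_emb = model.encode([as_query(query)], convert_to_tensor=True)
    passage_emb = model.encode(
        [as_passage(p) for p in passages], convert_to_tensor=True
    )
    # cos_sim returns a 1 x len(passages) matrix; take its only row.
    scores = cos_sim(query_emb, passage_emb)[0].tolist()
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
```

Calling rank_passages downloads the model weights on first use. Encoding text without the prefixes still returns vectors, but the model was trained with them, so retrieval quality is likely to suffer.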

Benchmark Results

Polish MTEB – Average Score of 59.71 (see MTEB Leaderboard).
Polish Information Retrieval Benchmark (PIRB) – NDCG@10 of 53.56 (see PIRB Leaderboard).
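For readers unfamiliar with the PIRB metric, NDCG@10 compares the discounted gain of the top 10 results a system returns against the best possible ordering of those results. The sketch below shows one common formulation with linear gain. It assumes the input list holds relevance judgments for every relevant document of the query, and it is not necessarily the exact evaluation code behind the leaderboard.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results."""
    # Rank 0 is the top result, so its discount is log2(2) = 1.
    return sum(
        rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal
```

Here ranked_relevances lists the relevance label of each returned document in rank order, for example [0, 1, 0] when only the second result is relevant. A benchmark score is the mean of this value over all queries.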

The model was trained with A100 GPU cluster support from the TASK center at Gdansk University of Technology.

best for

·Polish semantic similarity and clustering
·Polish information retrieval and search
·Polish text classification fine-tuning

FAQ

What is the output dimension of MMLW E5 Base?

It outputs 768-dimensional vectors.

What prefixes are required when encoding queries and passages?

Queries must be prefixed with "query: " and passages with "passage: ".

What is the average score on the Polish MTEB benchmark?

It achieves an average score of 59.71 on the Polish MTEB.

What license applies to this model?

MMLW E5 Base is listed under the apache-2.0 license. The teacher model, BAAI/bge-base-en, is released under the MIT license.

How can I call this model via the gigarouter API?

The hosted API for this model is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with an API key, and add the required "query: " or "passage: " prefix to each input text.
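A hedged sketch of what such a request could look like, based on the standard OpenAI embeddings request shape (a JSON body with model and input, sent with a bearer token). The base URL below is a hypothetical placeholder, and the model id is assumed to match the listing name; neither is confirmed by this page. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"


def build_embeddings_request(texts, api_key,
                             model="sdadas/mmlw-e5-base",
                             base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/embeddings",
        data=body,
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Inputs carry the required prefixes before they are sent.
request = build_embeddings_request(
    ["query: jak działa model?", "passage: Model zwraca wektory."],
    api_key="YOUR_API_KEY",
)
```

Passing the request to urllib.request.urlopen would send it. An OpenAI-compatible server is expected to answer with a JSON object whose data list holds one embedding per input.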

not yet live

We're benchmarking and onboarding MMLW E5 Base as a hosted, OpenAI-compatible API.
