
Udever BLOOM 3B

izhx/udever-bloom-3b

published Oct 2023 · updated Nov 2023

Udever BLOOM 3B is a universal embedding model finetuned from BLOOM-3B via BitFit on MS MARCO, SNLI, and MultiNLI for multilingual text and code embeddings.

status: coming soon
API providers: 0
downloads / mo: 206
license: bigscience-bloom-rail-1.0

specs

Task: Embedding
Architecture: Decoder-only Transformer (BLOOM-3B)
Parameters: 3B
Languages: Multilingual (natural and programming)
Finetuning Method: BitFit
Training Data: MS MARCO, SNLI, MultiNLI

about this model

Udever-bloom-3b is an embedding model that produces universal text representations across natural and programming languages, finetuned from the BLOOM-3B decoder-only transformer via BitFit on MS MARCO Passage Ranking, SNLI, and MultiNLI data.
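BitFit updates only the bias terms of a pretrained network and keeps every other weight frozen. A minimal sketch of that selection step, in plain Python over parameter names, is shown here. The parameter names and sizes are made up for illustration and are not BLOOM-3B's real layout, and this is not the authors' training code.

```python
def select_bitfit_params(param_sizes):
    """Split parameter names into trainable (bias terms) and frozen (the rest)."""
    trainable = {name for name in param_sizes if name == "bias" or name.endswith(".bias")}
    frozen = set(param_sizes) - trainable
    return trainable, frozen


def trainable_fraction(param_sizes, trainable):
    """Share of all scalar parameters that would receive gradient updates."""
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in trainable) / total


# Hypothetical toy layer: name -> number of scalar parameters.
toy_params = {
    "layer0.attention.weight": 64 * 64,
    "layer0.attention.bias": 64,
    "layer0.mlp.weight": 64 * 256,
    "layer0.mlp.bias": 256,
}

trainable, frozen = select_bitfit_params(toy_params)
print(sorted(trainable))
print(f"trainable fraction: {trainable_fraction(toy_params, trainable):.4f}")
```

In a framework such as PyTorch, the same name-based split would typically be applied by setting `requires_grad` to `False` on the frozen parameters.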

Developed by Alibaba Group, the model is designed to handle a wide range of embedding tasks, including classification, clustering, reranking, retrieval, and semantic textual similarity, without requiring task-specific finetuning. It supports multilingual input and extends to programming languages, as demonstrated on CodeSearchNet.
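For retrieval and similarity tasks, embeddings are usually compared with a vector similarity function. The sketch below ranks documents by cosine similarity to a query. The choice of cosine similarity is an assumption (this page does not say which function the model was evaluated with), and the three-dimensional vectors are toy values rather than real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for query and document embeddings.
query = [1.0, 0.0, 1.0]
documents = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [-1.0, 0.0, -1.0]]

for index, score in rank_documents(query, documents):
    print(index, round(score, 3))
```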

Benchmark Performance

On the Massive Text Embedding Benchmark (MTEB, 56 datasets), Udever-bloom-3b achieves an average score of 59.86. Scores by task category are classification 71.91, clustering 40.74, pair classification 84.06, reranking 54.90, retrieval 47.67, STS 82.37, and summarization 30.62.

On CodeSearchNet (code search across six languages), the model attains an average MRR of 82.29, with per-language scores of Go 80.63, Ruby 75.40, Python 98.02, Java 83.88, JavaScript 76.18, and PHP 79.67.
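As a quick arithmetic check, the unweighted mean of the six per-language scores can be recomputed. That the reported average is an unweighted mean over languages is an assumption; the result is about 82.297, which agrees with the reported 82.29 to within the rounding of the per-language figures.

```python
from statistics import mean

# Per-language MRR values as reported on this page.
per_language_mrr = {
    "Go": 80.63,
    "Ruby": 75.40,
    "Python": 98.02,
    "Java": 83.88,
    "JavaScript": 76.18,
    "PHP": 79.67,
}

# Unweighted (macro) average across the six languages.
average_mrr = mean(list(per_language_mrr.values()))
print(f"{average_mrr:.3f}")
```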

In Chinese multi-domain retrieval (Multi-CPR), the model achieves MRR@10 of 0.267 (E-commerce), 0.228 (Entertainment video), and 0.288 (Medical), with Recall@1k of 0.871, 0.836, and 0.619, respectively.
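MRR@k and Recall@k can be sketched with their standard definitions: MRR@k averages the reciprocal rank of the first relevant result within the top k, and Recall@k averages the fraction of relevant items found within the top k. This is an illustrative implementation on made-up document IDs, and the official Multi-CPR evaluation script may differ in detail.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def recall_at_k(ranked_lists, relevant_sets, k=1000):
    """Mean fraction of relevant items retrieved within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        hits = len(set(ranked[:k]) & relevant)
        total += hits / len(relevant)
    return total / len(ranked_lists)


# Hypothetical rankings for two queries, with their relevant document IDs.
ranked_lists = [["d3", "d1", "d2"], ["d5", "d4", "d6"]]
relevant_sets = [{"d1"}, {"d6", "d9"}]

print(mrr_at_k(ranked_lists, relevant_sets, k=10))
print(recall_at_k(ranked_lists, relevant_sets, k=3))
```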

[Figure: Comparison of Udever-bloom-3b with other embedders on MTEB and CodeSearchNet]

gigarouter plans to host the model as a managed API with OpenAI-compatible endpoint access; it is not yet live. The underlying research, "Language Models are Universal Embedders", was accepted to the XLLM Workshop at ACL 2025.

FAQ

What is Udever BLOOM 3B best used for?

It excels at embedding multilingual text and code for tasks like retrieval, classification, and clustering across natural and programming languages.

How many parameters does it have?

3 billion parameters.

Is it multilingual?

Yes. It supports multiple natural and programming languages, a capability inherited from BLOOM's multilingual pretraining corpus.

How do I call this model via the gigarouter API?

Once the model is live, use the OpenAI-compatible endpoint with your API key. The service is designed to handle tokenization and special tokens automatically.
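Because the model is not yet live, the following is only a sketch of what an OpenAI-style embeddings request could look like, built with the Python standard library and not sent anywhere. The base URL is a placeholder, and using `izhx/udever-bloom-3b` as the API model name is an assumption; the `/embeddings` path, the `model` and `input` fields, and the `data[i].embedding` response layout follow the OpenAI embeddings convention.

```python
import json
import urllib.request

# Placeholder values: the real base URL and model name are not published on this page.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "izhx/udever-bloom-3b"


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style POST request for embeddings."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body):
    """Extract embedding vectors from an OpenAI-style response, in input order."""
    items = sorted(json.loads(body)["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embeddings_request(["hello world"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Once a real endpoint exists, the request could be sent with `urllib.request.urlopen(request)` and the response body passed to `parse_embeddings_response`.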

What is the input format?

The model expects each input to be wrapped in special tokens: [BOQ] ... [EOQ] for a query and [BOD] ... [EOD] for a document. The gigarouter API is intended to add these automatically.
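For anyone running the model outside a managed API, the wrapping can be sketched at the string level. The helper name `wrap_input` is hypothetical, and concatenating the tokens without separating whitespace is an assumption; a real pipeline may instead add the suffix as a token ID after tokenization.

```python
# Special tokens named on this page.
QUERY_TOKENS = ("[BOQ]", "[EOQ]")
DOCUMENT_TOKENS = ("[BOD]", "[EOD]")


def wrap_input(text, is_query):
    """Surround text with the query or document prefix and suffix tokens."""
    prefix, suffix = QUERY_TOKENS if is_query else DOCUMENT_TOKENS
    return f"{prefix}{text}{suffix}"


print(wrap_input("how to reverse a list in python", is_query=True))
print(wrap_input("def reverse(xs): return xs[::-1]", is_query=False))
```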

not yet live

We're benchmarking and onboarding Udever BLOOM 3B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
