Udever BLOOM 3B

izhx/udever-bloom-3b

published Oct 2023 · updated Nov 2023

Udever BLOOM 3B is a universal embedding model finetuned from BLOOM-3B via BitFit on MS MARCO, SNLI, and MultiNLI for multilingual text and code embeddings.

status

coming soon

downloads / mo

206

license

bigscience-bloom-rail-1.0

specs

Task	Embedding
Architecture	Decoder-only Transformer (BLOOM-3B)
Parameters	3B
Languages	Multilingual (natural and programming)
Finetuning Method	BitFit
Training Data	MS MARCO, SNLI, MultiNLI

about this model

Udever-bloom-3b is an embedding model that produces universal text representations across natural and programming languages, finetuned from the BLOOM-3B decoder-only transformer via BitFit on MS MARCO Passage Ranking, SNLI, and MultiNLI data.
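BitFit fine-tunes only the bias terms of a pretrained network and keeps every weight matrix frozen. The sketch below illustrates that parameter-selection step only. It accepts any iterable of (name, parameter) pairs, such as what PyTorch's `model.named_parameters()` yields. The `ToyParam` class, the parameter names, and the sizes are hypothetical stand-ins, and the actual Udever training code may select parameters differently.

```python
from typing import Iterable, Protocol, Tuple


class ParamLike(Protocol):
    """Anything with a settable requires_grad flag and a numel() method,
    such as a torch.nn.Parameter."""

    requires_grad: bool

    def numel(self) -> int: ...


def apply_bitfit(named_params: Iterable[Tuple[str, ParamLike]]) -> Tuple[int, int]:
    """Freeze all parameters except bias terms, BitFit-style.

    Returns (trainable, total), counted in scalar values.
    """
    trainable = total = 0
    for name, param in named_params:
        # Bias tensors are conventionally named "<module path>.bias".
        is_bias = name.split(".")[-1] == "bias"
        param.requires_grad = is_bias
        size = param.numel()
        total += size
        if is_bias:
            trainable += size
    return trainable, total


class ToyParam:
    """Hypothetical stand-in for a tensor parameter, for illustration only."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.requires_grad = True

    def numel(self) -> int:
        return self.size


# Hypothetical BLOOM-style parameter names with made-up sizes.
toy = {
    "h.0.mlp.dense_h_to_4h.weight": ToyParam(40),
    "h.0.mlp.dense_h_to_4h.bias": ToyParam(8),
    "h.0.input_layernorm.weight": ToyParam(4),
    "h.0.input_layernorm.bias": ToyParam(4),
}
print(apply_bitfit(toy.items()))  # (12, 56)
```

With a real PyTorch model, the same function would be called as `apply_bitfit(model.named_parameters())`, after which only the bias tensors receive gradients.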

Developed by Alibaba Group, the model is designed to handle a wide range of embedding tasks—including classification, clustering, reranking, retrieval, and semantic textual similarity—without requiring task-specific fine-tuning. It supports multilingual input and extends to programming languages, as demonstrated on CodeSearchNet.

Benchmark Performance

On the Massive Text Embedding Benchmark (MTEB, 56 datasets), Udever-bloom-3b achieves an average score of 59.86. Key sub-task scores include classification 71.91, clustering 40.74, pair classification 84.06, reranking 54.90, retrieval 47.67, STS 82.37, and summarization 30.62.

On CodeSearchNet (code search across six languages), the model attains an average MRR of 82.29, with per-language scores of Go 80.63, Ruby 75.40, Python 98.02, Java 83.88, JavaScript 76.18, and PHP 79.67.
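Mean reciprocal rank (MRR) rewards placing the first correct result near the top: each query contributes 1/rank of its first relevant hit (0 if there is none), and those scores are averaged over queries. The CodeSearchNet figures above appear to be MRR multiplied by 100, and passing k=10 gives the MRR@10 variant reported for Multi-CPR. A minimal sketch in plain Python; the toy rankings are invented, and it does not reproduce the candidate pools used in the paper's evaluation.

```python
from typing import Hashable, Iterable, Optional, Sequence, Set, Tuple

Run = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: Optional[int] = None
) -> float:
    """1/rank of the first relevant item; 0.0 if none appears (within top k, if k is set)."""
    candidates = ranked if k is None else ranked[:k]
    for rank, item in enumerate(candidates, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Run], k: Optional[int] = None) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("need at least one query")
    return sum(scores) / len(scores)


# Three hypothetical queries over a toy code corpus.
runs = [
    (["c3", "c1", "c7"], {"c1"}),  # first hit at rank 2 -> 0.5
    (["c2", "c9"], {"c2"}),        # first hit at rank 1 -> 1.0
    (["c5", "c4", "c8"], {"c0"}),  # no hit -> 0.0
]
print(mean_reciprocal_rank(runs))       # 0.5
print(mean_reciprocal_rank(runs, k=1))  # only the rank-1 hit counts
```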

In Chinese multi-domain passage retrieval (Multi-CPR), the model achieves MRR@10 of 0.267 (E-commerce), 0.228 (Entertainment video), and 0.288 (Medical), with Recall@1k of 0.871, 0.836, and 0.619, respectively.
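Recall@k asks whether the relevant passages are retrieved at all within the top k candidates, regardless of their exact position, which is why Recall@1k can be high even when MRR@10 is modest. A minimal sketch in plain Python; the rankings are made up, and k is kept tiny for readability where the reported figures use k=1000.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        raise ValueError("query has no relevant items")
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]], k: int
) -> float:
    """Average Recall@k over (ranked results, relevant set) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("need at least one query")
    return sum(scores) / len(scores)


# Hypothetical passage rankings for two queries.
runs = [
    (["p4", "p2", "p9", "p1"], {"p2", "p1"}),  # top-2 holds p2 only -> 0.5
    (["p7", "p3"], {"p3"}),                    # top-2 holds p3 -> 1.0
]
print(mean_recall_at_k(runs, k=2))  # 0.75
print(mean_recall_at_k(runs, k=1))  # 0.0
```

With a single relevant passage per query, Recall@k reduces to a hit rate: the share of queries whose passage shows up anywhere in the top k.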

Figure: Comparison of Udever-bloom-3b with other embedders on MTEB and CodeSearchNet

Once onboarding is complete, the model will be offered on gigarouter as a managed API with an OpenAI-compatible endpoint. The underlying paper, "Language Models are Universal Embedders," was accepted to the XLLM Workshop at ACL 2025.

best for

·Semantic search and retrieval across multiple languages
·Code search and retrieval
·Text classification and clustering
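For semantic search and code search, a common pattern is to embed the query and the candidate documents separately and rank the documents by cosine similarity to the query. A minimal sketch in plain Python, assuming the embeddings are already available as float lists; the 3-dimensional vectors and file names are made up, and real Udever BLOOM 3B embeddings are far higher-dimensional. Whether Udever's own evaluation uses cosine similarity or another score is not stated on this page.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Made-up 3-d vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = {
    "sort_list.py": [0.9, 0.1, 0.0],
    "read_csv.py": [0.0, 1.0, 0.0],
    "reverse_str.py": [-1.0, 0.0, 0.0],
}
for doc_id, score in rank_documents(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

The same scores also serve clustering and classification pipelines, which typically consume pairwise similarities or the raw vectors.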

FAQ

What is Udever BLOOM 3B best used for?

It excels at embedding multilingual text and code for tasks like retrieval, classification, and clustering across natural and programming languages.

How many parameters does it have?

3 billion parameters.

Is it multilingual?

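Yes. It supports multiple natural languages and programming languages, inheriting its multilingual coverage from BLOOM's multilingual pretraining corpus; the fine-tuning data (MS MARCO, SNLI, MultiNLI) is English.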

How do I call this model via the gigarouter API?

Once the model is live, send requests to the OpenAI-compatible embeddings endpoint with your API key. The service handles tokenization and special tokens automatically.
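Because the endpoint is not live yet, the sketch below only builds and inspects a request in the OpenAI embeddings format (a POST to `/embeddings` with `model` and `input` fields) using the Python standard library; the official `openai` SDK's `embeddings.create` call produces the same request shape. The base URL and model ID are hypothetical placeholders, since gigarouter has not published them, and how the hosted API distinguishes queries from documents is not documented on this page.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_embeddings_request(
    base_url: str, api_key: str, model: str, inputs: List[str]
) -> urllib.request.Request:
    """Build, but do not send, a POST request in the OpenAI embeddings format."""
    body = json.dumps({"model": model, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(payload: Dict[str, Any]) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by 'index'."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hypothetical values: the real base URL and model ID are not published yet.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "izhx/udever-bloom-3b"

req = build_embeddings_request(
    BASE_URL, "YOUR_API_KEY", MODEL_ID, ["how to reverse a list in python"]
)
print(req.get_method(), req.full_url)
# Once the endpoint is live, sending would look like:
# with urllib.request.urlopen(req) as resp:
#     vectors = parse_embeddings_response(json.load(resp))
```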

What is the input format?

The model expects each input to be wrapped in special tokens: queries as [BOQ] text [EOQ] and documents as [BOD] text [EOD]. The gigarouter API adds these automatically.
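When running the open weights directly rather than through the hosted API, the markers have to be added by hand. A minimal sketch, assuming the markers are attached to the text with no separating space and that the real marker token IDs come from the model's tokenizer (the IDs below are placeholders). The token-level helper truncates the content before adding the markers, so a long input still ends with its [EOQ] or [EOD] marker instead of having it cut off.

```python
from typing import List

# Marker strings as listed in the input-format answer.
BOQ, EOQ = "[BOQ]", "[EOQ]"
BOD, EOD = "[BOD]", "[EOD]"


def wrap_text(text: str, is_query: bool) -> str:
    """Wrap raw text in query or document markers (assumes no separating spaces)."""
    start, end = (BOQ, EOQ) if is_query else (BOD, EOD)
    return f"{start}{text}{end}"


def wrap_token_ids(ids: List[int], start_id: int, end_id: int, max_len: int) -> List[int]:
    """Truncate content ids first, then add markers, so the end marker is never cut off."""
    if max_len < 2:
        raise ValueError("max_len must leave room for both markers")
    return [start_id] + ids[: max_len - 2] + [end_id]


print(wrap_text("how to sort a dict by value", is_query=True))
print(wrap_text("sorted(d.items(), key=lambda kv: kv[1])", is_query=False))
# Hypothetical token IDs: 1 = start marker, 2 = end marker.
print(wrap_token_ids([11, 12, 13, 14, 15], start_id=1, end_id=2, max_len=4))  # [1, 11, 12, 2]
```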

not yet live

We're benchmarking and onboarding Udever BLOOM 3B as a hosted, OpenAI-compatible API.

related embedding models

nomic-embed-text-v1.5