Udever BLOOM 1.1B

izhx/udever-bloom-1b1

published Oct 2023 · updated Nov 2023

Udever BLOOM 1.1B is an embedding model that generates universal embeddings for multiple natural and programming languages, fine-tuned from BLOOM-1B1 via BitFit on MS MARCO, SNLI, and MultiNLI.

status

coming soon

API providers

downloads / mo

321

license

bigscience-bloom-rail-1.0

specs

Task	Text Embedding
Architecture	Decoder-only Transformer (BLOOM)
Parameters	1.1 billion
License	bigscience-bloom-rail-1.0
Training Data	MS MARCO Passage Ranking, SNLI, MultiNLI

about this model

Udever-bloom-1b1 is a universal embedding model fine-tuned from bigscience/bloom-1b1 via BitFit on MS MARCO Passage Ranking, SNLI, and MultiNLI data. It is designed to generate high-quality embeddings across tasks, natural languages, and programming languages. It is part of the Udever family, which extends the "Language Models are Universal Embedders" approach (presented at the XLLM Workshop, ACL 2025). The model is a decoder-only Transformer trained with a contrastive loss and hard negatives, and the training code is publicly available.
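To make "contrastive loss with hard negatives" concrete, here is a minimal sketch of an InfoNCE-style loss for a single query. It is an illustration only, not the authors' training code: the use of cosine similarity, the temperature of 0.05, and the function names are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(query, positive, hard_negatives, temperature=0.05):
    """InfoNCE-style loss for one query (hypothetical helper).

    The positive document competes against the hard negatives in a softmax
    over scaled similarities; the loss is the negative log-probability of
    the positive. The temperature value is an assumed placeholder.
    """
    sims = [cosine(query, positive)] + [cosine(query, n) for n in hard_negatives]
    logits = [s / temperature for s in sims]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# The loss is small when the positive is closer to the query than the negative,
# and large when the roles are swapped.
good = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Hard negatives are documents that look relevant but are not, so including them in the denominator pushes the model to separate near-misses rather than only obviously unrelated text.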

Diagram showing udever-bloom performance across tasks and languages

Benchmark Performance

On the Massive Text Embedding Benchmark (MTEB, 56 datasets), Udever-bloom-1b1 achieves an average score of 58.28, with strong results in classification (70.18), pair classification (83.11), and STS (81.52). On CodeSearchNet, it reaches an average MRR of 80.90 across six programming languages (Go, Ruby, Python, Java, JavaScript, PHP). In Chinese multi-domain retrieval (Multi-CPR), it obtains an MRR@10 of 0.244 (E-commerce), 0.208 (Entertainment video), and 0.241 (Medical). The model also handles languages and tasks not seen during fine-tuning, as demonstrated in the paper’s zero-shot evaluations.

Benchmark	Metric	Udever-bloom-1b1	Reference (OpenAI ada-002)
MTEB	Average	58.28	60.99
CodeSearchNet	Avg. MRR	80.90	–
Multi-CPR (E-commerce)	MRR@10	0.244	0.183

Additional checkpoints (560m, 3b, 7b1) are available on Hugging Face and ModelScope. The underlying BLOOM base model was trained on 46 natural languages and 13 programming languages. Per-dataset MTEB metrics are listed on the model’s Hugging Face page.
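For readers unfamiliar with the MRR@10 metric used in the retrieval results, the following sketch shows how it is typically computed: for each query, take the reciprocal of the rank of the first relevant document within the top 10, or 0 if none appears, then average over queries. The function name and the toy data are illustrative assumptions, not the benchmark's evaluation script.

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Mean reciprocal rank truncated at k (hypothetical helper).

    ranked_ids_per_query: list of ranked document-id lists, best first.
    relevant_ids_per_query: list of sets of relevant document ids.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids_per_query)

# Toy example: query 1 finds its relevant document at rank 2,
# query 2 finds nothing relevant in its list.
score = mrr_at_k(
    [["a", "b", "c"], ["x", "y", "z"]],
    [{"b"}, {"q"}],
)
```

Here the first query contributes 1/2 and the second contributes 0, so the mean over two queries is 0.25.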

best for

·Multilingual semantic search and retrieval
·Code-to-code search and code retrieval
·Text classification and sentence similarity (STS)
·Cross-lingual embedding tasks

FAQ

What is Udever BLOOM 1.1B best for?

It is best for generating universal embeddings across tasks (retrieval, classification, reranking, STS) and languages, including natural and programming languages.

What is the model architecture?

It is a decoder-only Transformer based on bigscience/bloom-1b1, fine-tuned with BitFit.

What license does the model use?

The model is derived from BLOOM-1B1 and uses the bigscience-bloom-rail-1.0 license.

How do I call it via the gigarouter API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key. The model expects inputs wrapped in special tokens: [BOQ]/[EOQ] for queries and [BOD]/[EOD] for documents.
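As a rough sketch of the input convention, the snippet below wraps queries and documents in their special tokens and builds a request body in the shape used by OpenAI-style embeddings endpoints (a `model` field and an `input` list). Several details are assumptions: the model identifier, the absence of whitespace between the tokens and the text, and whether the hosted endpoint will add these tokens itself. No network call is made, since the hosted API is not yet available.

```python
import json

# Assumed identifier; the name used by the hosted API may differ.
MODEL_ID = "izhx/udever-bloom-1b1"

def wrap_query(text):
    """Wrap a query in the [BOQ]/[EOQ] tokens (spacing is an assumption)."""
    return f"[BOQ]{text}[EOQ]"

def wrap_document(text):
    """Wrap a document in the [BOD]/[EOD] tokens (spacing is an assumption)."""
    return f"[BOD]{text}[EOD]"

def build_embedding_request(texts, kind):
    """Build an OpenAI-style embeddings request body (hypothetical helper).

    kind must be "query" or "document" and selects the token pair.
    """
    if kind == "query":
        wrap = wrap_query
    elif kind == "document":
        wrap = wrap_document
    else:
        raise ValueError("kind must be 'query' or 'document'")
    return {"model": MODEL_ID, "input": [wrap(t) for t in texts]}

query_body = build_embedding_request(["how to sort a list"], kind="query")
doc_body = build_embedding_request(["def sort(xs): return sorted(xs)"], kind="document")
print(json.dumps(query_body, indent=2))
```

Keeping the wrapping in one helper makes it easy to drop if the hosted endpoint turns out to handle the special tokens on the server side.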

What languages does it support?

It inherits the language coverage of BLOOM-1B1: 46 natural languages, including English and Chinese, plus 13 programming languages. See the BLOOM training data for the full list.

not yet live

We're benchmarking and onboarding Udever BLOOM 1.1B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embedding models

nomic-embed-text-v1.5