Udever BLOOM 3B
izhx/udever-bloom-3b
published Oct 2023 · updated Nov 2023
Udever BLOOM 3B is a universal embedding model finetuned from BLOOM-3B via BitFit on MS MARCO, SNLI, and MultiNLI for multilingual text and code embeddings.
specs
| Spec | Value |
| --- | --- |
| Task | Embedding |
| Architecture | Decoder-only Transformer (BLOOM-3B) |
| Parameters | 3B |
| Languages | Multilingual (natural and programming) |
| Finetuning Method | BitFit |
| Training Data | MS MARCO, SNLI, MultiNLI |
about this model
Udever-bloom-3b is an embedding model that produces universal text representations across natural and programming languages. It is finetuned from the BLOOM-3B decoder-only transformer with BitFit, a parameter-efficient method that updates only the bias terms, on MS MARCO Passage Ranking, SNLI, and MultiNLI data.
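As a rough illustration of what BitFit finetuning means in practice, the sketch below freezes every parameter except bias terms. It is an assumption-laden sketch, not the authors' training code: the helper name `apply_bitfit`, the name-suffix rule, and the toy parameter names are all hypothetical. The function accepts any iterable of `(name, param)` pairs whose params have a `requires_grad` attribute, which is the shape PyTorch's `named_parameters()` returns.

```python
from types import SimpleNamespace


def apply_bitfit(named_parameters):
    """Mark only bias terms as trainable and freeze everything else.

    `named_parameters` is any iterable of (name, param) pairs where each
    param exposes a `requires_grad` attribute.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in named_parameters:
        is_bias = name.endswith("bias")  # assumed naming rule for bias terms
        param.requires_grad = is_bias
        if is_bias:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without PyTorch installed.
# These names are illustrative, not read from the actual checkpoint.
toy_params = [
    ("h.0.self_attention.dense.weight", SimpleNamespace(requires_grad=True)),
    ("h.0.self_attention.dense.bias", SimpleNamespace(requires_grad=True)),
    ("h.0.mlp.dense_h_to_4h.weight", SimpleNamespace(requires_grad=True)),
    ("h.0.mlp.dense_h_to_4h.bias", SimpleNamespace(requires_grad=True)),
]
bitfit_trainable = apply_bitfit(toy_params)
print(bitfit_trainable)
```

Because only the biases receive gradients, the number of updated parameters is a small fraction of the full model.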
Developed by Alibaba Group, the model is designed to handle a wide range of embedding tasks (classification, clustering, reranking, retrieval, and semantic textual similarity) without task-specific finetuning. It supports multilingual input and extends to programming languages, as demonstrated on CodeSearchNet.
Benchmark Performance
On the Massive Text Embedding Benchmark (MTEB, 56 datasets), Udever-bloom-3b achieves an average score of 59.86. Key sub-task scores include classification 71.91, clustering 40.74, pair classification 84.06, reranking 54.90, retrieval 47.67, STS 82.37, and summarization 30.62.
On CodeSearchNet (code search across six languages), the model attains an average MRR of 82.29, with per-language scores of Go 80.63, Ruby 75.40, Python 98.02, Java 83.88, JavaScript 76.18, and PHP 79.67.
In Chinese multi-domain retrieval (Multi-CPR), the model achieves MRR@10 of 0.267 (E-commerce), 0.228 (Entertainment video), and 0.288 (Medical), with Recall@1k of 0.871, 0.836, and 0.619, respectively.
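For readers unfamiliar with the retrieval metrics quoted above, here is a minimal sketch of MRR@k and Recall@k under their common textbook definitions. Individual benchmarks may differ in details (for example, how queries with no relevant result are handled), so this is not the exact evaluation code behind the reported numbers. The function names and toy data are hypothetical.

```python
def mrr_at_k(rankings, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k.

    A query with no relevant document in its top k contributes 0.
    """
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


def recall_at_k(rankings, relevant_sets, k=1000):
    """Mean fraction of each query's relevant documents found in the top k."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        hits = len(set(ranked[:k]) & relevant)
        total += hits / len(relevant)
    return total / len(rankings)


# Toy data: two queries, each with a ranked result list and a relevant set.
toy_rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
toy_relevant = [{"d1"}, {"d4", "d8"}]

toy_mrr = mrr_at_k(toy_rankings, toy_relevant, k=10)
toy_recall = recall_at_k(toy_rankings, toy_relevant, k=3)
print(toy_mrr, toy_recall)
```

In the toy data the first relevant hits sit at ranks 2 and 3, so the reciprocal ranks are 1/2 and 1/3; the second query retrieves only one of its two relevant documents, which lowers its recall to 0.5.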
The model is hosted as a managed API on gigarouter, providing OpenAI-compatible endpoint access. The underlying research, "Language Models are Universal Embedders", was accepted to the XLLM Workshop at ACL 2025.
best for
- Semantic search and retrieval across multiple languages
- Code search and retrieval
- Text classification and clustering
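The use cases above generally come down to comparing embedding vectors. A minimal sketch of semantic search, assuming cosine similarity as the scoring function (a common choice, not one the source specifies), could look like the following. The three-dimensional toy vectors stand in for real model outputs, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings of a query and three documents.
toy_query = [1.0, 0.0, 1.0]
toy_docs = {
    "doc_a": [1.0, 0.0, 0.9],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.5],
}
ranking = rank_documents(toy_query, toy_docs)
print([doc_id for doc_id, _ in ranking])
```

For larger collections, the same scoring is usually delegated to a vector index rather than computed in a Python loop.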
FAQ
It excels at embedding multilingual text and code for tasks like retrieval, classification, and clustering across natural and programming languages.
The model has 3 billion parameters.
Yes, it supports multiple natural and programming languages, a coverage it inherits from BLOOM's multilingual pretraining corpus.
Use the OpenAI-compatible endpoint with your API key. The service handles tokenization and special tokens automatically.
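A request to an OpenAI-compatible embeddings endpoint might be assembled as in the sketch below, which uses only the Python standard library. The base URL is a placeholder, and using the repository name `izhx/udever-bloom-3b` as the model id is an assumption; the source gives neither value, so check the service documentation for the real ones. The request and response shapes follow the OpenAI embeddings format.

```python
import json
import urllib.request

# Hypothetical values: the source names neither the base URL nor the model id.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "izhx/udever-bloom-3b"


def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (without sending) a POST request in the OpenAI embeddings format."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_embeddings(request):
    """Send the request and return one vector per input, in input order."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Building the request does not touch the network; fetch_embeddings does.
example_request = build_embedding_request(
    ["how do I parse JSON in Go?"], api_key="YOUR_API_KEY"
)
print(example_request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect before any credit is spent.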
The model expects each input to be wrapped in special tokens: a query starts with [BOQ] and ends with [EOQ], and a document starts with [BOD] and ends with [EOD]. The gigarouter API adds these tokens automatically.
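If you run the model yourself rather than through the API, the wrapping described above could be sketched at the string level as follows. This is a simplified illustration: the helper name `wrap_for_embedding` is hypothetical, and a real pipeline would more likely append the end token as a token id after truncation so that it is never cut off.

```python
BOQ, EOQ = "[BOQ]", "[EOQ]"  # begin / end of query
BOD, EOD = "[BOD]", "[EOD]"  # begin / end of document


def wrap_for_embedding(text, is_query):
    """Surround text with the query or document marker tokens."""
    prefix, suffix = (BOQ, EOQ) if is_query else (BOD, EOD)
    return f"{prefix}{text}{suffix}"


wrapped_query = wrap_for_embedding("binary search in python", is_query=True)
wrapped_doc = wrap_for_embedding("def bisect(a, x): ...", is_query=False)
print(wrapped_query)
print(wrapped_doc)
```

Using distinct markers for queries and documents lets one model produce different representations for the two sides of a retrieval pair.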