Granite Embedding Small English R2
ibm-granite/granite-embedding-small-english-r2
published Jul 2025 · updated Jan 2026
Granite Embedding Small English R2 is an embedding model that generates high-quality text embeddings for retrieval, search, and similarity tasks.
specs
| Spec | Value |
|---|---|
| Task | Text embedding for retrieval and similarity |
| Architecture | ModernBERT bi-encoder |
| Parameters | 47M |
| License | Apache 2.0 |
| Embedding Size | 384 |
| Context Length | 8,192 tokens |
about this model
granite-embedding-small-english-r2 is a 47M parameter dense bi-encoder embedding model that generates 384-dimensional text embeddings for queries and documents, supporting up to 8,192 tokens of context.
Architecture and Training
Built on the ModernBERT architecture, the model incorporates Rotary Position Embeddings (RoPE), GeGLU activations, alternating attention (global every 3rd layer), and Flash Attention 2.0 for efficient inference. It was trained using retrieval-oriented pretraining, contrastive finetuning, and knowledge distillation exclusively on permissively licensed open-source data and IBM-collected or generated datasets. Notably, MS-MARCO (non-commercial license) was not used.
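As a rough illustration of "global every 3rd layer", the sketch below labels each encoder layer as using global or local attention. It assumes the ModernBERT convention in which layer index `i` is global when `i % 3 == 0`. The helper name `attention_pattern` and the layer count in the usage note are hypothetical; this page does not state the model's actual depth or which layer indices are global.

```python
from typing import List


def attention_pattern(num_layers: int, global_every: int = 3) -> List[str]:
    """Label each layer 'global' or 'local'.

    Assumption: layer i is global when i % global_every == 0,
    and every other layer uses local (sliding-window) attention.
    """
    if num_layers < 0 or global_every <= 0:
        raise ValueError("num_layers must be >= 0 and global_every must be > 0")
    return [
        "global" if i % global_every == 0 else "local"
        for i in range(num_layers)
    ]
```

For a hypothetical 6-layer stack, `attention_pattern(6)` marks layers 0 and 3 as global and the rest as local, so most layers pay only the cheaper local-attention cost.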
Performance
The model delivers strong results across diverse retrieval benchmarks, as shown below. It achieves encoding speeds of 199 documents per second on a single H100 GPU (512‑token sliding window chunks), and the R2 paper reports 19–44% speed advantages over leading competitors while maintaining accuracy.
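To make the "512-token sliding window chunks" setup more concrete, here is a minimal sketch of sliding-window chunking plus a back-of-the-envelope encoding-time estimate. The stride of 256 tokens is an assumption (the overlap used in the benchmark is not given here), and both function names are hypothetical. The estimate treats the 199 documents per second figure as a constant rate, which real workloads will only approximate.

```python
from typing import List


def sliding_window_chunks(
    token_ids: List[int], window: int = 512, stride: int = 256
) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most `window` tokens.

    `stride` is how far the window advances each step (assumed value).
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not token_ids:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the current window already reaches the end
        start += stride
    return chunks


def estimated_encoding_seconds(num_docs: int, docs_per_second: float = 199.0) -> float:
    """Naive estimate: documents divided by the reported single-H100 rate."""
    return num_docs / docs_per_second
```

Under these assumptions, a 1,200-token document yields four chunks, and one million documents would take roughly 5,025 seconds (about 84 minutes) on one GPU.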
| Benchmark | Score |
|---|---|
| BEIR Retrieval (15 datasets) | 50.9 |
| MTEB-v2 (41 tasks) | 61.1 |
| CoIR Code Retrieval (10) | 53.8 |
| MLDR Long-Document (English) | 39.8 |
| MTRAG Conversational (4) | 48.1 |
Compared to the previous generation (granite-embedding-30m-english), the R2 model improves BEIR Retrieval by +1.8, CoIR by +6.8, and MLDR by +7.2 points.
Key Strengths
- Enterprise-friendly licensing (Apache 2.0) and transparent data provenance — all training data underwent governance review.
- Optimized for text similarity, retrieval, and search across general, code, long-document, conversational, and table retrieval tasks.
- Lightweight 47M parameters with 384‑dim embeddings and 8,192‑token context, balancing speed and accuracy.
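One practical consequence of the 384-dimensional output is a small vector index. The sketch below estimates raw storage, assuming embeddings are stored as uncompressed 32-bit floats with no index overhead. The storage format is an assumption, and `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors: int, dim: int = 384, bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` dense vectors.

    Assumes float32 values (4 bytes each) and ignores any index metadata.
    """
    if num_vectors < 0 or dim <= 0 or bytes_per_value <= 0:
        raise ValueError("arguments must be non-negative / positive")
    return num_vectors * dim * bytes_per_value
```

Under these assumptions, one million embeddings occupy 1,536,000,000 bytes (about 1.5 GB), half of what the same corpus would need at 768 dimensions.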
best for
- Enterprise dense retrieval and semantic search
- Code retrieval (CoIR benchmarks)
- Long-document search and retrieval
- Conversational multi-turn retrieval (MTRAG)
FAQ
The model is best suited to generating dense embeddings for text retrieval, including enterprise search, code search, long-document search, and conversational multi-turn retrieval.
With 47M parameters and an embedding size of 384, the model encodes about 199 documents per second on a single H100 GPU. The R2 paper reports a 19–44% speed advantage over leading competitors while maintaining competitive accuracy.
The model is released under the Apache 2.0 license, permitting both research and commercial use.
Input is text (queries or passages) up to 8,192 tokens; output is a 384-dimensional dense vector. Cosine similarity is used for comparison.
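The comparison step can be sketched in a few lines. The example below ranks document vectors against a query vector by cosine similarity using only the standard library. It makes no call to the model: the vectors are whatever embeddings you already have, and the helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_documents(
    query: Sequence[float], documents: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With real embeddings each vector would have 384 entries. If vectors are normalized to unit length beforehand, the same ranking can be obtained from a plain dot product.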
Use the gigarouter OpenAI-compatible endpoint with your API key. The model is hosted as a service; refer to gigarouter documentation for the exact endpoint and request format.
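As a hedged sketch of what an OpenAI-compatible embeddings call usually looks like, the code below builds (but does not send) a `POST /embeddings` request and parses a response in the standard OpenAI shape. The base URL is a deliberately invalid placeholder, and using the repository name as the hosted model ID is an assumption. The gigarouter documentation remains the authority on the real endpoint and request format.

```python
import json
import urllib.request
from typing import List

# Hypothetical placeholder; replace with the base URL from the gigarouter docs.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model ID matches the repository name on this page.
MODEL_ID = "ibm-granite/granite-embedding-small-english-r2"


def build_embedding_request(
    texts: List[str], api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build an OpenAI-style embeddings request without sending it."""
    payload = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding_response(body: bytes) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda it: it["index"])]
```

To send the request, pass it to `urllib.request.urlopen(request)` and hand the response bytes to `parse_embedding_response`. Each returned vector is expected to have 384 entries.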
Granite Embedding Small English R2 is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.