Granite Embedding Small English R2

ibm-granite/granite-embedding-small-english-r2

published Jul 2025 · updated Jan 2026

Granite Embedding Small English R2 is an embedding model that generates high-quality text embeddings for retrieval, search, and similarity tasks.

est. price

~$0.008 / 1M tokens · estimated, set at launch

downloads / mo

2.2M

license

apache-2.0

specs

Task	Text embedding for retrieval and similarity
Architecture	ModernBERT bi-encoder
Parameters	47M
License	Apache 2.0
Embedding Size	384
Context Length	8,192 tokens

about this model

granite-embedding-small-english-r2 is a 47M parameter dense bi-encoder embedding model that generates 384-dimensional text embeddings for queries and documents, supporting up to 8,192 tokens of context.

Architecture and Training

Built on the ModernBERT architecture, the model incorporates Rotary Position Embeddings (RoPE), GeGLU activations, alternating attention (global attention every third layer), and Flash Attention 2.0 for efficient inference. It was trained with retrieval-oriented pretraining, contrastive finetuning, and knowledge distillation, using only permissively licensed open-source data and IBM-collected or IBM-generated datasets. MS-MARCO, which carries a non-commercial license, was not used.

Performance

The model delivers strong results across diverse retrieval benchmarks, as shown below. It achieves encoding speeds of 199 documents per second on a single H100 GPU (512‑token sliding window chunks), and the R2 paper reports 19–44% speed advantages over leading competitors while maintaining accuracy.

Benchmark	Score
BEIR Retrieval (15 datasets)	50.9
MTEB-v2 (41 tasks)	61.1
CoIR Code Retrieval (10)	53.8
MLDR Long-Document (English)	39.8
MTRAG Conversational (4)	48.1

Compared to the previous generation (granite-embedding-30m-english), the R2 model improves BEIR Retrieval by 1.8 points, CoIR by 6.8 points, and MLDR by 7.2 points.
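The throughput figure above is quoted for 512-token sliding window chunks. The sketch below shows one way such chunking could be done over a list of token ids. The 256-token stride is an assumption for illustration (the page does not state the overlap), and `sliding_window_chunks` is a hypothetical helper, not part of any Granite tooling.

```python
from typing import List, Sequence


def sliding_window_chunks(
    token_ids: Sequence[int], window: int = 512, stride: int = 256
) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most `window` tokens.

    `stride` is an assumed value; the source only states the 512-token window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not token_ids:
        return []
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        # Stop once the current window reaches the end of the document.
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks


# Stand-in token ids; a real pipeline would use the model's own tokenizer.
doc = list(range(1200))
chunks = sliding_window_chunks(doc)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk would then be embedded separately, so a long document yields several 384-dimensional vectors rather than one.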

Key Strengths

·Enterprise-friendly licensing (Apache 2.0) and transparent data provenance: all training data underwent governance review.
·Optimized for text similarity, retrieval, and search across general, code, long-document, conversational, and table retrieval tasks.
·Lightweight 47M parameters with 384-dim embeddings and 8,192-token context, balancing speed and accuracy.
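To make the 384-dimension figure concrete, here is a rough sketch of the raw storage a vector index would need. It assumes uncompressed 4-byte float32 values, which is an assumption about the index, not something the page specifies; `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors: int, dims: int = 384, bytes_per_value: int = 4) -> int:
    """Raw size of an uncompressed embedding matrix (float32 assumed by default)."""
    return num_vectors * dims * bytes_per_value


one_million = index_size_bytes(1_000_000)
print(f"{one_million / 1e9:.3f} GB for 1M vectors")  # 1.536 GB
```

Index overhead, metadata, and any quantization would change the real number.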

best for

·Enterprise dense retrieval and semantic search
·Code retrieval (CoIR benchmarks)
·Long-document search and retrieval
·Conversational multi-turn retrieval (MTRAG)

FAQ

What is this model best for?

It is best for generating dense embeddings for text retrieval, including enterprise search, code search, long-document search, and conversational multi-turn retrieval.

How does it compare in size and speed to other models?

At 47M parameters with an embedding size of 384, it encodes about 199 documents per second on a single H100 GPU (512-token sliding window chunks). The R2 paper reports a 19–44% speed advantage over leading competitors while maintaining competitive accuracy.

What is the license for this model?

It is released under the Apache 2.0 license, permitting both research and commercial use.

What are the input and output formats?

Input is text (queries or passages) up to 8,192 tokens; output is a 384-dimensional dense vector. Cosine similarity is used for comparison.
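As a minimal sketch of the comparison step, the snippet below ranks passage vectors against a query vector by cosine similarity. Toy 4-dimensional vectors stand in for real 384-dimensional embeddings, and `cosine_similarity` and `rank_passages` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, highest similarity first."""
    scored = [(i, cosine_similarity(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only.
query = [1.0, 0.0, 1.0, 0.0]
passages = [
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [2.0, 0.0, 2.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0, 0.0],  # partially aligned
]
ranking = rank_passages(query, passages)
print(ranking)
```

Because cosine similarity ignores vector length, the second passage scores highest even though its magnitude differs from the query's.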

How can I call this model via the API?

Once the model is live, it will be callable through the gigarouter OpenAI-compatible endpoint with your API key. Refer to the gigarouter documentation for the exact endpoint and request format.
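For orientation, here is a sketch of what a request in the common OpenAI-compatible embeddings shape could look like. The base URL is a placeholder, and the model identifier and response layout are assumptions based on that convention, not confirmed gigarouter details. The snippet only builds the request and does not send it.

```python
import json
import urllib.request
from typing import List

# Placeholder base URL: the real endpoint is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
# Assumed to match the model id shown at the top of this page.
MODEL_ID = "ibm-granite/granite-embedding-small-english-r2"


def build_embeddings_request(texts: List[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI-compatible embeddings shape."""
    payload = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body: bytes) -> List[List[float]]:
    """Extract vectors, in input order, from an OpenAI-style response body."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda item: item["index"])]


request = build_embeddings_request(["what is dense retrieval?"], api_key="YOUR_API_KEY")
# Sending is left commented out because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     vectors = parse_embeddings_response(response.read())
```

Each returned vector would be expected to have 384 values, matching the embedding size listed in the specs.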

not yet live

We're benchmarking and onboarding Granite Embedding Small English R2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embedding models

nomic-embed-text-v1.5