gigarouter
models / embeddings · coming soon

BGE EN ICL

BAAI/bge-en-icl

published Jul 2024 · updated Jan 2025

BGE EN ICL is an embedding model that uses in-context learning with few-shot examples to produce high-quality text embeddings.

est. price
~$0.008
/ 1M tokens · estimated, set at launch
API providers
0
downloads / mo
1K
license
apache-2.0

specs

Task: Text Embedding
Architecture: Decoder-only LLM with in-context learning
License: MIT

about this model

BAAI/bge-en-icl is an embedding model that leverages in-context learning (ICL) to produce high-quality text embeddings. By incorporating few-shot examples directly into the query input, the model adapts to new tasks without fine-tuning, generating embeddings that reflect the task structure defined by the provided examples.

Key Capabilities

The model integrates task-related examples into the query side, enabling it to handle both familiar and novel tasks through in-context learning. It retains the original decoder-only LLM framework, using last-token pooling, and supports both zero-shot and few-shot modes.
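Last-token pooling can be illustrated with a minimal sketch. It assumes the hidden states and attention mask are already available as plain nested lists; the function name `last_token_pool` and the toy values are illustrative only and are not taken from the model's own code.

```python
from typing import List

Vector = List[float]


def last_token_pool(
    hidden_states: List[List[Vector]],
    attention_mask: List[List[int]],
) -> List[Vector]:
    """Return, for each sequence, the hidden state of its last non-padding token.

    hidden_states: [batch][seq_len][dim] final-layer states (assumed layout).
    attention_mask: [batch][seq_len], 1 for real tokens and 0 for padding.
    Raises ValueError if a sequence has no real tokens.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Highest index whose mask is 1; this works for left or right padding.
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled


# Toy batch: two sequences of length 3 with 2-dimensional hidden states.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # right-padded, last real token at index 1
    [[0.0, 0.0], [0.5, 0.6], [0.7, 0.8]],  # left-padded, last real token at index 2
]
mask = [[1, 1, 0], [0, 1, 1]]
pooled = last_token_pool(hidden, mask)
```

Each pooled vector serves as the embedding for its sequence. In a decoder-only model the last token is the only position that has attended to the whole input, which is why it is the one chosen.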

Benchmark Performance

BGE-EN-ICL achieves state-of-the-art results on the MTEB and AIR-Bench leaderboards.

BEIR (MTEB leaderboard):

[Figure: BEIR benchmark results chart]

AIR-Bench 24.04 — QA (nDCG@10):

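| Model | wiki | web | news | healthcare | law | finance | arxiv | msmarco | ALL (8) |
|---|---|---|---|---|---|---|---|---|---|
| bge-en-icl zero-shot | 64.61 | 54.40 | 55.11 | 57.25 | 25.10 | 54.81 | 48.46 | 63.71 | 52.93 |
| bge-en-icl few-shot | 64.94 | 55.11 | 56.02 | 58.85 | 28.29 | 57.16 | 50.04 | 64.50 | 54.36 |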
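The AIR-Bench QA figures above can be checked with a little arithmetic. The sketch below assumes that ALL (8) is the unweighted mean of the eight domain scores (the published numbers are consistent with that) and computes the per-domain gain from adding few-shot examples. The variable names are illustrative.

```python
DOMAINS = ["wiki", "web", "news", "healthcare", "law", "finance", "arxiv", "msmarco"]
ZERO_SHOT = [64.61, 54.40, 55.11, 57.25, 25.10, 54.81, 48.46, 63.71]
FEW_SHOT = [64.94, 55.11, 56.02, 58.85, 28.29, 57.16, 50.04, 64.50]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Assumption: the ALL (8) column is the plain average over the eight domains.
zero_all = round(mean(ZERO_SHOT), 2)
few_all = round(mean(FEW_SHOT), 2)

# nDCG@10 gain from few-shot examples, per domain.
gains = {d: round(f - z, 2) for d, z, f in zip(DOMAINS, ZERO_SHOT, FEW_SHOT)}
largest_gain_domain = max(gains, key=gains.get)

print(zero_all, few_all)
print(largest_gain_domain, gains[largest_gain_domain])
```

Under that assumption the averages reproduce the ALL (8) column (52.93 and 54.36), and the largest few-shot gain is in the law domain.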

Long-Doc (en, Recall@10):

| Model | arxiv (4) | book (2) | healthcare (5) | law (4) | ALL (15) |
|---|---|---|---|---|---|
| text-embedding-3-large | 74.53 | 73.16 | 65.83 | 64.47 | 68.77 |
| e5-mistral-7b-instruct | 72.14 | 72.40 | | | |

The model is released under the MIT license. The full training dataset (bge-full-data) contains over 2.1 million rows across 34 configs/splits, including sources such as arXiv, biorxiv, and newsgroups.

FAQ

What is the main innovation of BGE EN ICL?

It places few-shot examples directly in the query and uses in-context learning to produce text embeddings, achieving state-of-the-art results on MTEB and AIR-Bench.

What is the license for BGE EN ICL?

It is released under the MIT license, free for both academic and commercial use.

How do I call BGE EN ICL via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, sending queries and documents as input to get embeddings.
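As a rough sketch of what such a call might look like, the snippet below builds a request in the shape of the OpenAI embeddings API (`POST /embeddings` with `model` and `input` fields) and parses a response of the same shape. The model is not yet live, so the base URL `https://api.gigarouter.example/v1` is a placeholder and the model identifier `BAAI/bge-en-icl` is an assumption. Check both against the gigarouter documentation at launch.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://api.gigarouter.example/v1"  # hypothetical placeholder, not a real endpoint
MODEL_ID = "BAAI/bge-en-icl"  # assumed identifier; may differ at launch


def build_embeddings_request(texts: List[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body: str) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    data = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


request = build_embeddings_request(["What is in-context learning?"], api_key="YOUR_API_KEY")
# To send it once the endpoint exists: urllib.request.urlopen(request).read().decode("utf-8")
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.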

What input format does BGE EN ICL expect?

Queries should include a task instruction and optional few-shot examples, formatted with <instruct> and <query> tags. Documents are plain text.
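A minimal sketch of that query format follows. The `<instruct>` and `<query>` tags are as described above. The `<response>` tag on few-shot examples and the blank-line separator between examples are assumptions based on the upstream model card. Verify them there, since upstream tooling may add further markers. The function names are illustrative.

```python
from typing import Sequence, Tuple

# (task instruction, example query, example response)
Example = Tuple[str, str, str]


def format_example(task: str, query: str, response: str) -> str:
    """Format one few-shot example; the <response> tag is an assumption."""
    return f"<instruct>{task}\n<query>{query}\n<response>{response}"


def format_query(task: str, query: str, examples: Sequence[Example] = ()) -> str:
    """Build the query-side input: optional few-shot examples, then the real query."""
    parts = [format_example(*ex) for ex in examples]
    parts.append(f"<instruct>{task}\n<query>{query}")
    # Assumed separator: one blank line between examples and the final query.
    return "\n\n".join(parts)


task = "Given a web search query, retrieve relevant passages that answer the query."
zero_shot = format_query(task, "how much protein should a female eat")
few_shot = format_query(
    task,
    "how much protein should a female eat",
    examples=[(task, "what is a virtual interface", "A virtual interface is a software-defined network interface.")],
)
```

With no examples the function returns the zero-shot form. Documents are passed through unchanged as plain text.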

How does BGE EN ICL compare to other embedding models in size?

The model card does not specify a parameter count. The model is based on a decoder-only LLM and reports state-of-the-art results on MTEB and AIR-Bench.

not yet live

We're benchmarking and onboarding BGE EN ICL as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
