Jina Embedding B English V1

jinaai/jina-embedding-b-en-v1

published Jul 2023 · updated Jan 2025

Jina Embedding B English V1 is a 110-million-parameter text embedding model that converts textual inputs into numerical representations for tasks like information retrieval and semantic textual similarity.

status

coming soon

API providers

downloads / mo

3.5K

license

apache-2.0

specs

Task	Text Embedding
Architecture	Transformer (BERT-style)
Parameters	110 million
License	Apache 2.0

about this model

jina-embedding-b-en-v1 is a text embedding model that converts textual inputs into dense vector representations, capturing semantic meaning for tasks such as information retrieval and semantic textual similarity. With 110 million parameters and a 768-dimensional output, it offers a balance of inference speed and accuracy. The model was trained on Jina AI’s Linnaeus-Clean dataset, which comprises 380 million curated query-document pairs. According to the accompanying paper, the Jina Embeddings models perform well on dense retrieval and semantic textual similarity; they were evaluated on the Massive Text Embedding Benchmark (MTEB).

Benchmark Performance

The table below compares jina-embedding-b-en-v1 against several popular embedding models across standard evaluation tasks. Scores are Spearman correlations, except for TRECOVID and SciFact, which are reported as nDCG and accuracy respectively.

Model	STS12	STS13	STS14	STS15	STS16	STS17	TRECOVID	Quora	SciFact
all-minilm-l6-v2	0.724	0.806	0.756	0.854	0.790	0.876	0.473	0.876	0.645
all-mpnet-base-v2	0.726	0.835	0.780	0.857	0.800	0.906	0.513	0.875	0.656
ada-embedding-002	0.698	0.833	0.761	0.861	0.860	0.903	0.685	0.876	0.726
jina-embedding-b-en-v1	0.751	0.809	0.761	0.856	0.812	0.890	0.606	0.876	0.594


The model achieves the highest STS12 score (0.751) among the compared models and is competitive on most of the other semantic similarity and retrieval tasks, although it trails all three alternatives on SciFact (0.594). For further details on the training data, the methodology, and the authors' negation-aware dataset, refer to the Jina Embeddings paper (arXiv:2307.11224).
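Most columns in the table report a Spearman correlation between the model's similarity scores and human-annotated gold scores. The following is a minimal sketch of how such a score can be computed, not the MTEB implementation; the function names and the sample numbers are illustrative assumptions.

```python
import math


def rank(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(rank(x), rank(y))


# Hypothetical data: model cosine similarities and human similarity ratings
# for four sentence pairs (not taken from any benchmark).
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [1.0, 3.0, 2.0, 5.0]
score = spearman(predicted, gold)
```

Because only the ordering matters, a model can score well even if its raw similarity values are on a different scale from the gold ratings.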

best for

·Information retrieval and dense passage retrieval
·Semantic textual similarity
·Text reranking
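As a rough illustration of the retrieval and reranking use cases above, the sketch below ranks documents by cosine similarity to a query vector. It uses toy 3-dimensional vectors in place of real 768-dimensional embeddings, and the helper names `cosine_similarity` and `rank_documents` are hypothetical, not part of any Jina AI library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings produced by the model (assumption).
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]
results = rank_documents(query, docs, top_k=2)
```

In practice the document vectors would be computed once and stored, so only the query needs to be embedded at search time.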

FAQ

What is the output dimension of this model?

The output dimension is 768.
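The dimension determines how much space a vector index needs. A back-of-the-envelope sketch, assuming each value is stored as a 4-byte float32 (the storage type is an assumption; this page does not specify one):

```python
def index_size_bytes(num_vectors, dim=768, bytes_per_value=4):
    """Raw size of num_vectors embeddings, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value


per_vector = index_size_bytes(1)            # 768 * 4 bytes
one_million = index_size_bytes(1_000_000)   # raw bytes for one million documents
one_million_gb = one_million / 1e9          # same figure in decimal gigabytes
```

Under that assumption one vector takes 3,072 bytes, and one million vectors take about 3.07 GB before any index overhead.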

How does this model compare to all-mpnet-base-v2?

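Both have 110M parameters and 768 output dimensions. In the benchmark table, Jina Embedding B English V1 scores higher on STS12 (0.751 vs. 0.726), STS16 (0.812 vs. 0.800), and TRECOVID (0.606 vs. 0.513), while all-mpnet-base-v2 leads on STS13, STS14, STS17, and SciFact.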

What is the license for this model?

The model is released under the Apache 2.0 license.

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key to send text inputs and receive embeddings.
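A minimal sketch of what such a call might look like, assuming the endpoint follows the usual OpenAI embeddings request shape (`POST /embeddings` with `model` and `input` fields, and a bearer token). The base URL below is a placeholder and the hosted model id is a guess; neither is given on this page, so check the gigarouter documentation before use.

```python
import json
import urllib.request

# ASSUMPTION: placeholder base URL; the real gigarouter endpoint is not stated here.
BASE_URL = "https://api.example.com/v1"
# ASSUMPTION: the hosted model id may differ from the repository id.
MODEL_ID = "jinaai/jina-embedding-b-en-v1"


def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(payload):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request(["How do embeddings work?"], api_key="YOUR_API_KEY")
```

Once the model is available, the request could be sent with `urllib.request.urlopen(request)` and the decoded JSON body passed to `parse_embeddings`.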

What is the recommended hardware for inference?

A single GPU is recommended for fast inference.

not yet live

We're benchmarking and onboarding Jina Embedding B English V1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embeddings models


nomic-embed-text-v1.5