Jina Embedding B English V1

jinaai/jina-embedding-b-en-v1

published Jul 2023 · updated Jan 2025

Jina Embedding B English V1 is a 110-million-parameter text embedding model that converts textual inputs into numerical representations for tasks like information retrieval and semantic textual similarity.

status: coming soon
API providers: 0
downloads / mo: 3.5K
license: apache-2.0

specs

Task: Text Embedding
Architecture: Transformer (BERT-style)
Parameters: 110 million
License: Apache 2.0

about this model

jina-embedding-b-en-v1 is a text embedding model that converts textual inputs into dense vector representations, capturing semantic meaning for tasks such as information retrieval and semantic textual similarity. With 110 million parameters and a 768-dimensional output, it offers a balance of inference speed and accuracy. The model was trained on Jina AI’s Linnaeus-Clean dataset, which comprises 380 million curated query-document pairs. As described in the accompanying paper, Jina Embeddings excel in dense retrieval and semantic similarity, and were evaluated using the Massive Text Embedding Benchmark (MTEB).
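As a rough illustration of how dense vectors like these are typically used for retrieval, the sketch below ranks documents by cosine similarity to a query. The 4-dimensional vectors are made-up stand-ins for the model's 768-dimensional outputs, and `cosine_similarity` and `rank_documents` are hypothetical helper names, not part of any Jina AI or gigarouter library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings from this model have 768 dimensions.
query = [1.0, 0.0, 1.0, 0.0]
documents = {
    "doc_a": [0.9, 0.1, 0.8, 0.0],
    "doc_b": [0.0, 1.0, 0.0, 1.0],
    "doc_c": [0.5, 0.5, 0.5, 0.5],
}

ranking = rank_documents(query, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With these toy values, doc_a ranks first, doc_c second, and doc_b (orthogonal to the query) last.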

Benchmark Performance

The table below compares jina-embedding-b-en-v1 against several popular embedding models across standard evaluation tasks. STS12 through STS17 report Spearman correlation; TRECOVID, Quora, and SciFact are retrieval tasks reported as nDCG@10.
Model                   STS12  STS13  STS14  STS15  STS16  STS17  TRECOVID  Quora  SciFact
all-minilm-l6-v2        0.724  0.806  0.756  0.854  0.790  0.876  0.473     0.876  0.645
all-mpnet-base-v2       0.726  0.835  0.780  0.857  0.800  0.906  0.513     0.875  0.656
ada-embedding-002       0.698  0.833  0.761  0.861  0.860  0.903  0.685     0.876  0.726
jina-embedding-b-en-v1  0.751  0.809  0.761  0.856  0.812  0.890  0.606     0.876  0.594

The model achieves top performance on STS12 among the compared models and delivers competitive results across other semantic similarity and retrieval tasks. For further details on training data, methodology, and a negation-aware dataset, refer to the Jina Embeddings paper (arXiv:2307.11224).
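For readers unfamiliar with the Spearman scores reported for the STS tasks, the sketch below shows one way such a score can be computed: rank the model's similarity scores and the human ratings, then take the Pearson correlation of the ranks. The sample ratings and similarities are invented for illustration, and the helper names are hypothetical; this is not the benchmark's own evaluation code.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: human similarity ratings vs. a model's cosine similarities.
human_ratings = [4.8, 1.2, 3.0, 0.5, 2.2]
model_scores = [0.91, 0.35, 0.60, 0.40, 0.55]

rho = spearman(human_ratings, model_scores)
print(f"Spearman correlation: {rho:.3f}")  # prints 0.900
```

Only the ordering of the scores matters here, which is why a model can score well without its cosine values matching the human rating scale.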

FAQ

What is the output dimension of this model?

The output dimension is 768.

How does this model compare to all-mpnet-base-v2?

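Both have 110M parameters and 768 output dimensions. In the table above, Jina Embedding B English V1 scores higher on STS12 (0.751 vs 0.726), STS16 (0.812 vs 0.800), and TRECOVID (0.606 vs 0.513), while all-mpnet-base-v2 leads on STS13, STS14, STS17, and SciFact. The two are within 0.001 of each other on STS15 and Quora.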

What is the license for this model?

The model is released under the Apache 2.0 license.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key to send text inputs and receive embeddings.
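A minimal sketch of what such a call might look like, assuming the endpoint follows the OpenAI embeddings request and response shape. The base URL below is a placeholder (this page does not give the real one), the model identifier is assumed to match the name shown above, and the request is only built, not sent, since the model is not yet live.

```python
import json
import urllib.request

# Assumption: placeholder base URL; the real gigarouter endpoint is not stated here.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model ID matches the identifier shown on this page.
MODEL_ID = "jinaai/jina-embedding-b-en-v1"


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(json.loads(body)["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embeddings_request(["hello world"], api_key="YOUR_API_KEY")

# Sending would look like this once a real endpoint is available:
# with urllib.request.urlopen(request) as response:
#     vectors = parse_embeddings_response(response.read())

# Hand-written sample body in the OpenAI response shape, used only to exercise the parser.
sample_body = (
    '{"data": [{"index": 1, "embedding": [0.3, 0.4]},'
    ' {"index": 0, "embedding": [0.1, 0.2]}]}'
)
vectors = parse_embeddings_response(sample_body)
print(vectors)
```

Sorting by `index` keeps each returned vector aligned with the input text it belongs to, regardless of the order in which the items arrive.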

What is the recommended hardware for inference?

A single GPU is recommended for fast inference.

not yet live

We're benchmarking and onboarding Jina Embedding B English V1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.