Jina Embedding S v1
jinaai/jina-embedding-s-en-v1
published Jul 2023 · updated Jan 2025
Jina Embedding S v1 is a text embedding model that converts sentences into 512-dimensional vectors for semantic search, information retrieval, and similarity tasks.
specs
| Task | Embedding (Sentence Similarity) |
| Parameters | 35 million |
| Dimension | 512 |
| License | Apache-2.0 |
about this model
jina-embedding-s-en-v1 is a text embedding model that maps text to 512-dimensional vectors, trained by Jina AI on the Linnaeus-Clean dataset of 380 million sentence pairs drawn from diverse domains. With 35 million parameters, it balances inference speed and semantic quality for tasks such as information retrieval, semantic textual similarity, and reranking.
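As an illustration of how such vectors are typically used for retrieval, the sketch below ranks documents by cosine similarity to a query. The vectors are hand-made 512-dimensional stand-ins rather than real model output, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in 512-dimensional vectors (assumption: not produced by the model).
DIM = 512
query = [1.0] + [0.0] * (DIM - 1)
docs = [
    [0.0, 1.0] + [0.0] * (DIM - 2),  # orthogonal to the query
    [2.0] + [0.0] * (DIM - 1),       # same direction as the query
    [1.0, 1.0] + [0.0] * (DIM - 2),  # 45 degrees from the query
]

ranking = rank_by_similarity(query, docs)
print([index for index, _ in ranking])
```

In a real pipeline the stand-in lists would be replaced by embeddings returned by the model, and the same ranking step would apply unchanged.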
Benchmark performance
The model is evaluated on standard STS and retrieval benchmarks. The tables below compare it with all-minilm-l6-v2, all-mpnet-base-v2, and OpenAI’s text-embedding-ada-002 (listed as ada-embedding-002 and ada-002).
| Model | Parameters | Dimension |
|---|---|---|
| all-minilm-l6-v2 | 23M | 384 |
| all-mpnet-base-v2 | 110M | 768 |
| ada-embedding-002 | Unknown (API) | 1536 |
| jina-embedding-s-en-v1 | 35M | 512 |

| Benchmark | all-minilm-l6-v2 | all-mpnet-base-v2 | ada-002 | jina-s-en-v1 |
|---|---|---|---|---|
| STS12 | 0.724 | 0.726 | 0.698 | 0.743 |
| STS13 | 0.806 | 0.835 | 0.833 | 0.786 |
| STS14 | 0.756 | 0.78 | 0.761 | 0.738 |
| STS15 | 0.854 | 0.857 | 0.861 | 0.837 |
| STS16 | 0.79 | 0.80 | 0.86 | 0.80 |
| STS17 | 0.876 | 0.906 | 0.903 | 0.875 |
| TRECOVID | 0.473 | 0.513 | 0.685 | 0.523 |
| Quora | 0.876 | 0.875 | 0.876 | 0.857 |
| SciFact | 0.645 | 0.656 | 0.726 | 0.524 |
Additional evaluation
An additional source reports results on the full MTEB benchmark, on a 0-100 scale: for example, an NDCG@10 of 43.57 on ArguAna (retrieval), a cosine Spearman correlation of 82.96 on BIOSSES (STS), and an accuracy of 74.64 on Banking77 (classification). Training also incorporates the jinaai/negation-dataset to improve handling of negated statements. The model is licensed under Apache-2.0.
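For readers unfamiliar with the two ranking metrics named above, here is a minimal sketch of how NDCG@10 and Spearman correlation can be computed. It assumes linear gain for NDCG (one common convention) and that the ranked list carries labels for every judged document. The inputs are toy values, not benchmark data, and the function names are hypothetical. Results fall in [0, 1] for NDCG and [-1, 1] for Spearman; multiplying by 100 gives the scale used in the text.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy inputs (assumptions for illustration only).
ndcg = ndcg_at_k([1, 0, 1])                    # relevant docs at ranks 1 and 3
predicted_similarity = [0.9, 0.2, 0.5, 0.7]    # e.g. cosine similarities
human_scores = [5.0, 1.0, 2.0, 4.0]            # e.g. gold STS labels
rho = spearman(predicted_similarity, human_scores)
print(round(ndcg, 4), round(rho, 4))
```

Because Spearman correlation depends only on rank order, the toy example yields a perfect correlation even though the two score scales differ.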
best for
- Information retrieval (dense retrieval)
- Semantic textual similarity
- Text reranking
FAQ
The model is designed for semantic search, information retrieval, and semantic textual similarity; it converts text into dense 512-dimensional embeddings.
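Jina Embedding S v1 is the 35M-parameter variant in the Jina Embeddings v1 family, offering a balance between speed and accuracy. The smaller t variant has 14M parameters and 312 dimensions; the larger b and l variants have 110M parameters with 768 dimensions and 330M parameters with 1024 dimensions, respectively.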
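One practical consequence of the dimension differences is raw storage cost. The following back-of-the-envelope sketch assumes vectors are stored as 32-bit floats with no index overhead or compression; the function name and the one-million-vector corpus size are illustrative assumptions.

```python
BYTES_PER_FLOAT32 = 4

# Output dimensions of the four family variants, as listed in the text.
FAMILY_DIMS = {"t": 312, "s": 512, "b": 768, "l": 1024}


def index_size_bytes(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of num_vectors embeddings, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value


# Assumed corpus of one million embedded texts.
sizes_mib = {
    name: index_size_bytes(1_000_000, dim) / 2**20
    for name, dim in FAMILY_DIMS.items()
}
for name, size in sizes_mib.items():
    print(f"{name}: {size:.1f} MiB")
```

Under these assumptions a single 512-dimensional vector takes 2048 bytes, so the s variant needs roughly two thirds of the space of the 768-dimensional b variant for the same corpus.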
To call the model, use the OpenAI-compatible endpoint with your API key: send a POST request containing the input text and the model name to the gigarouter embeddings endpoint.
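A request of this kind might look like the sketch below, which follows the general shape of the OpenAI embeddings API (a JSON body with `model` and `input`, a bearer token, and a `data` list in the response). The base URL is a placeholder because the real endpoint address is not given here, and the model identifier the service accepts is also an assumption. The code only builds the request; it does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not stated in this page.
BASE_URL = "https://api.example.com/v1"
# Assumption: the service accepts the model ID shown at the top of this page.
MODEL = "jinaai/jina-embedding-s-en-v1"


def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) an OpenAI-style embeddings POST request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda it: it["index"])]


request = build_embedding_request(["hello world"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Once a real endpoint and key are available, the request could be sent with `urllib.request.urlopen(request)` and the response body passed to `parse_embeddings`.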
The model is released under Apache-2.0, which allows free use, modification, and distribution.
We're benchmarking and onboarding Jina Embedding S v1 as a hosted, OpenAI-compatible API.