All MiniLM L6 V2
Xenova/all-MiniLM-L6-v2
published May 2023 · updated Jul 2025
All MiniLM L6 V2 is a sentence embedding model that converts text into 384-dimensional vectors for semantic similarity, clustering, and information retrieval.
specs
| Spec | Value |
| --- | --- |
| Task | Sentence Embedding / Feature Extraction |
| Architecture | MiniLM-L6-H384-uncased |
| Embedding Dimension | 384 |
| Max Sequence Length | 256 tokens |
| License | Apache 2.0 |
about this model
Xenova/all-MiniLM-L6-v2 is an embedding model that converts sentences and short texts into 384-dimensional vectors optimized for semantic similarity and retrieval tasks. It is derived from the nreimers/MiniLM-L6-H384-uncased base model and fine-tuned using contrastive learning (cosine similarity with cross-entropy loss) on a dataset of 1 billion sentence pairs. The model accepts up to 256 word pieces per input and produces normalized embeddings suitable for cosine similarity comparisons.
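As a minimal sketch of why normalized outputs are convenient: once two vectors have unit length, their cosine similarity is just their dot product. The helper names `l2_normalize` and `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the model or any library; real embeddings from this model have 384 dimensions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed as the dot product of unit vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


# Toy vectors standing in for embeddings (assumed values).
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a
c = [2.0, -1.0, 0.0]  # orthogonal to a

print(cosine_similarity(a, b))  # approximately 1.0
print(cosine_similarity(a, c))  # approximately 0.0
```

If the embeddings are already normalized, the `l2_normalize` calls are redundant and a plain dot product gives the same score.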
Key Strengths
- Lightweight architecture with 6 transformer layers and an embedding size of 384, enabling low-latency inference and reduced memory footprint.
- Trained on a large and diverse corpus of sentence pairs, yielding strong generalisation across domains such as semantic textual similarity, clustering, and information retrieval.
- Available in multiple frameworks including PyTorch, TensorFlow, ONNX, and OpenVINO; the ONNX variant hosted by Gigarouter is compatible with WebGPU acceleration via Transformers.js.
- Licensed under Apache 2.0.
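To make the footprint of a 384-dimensional vector concrete, here is a rough back-of-the-envelope sketch for the storage needed by a collection of embeddings. It assumes each value is stored as a 32-bit float and ignores any index or metadata overhead; the function name `index_size_bytes` is hypothetical.

```python
EMBEDDING_DIM = 384      # from the specs above
BYTES_PER_FLOAT32 = 4    # assumption: vectors are stored as float32


def index_size_bytes(num_vectors, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors embeddings, excluding index overhead."""
    return num_vectors * dim * bytes_per_value


print(index_size_bytes(1))          # 1536 bytes per vector
print(index_size_bytes(1_000_000))  # 1536000000 bytes, roughly 1.5 GB
```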
Training and Performance
Training was conducted on 7 TPU v3-8 pods. The source card lists no benchmark scores; the original all-MiniLM-L6-v2 reportedly achieves a Spearman correlation of approximately 86.8 on the STS Benchmark (test set), which is strong for a model of its size. The model is designed for English text; performance on other languages is not documented.
Hosted API
Gigarouter serves this model as a managed, OpenAI-compatible API. Developers send text inputs and receive embeddings directly, with no need to manage transformers, ONNX runtimes, or hardware scaling. The API supports batch inference and configurable output formatting.
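A request to an OpenAI-compatible embeddings endpoint might look like the sketch below. The base URL is a placeholder, and the model identifier is assumed to match the ID shown at the top of this page; check the provider's documentation for the real values. The request and response shapes follow the OpenAI embeddings format (`model` and `input` in the body, a `data` list with `index` and `embedding` in the response).

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical placeholder, not a real endpoint
MODEL_ID = "Xenova/all-MiniLM-L6-v2"     # assumed model identifier


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for a batch of input strings."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body):
    """Return embeddings in input order from an OpenAI-style JSON response body."""
    data = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


request = build_embeddings_request(["first sentence", "second sentence"], api_key="YOUR_API_KEY")
# To send it: with urllib.request.urlopen(request) as resp: body = resp.read()
```

Sending several strings in one `input` list is how batching is expressed in this format; each returned item carries the `index` of the string it belongs to.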
best for
- Semantic text similarity
- Clustering and topic modeling
- Information retrieval and semantic search
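The semantic search use case can be sketched as ranking stored document vectors by cosine similarity to a query vector. The `top_k` function, the document labels, and the 3-dimensional toy vectors below are assumptions for illustration; in practice the vectors would be 384-dimensional embeddings returned by the model.

```python
import math


def top_k(query_vec, doc_vecs, k=2):
    """Return the k document ids most similar to the query by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for document embeddings (assumed values).
documents = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "stocks": [0.0, 0.2, 0.95],
}
query = [1.0, 0.2, 0.0]

print(top_k(query, documents, k=2))  # ['cats', 'dogs']
```

A linear scan like this is fine for small collections; larger ones usually call for an approximate nearest-neighbor index.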
FAQ
The model produces 384-dimensional embeddings.
The model supports up to 256 word pieces per input.
The model is licensed under Apache 2.0.
To call the model, use the OpenAI-compatible endpoint with your API key, sending a list of strings to the embeddings endpoint.
The model was fine-tuned on a dataset of 1 billion sentence pairs using contrastive learning.
We're benchmarking and onboarding All MiniLM L6 V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.