All MiniLM L6 V2

Xenova/all-MiniLM-L6-v2

published May 2023 · updated Jul 2025

All MiniLM L6 V2 is a sentence embedding model that converts text into 384-dimensional vectors for semantic similarity, clustering, and information retrieval.

status

coming soon

API providers

downloads / mo

2.8M

license

apache-2.0

specs

Task	Sentence Embedding / Feature Extraction
Architecture	MiniLM-L6-H384-uncased
Embedding Dimension	384
Max Sequence Length	256 tokens
License	Apache 2.0

about this model

Xenova/all-MiniLM-L6-v2 is an embedding model that converts sentences and short texts into 384-dimensional vectors optimized for semantic similarity and retrieval tasks. It is derived from the nreimers/MiniLM-L6-H384-uncased base model and fine-tuned with a contrastive objective on a dataset of 1 billion sentence pairs: cosine similarities are computed between the sentence pairs in a batch, and a cross-entropy loss is applied against the true pairs. The model accepts up to 256 word pieces per input (longer inputs are truncated by default) and produces normalized embeddings suitable for cosine similarity comparisons.
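The contrastive objective can be illustrated with a small, dependency-free sketch. It is not the original training code: the function names are hypothetical, the similarity scale of 20 is an assumption, and the 2-dimensional vectors are toy values standing in for real 384-dimensional embeddings.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity: dot product of the unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over rows of the scaled cosine-similarity matrix.

    For row i, the "correct class" is column i (the true pair); every other
    positive in the batch acts as a negative. `scale` is an assumed value.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(z - max_logit) for z in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch: each anchor is close to its own positive in `aligned`,
# and close to the other anchor's positive in `swapped`.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = in_batch_contrastive_loss(anchors, aligned)
loss_swapped = in_batch_contrastive_loss(anchors, swapped)
```

The loss is small when each sentence is most similar to its true partner and large when the pairs are mismatched, which is the signal that pulls related sentences together in the embedding space.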

Key Strengths

·Lightweight architecture with 6 transformer layers and an embedding size of 384, enabling low-latency inference and reduced memory footprint.
·Trained on a large and diverse corpus of sentence pairs, yielding strong generalisation across domains such as semantic textual similarity, clustering, and information retrieval.
·Available in multiple frameworks including PyTorch, TensorFlow, ONNX, and OpenVINO; the ONNX variant hosted by gigarouter is compatible with WebGPU acceleration via Transformers.js.
·Licensed under Apache 2.0.

Training and Performance

Training was conducted on 7 TPU v3-8 devices. The source card lists no benchmark scores; the original all-MiniLM-L6-v2 is reported to achieve a Spearman correlation of approximately 86.8 on the STS Benchmark (test set), a strong result for a model of its size. The model is designed for English text; performance on other languages is not documented.
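For context on the metric: an STS score of this kind is a Spearman rank correlation between the model's cosine similarities and human similarity ratings, conventionally reported multiplied by 100. A minimal sketch of the computation follows; the helper names are hypothetical and the numbers are toy values, not benchmark data.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: human ratings on a 0-5 scale and model cosine similarities.
human_scores = [4.8, 1.2, 3.5, 0.4]
model_cosines = [0.91, 0.35, 0.62, 0.10]

rho = spearman(human_scores, model_cosines)
```

Because only the ordering matters, the model does not need to reproduce the human scale; it only needs to rank sentence pairs in the same order as the annotators.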

Hosted API

Gigarouter will serve this model as a managed, OpenAI-compatible API once it is live. Developers send text inputs and receive embeddings directly, with no need to manage transformers, ONNX runtimes, or hardware scaling. The API supports batch inference and configurable output formatting.

best for

·Semantic text similarity
·Clustering and topic modeling
·Information retrieval and semantic search
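As an illustration of the retrieval use case, here is a minimal semantic-search sketch: documents are ranked by cosine similarity to a query embedding. The function names are hypothetical, and the 3-dimensional vectors are toy stand-ins for the model's 384-dimensional output.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    q = normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = normalize(doc)
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy document embeddings and a query embedding.
docs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [1.0, 0.0, 0.0]

results = top_k(query, docs, k=2)
```

If the embeddings are already normalized, the `normalize` calls can be dropped and a plain dot product gives the same ranking. For large collections an approximate nearest-neighbor index would normally replace this linear scan.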

FAQ

What is the embedding dimension of this model?

It produces 384-dimensional embeddings.

What is the maximum sequence length?

The model supports up to 256 word pieces per input.

What license is the model released under?

Apache 2.0.

How do I call this model via the gigarouter API?

Once the model is live, send a list of strings to the OpenAI-compatible embeddings endpoint, authenticating with your API key.
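A sketch of what such a call might look like, assuming the API follows the usual OpenAI embeddings conventions (a POST to `/embeddings` with `model` and `input` fields, bearer-token authentication, and a response containing `data[i].embedding`). The base URL below is a hypothetical placeholder and the model identifier is an assumption; neither is documented on this page. The request is only built, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"   # hypothetical placeholder, not the real endpoint
MODEL_ID = "Xenova/all-MiniLM-L6-v2"      # assumed model identifier

def build_embeddings_request(texts, api_key):
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": MODEL_ID, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/embeddings",
        data=body,
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

def parse_embeddings(response_json):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

request = build_embeddings_request(
    ["first sentence", "second sentence"], api_key="YOUR_API_KEY"
)
payload = json.loads(request.data)
```

Sending the request would be a matter of passing `request` to `urllib.request.urlopen` and feeding the decoded JSON body to `parse_embeddings`, provided the live API matches these assumptions.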

What training data was used?

It was fine-tuned on a dataset of 1 billion sentence pairs using contrastive learning.

not yet live

We're benchmarking and onboarding All MiniLM L6 V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embeddings models

nomic-embed-text-v1.5

granite-embedding-small-english-r2

2.2M dl/mo