All MiniLM L6 v2
Qdrant/all-MiniLM-L6-v2-onnx
published Jan 2024 · updated Jun 2025
All MiniLM L6 v2 is an embedding model that generates 384-dimensional embeddings for text classification and similarity search.
specs
| Spec | Value |
| --- | --- |
| Task | Embeddings (text classification, similarity) |
| Architecture | MiniLM (L6, H384) |
| Embedding dimension | 384 |
| Max sequence length | 256 tokens |
| Model size | 0.090 GB |
| License | Apache 2.0 |
about this model
Qdrant/all-MiniLM-L6-v2-onnx is an embedding model that maps sentences and paragraphs to a 384-dimensional dense vector space, optimized for text classification and similarity search.
Model Details
This model is an ONNX port of sentence-transformers/all-MiniLM-L6-v2. The original model was fine-tuned from nreimers/MiniLM-L6-H384-uncased using contrastive learning on 1 billion sentence pairs, and was developed during Hugging Face Community Week with JAX/Flax on 7 TPU v3-8 devices. The model accepts up to 256 tokens per input and produces 384-dimensional embeddings. Its on-disk size is 90 MB, and it is released under the Apache 2.0 license.
Key Strengths
- Lightweight: runs on ONNX Runtime with no GPU required, and the model file is small (90 MB on disk).
- Fast inference suitable for serverless and production environments, as demonstrated in the FastEmbed library.
- Data parallelism support for encoding large datasets.
- Proven general-purpose embedding quality via the underlying sentence-transformers model, which has been widely adopted for retrieval, clustering, and semantic similarity tasks.
Usage via gigarouter
As a hosted API, gigarouter provides an OpenAI-compatible endpoint for this model. No local installation, GPU, or library dependencies are needed – simply send text and receive embeddings in response.
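A request to an OpenAI-compatible embeddings endpoint might look like the sketch below. It follows the general OpenAI embeddings request shape (a JSON body with `model` and `input`, a bearer token header, and vectors returned under `data[i].embedding`). The base URL `https://api.example.com/v1` is a hypothetical placeholder, since the real gigarouter URL is not stated here, and the helper names are illustrative only.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not given on this page.
BASE_URL = "https://api.example.com/v1"
MODEL = "all-MiniLM-L6-v2-onnx"


def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL):
    """Build a POST request in the OpenAI embeddings format."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(body):
    """Return one vector per input, ordered by each item's 'index' field."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, api_key):
    """Send the request and return the embeddings (performs a network call)."""
    request = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

Calling `embed(["first sentence", "second sentence"], api_key)` should return one 384-dimensional vector per input, assuming the endpoint follows the OpenAI response format.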
best for
- Text similarity and semantic search
- Text classification and clustering
- Document retrieval and RAG pipelines
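To illustrate how the use cases above typically consume embeddings, here is a minimal sketch of semantic search by cosine similarity. The toy 3-dimensional vectors stand in for the model's 384-dimensional output, and the function names are hypothetical. Cosine similarity is a common choice for comparing sentence embeddings, not something this page prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (document index, score) pairs, highest similarity first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real 384-dimensional embeddings.
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.0],  # partially related
]
ranking = rank_documents(query, docs, top_k=2)
```

In a retrieval or RAG pipeline the same ranking step is usually delegated to a vector database, but the scoring idea is the same.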
FAQ
The model is optimized for text classification, similarity search, and semantic retrieval tasks.
It produces 384-dimensional embeddings; the maximum input length is 256 tokens.
It is lightweight (90 MB on disk) and runs on ONNX Runtime, which typically gives faster CPU inference than running the equivalent model in PyTorch.
Send requests to the OpenAI-compatible endpoint with your API key and model name "all-MiniLM-L6-v2-onnx".
The model is released under the Apache 2.0 license.
We're benchmarking and onboarding All MiniLM L6 v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.