All MiniLM L6 v2

Qdrant/all-MiniLM-L6-v2-onnx

published Jan 2024 · updated Jun 2025

All MiniLM L6 v2 is an embedding model that generates 384-dimensional embeddings for text classification and similarity search.

status

coming soon

API providers

downloads / mo

1.3M

license

apache-2.0

specs

Task	Embeddings (text classification, similarity)
Architecture	MiniLM (L6, H384)
Embedding dimension	384
Max sequence length	256 tokens
Model size	0.090 GB
License	Apache 2.0

about this model

Qdrant/all-MiniLM-L6-v2-onnx is an embedding model that maps sentences and paragraphs to a 384-dimensional dense vector space, optimized for text classification and similarity search.

Model Details

This model is an ONNX port of sentence-transformers/all-MiniLM-L6-v2. The original model was fine-tuned from nreimers/MiniLM-L6-H384-uncased using contrastive learning on 1 billion sentence pairs. It was developed with JAX/Flax on seven TPU v3-8 devices during the Hugging Face Community Week. The model accepts up to 256 tokens per input and produces 384-dimensional embeddings. Its on-disk size is 90 MB, and it is released under the Apache 2.0 license.
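To make the 384-dimensional output concrete, here is a rough sizing sketch for a vector store. It assumes each value is stored as a 4-byte float32 (an assumption, since the page does not state the storage format) and ignores any index overhead; the function name is illustrative only.

```python
EMBEDDING_DIM = 384      # from the specs above
BYTES_PER_FLOAT32 = 4    # assumption: vectors are stored as float32


def raw_vector_bytes(num_vectors: int,
                     dim: int = EMBEDDING_DIM,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of num_vectors dense embeddings, excluding index overhead."""
    return num_vectors * dim * bytes_per_value


print(raw_vector_bytes(1))                # 1536 bytes per embedding
print(raw_vector_bytes(1_000_000) / 1e9)  # 1.536 (GB for one million embeddings)
```

Under these assumptions, one million embedded passages need about 1.5 GB of raw vector storage before any index structures are added.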

Key Strengths

Lightweight ONNX model: runs on CPU with ONNX Runtime, no GPU required, and small on disk (90 MB).
Fast inference suitable for serverless and production environments, as demonstrated in the FastEmbed library.
Data parallelism support in FastEmbed for encoding large datasets.
Proven general-purpose embedding quality via the underlying sentence-transformers model, which has been widely adopted for retrieval, clustering, and semantic similarity tasks.

Usage via gigarouter

Once the model is live, gigarouter will serve it as a hosted API behind an OpenAI-compatible endpoint. No local installation, GPU, or library dependencies are needed: send text and receive embeddings in response.
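A request to an OpenAI-compatible embeddings endpoint might be built as in the sketch below. The base URL is a hypothetical placeholder (this page does not give the real one), the helper names are illustrative, and the request and response shapes follow the general OpenAI embeddings convention rather than anything confirmed for gigarouter. The model name is the one given in the FAQ. The request is only constructed here, not sent, because the model is not yet live.

```python
import json
import urllib.request

# Hypothetical placeholder: the real gigarouter base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
MODEL_NAME = "all-MiniLM-L6-v2-onnx"  # model name as given in the FAQ


def build_embeddings_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = {"model": MODEL_NAME, "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body):
    """Return the embedding vectors from an OpenAI-style response, in input order."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]


request = build_embeddings_request(["hello world"], api_key="YOUR_API_KEY")

# Once the endpoint is live, the request could be sent like this:
# with urllib.request.urlopen(request) as resp:
#     vectors = parse_embeddings_response(resp.read())
```

Each returned vector would be expected to contain 384 floats, matching the embedding dimension listed in the specs.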

best for

·Text similarity and semantic search
·Text classification and clustering
·Document retrieval and RAG pipelines
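For similarity and retrieval use cases like those listed above, embeddings are commonly compared with cosine similarity. The sketch below uses only the standard library; the short 3-dimensional vectors are placeholders for real 384-dimensional embeddings, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       docs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, docs)
print([index for index, _ in ranking])  # [1, 2, 0]
```

In a retrieval or RAG pipeline, the top-ranked documents would be the ones passed on as context; a vector database typically performs this ranking at scale.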

FAQ

What is this model best for?

It is optimized for text classification, similarity searches, and semantic retrieval tasks.

What is the embedding dimension and max sequence length?

384-dimensional embeddings; max input length is 256 tokens.

How does it compare in size and speed to other embed models?

It is lightweight (90 MB on disk) and, because it runs on ONNX Runtime, typically faster on CPU than the same model served through PyTorch.

How can I use this model via the gigarouter API?

Send requests to the OpenAI-compatible endpoint with your API key and model name "all-MiniLM-L6-v2-onnx".

What license is this model released under?

Apache 2.0.

not yet live

We're benchmarking and onboarding All MiniLM L6 v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embedding models

nomic-embed-text-v1.5