Neural Sparse Encoding Doc V3 GTE
opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
published Jun 2025 · updated Jul 2025
Neural Sparse Encoding Doc V3 GTE is a learned sparse retrieval model that encodes documents into 30522-dimensional sparse vectors for inference-free, efficient search.
specs
| Spec | Value |
|---|---|
| Task | Learned Sparse Retrieval |
| Architecture | Transformer-based SparseEncoder (30522-dim sparse vectors) |
| Parameters | 133M |
| License | Apache 2.0 |
about this model
opensearch-neural-sparse-encoding-doc-v3-gte is a learned sparse retrieval model that encodes documents into 30,522-dimensional sparse vectors, where each non-zero dimension corresponds to a token in the vocabulary and its weight indicates token importance. For queries, the model uses a tokenizer and a weight look-up table to generate sparse vectors without neural network inference, enabling retrieval via inner product similarity between query and document sparse vectors.
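The inference-free query path described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's implementation: the four-word vocabulary, the weight table, the whitespace tokenizer, and the document vector are all made-up stand-ins (the real model uses its own tokenizer and a 30,522-entry vocabulary), and the function names are hypothetical.

```python
# Toy stand-ins: NOT the model's real vocabulary, tokenizer, or weights.
TOY_VOCAB = {"neural": 0, "sparse": 1, "search": 2, "engine": 3}
TOY_QUERY_WEIGHTS = {0: 1.5, 1: 2.0, 2: 0.8, 3: 1.1}  # token id -> query-side weight


def encode_query(text):
    """Build a sparse query vector with a tokenizer and a table look-up only."""
    # Whitespace splitting is an assumption made for brevity.
    token_ids = {TOY_VOCAB[tok] for tok in text.lower().split() if tok in TOY_VOCAB}
    return {tid: TOY_QUERY_WEIGHTS[tid] for tid in token_ids}


def inner_product(query_vec, doc_vec):
    """Score a document: sum the weight products over shared non-zero dimensions."""
    return sum(w * doc_vec[tid] for tid, w in query_vec.items() if tid in doc_vec)


# Hypothetical document vector, as if produced offline by the document encoder.
doc_vec = {1: 0.9, 2: 1.2, 3: 0.4}

query_vec = encode_query("neural sparse search")
score = inner_product(query_vec, doc_vec)
```

Only the dimensions present in both vectors contribute to the score, which is why the sparse representation can be served from an inverted index much like a lexical retriever.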
This model is part of the v3 series, which delivers better search relevance, efficiency, and inference speed than the earlier v1 and v2 series. It is an inference-free sparse retriever: documents are encoded at indexing time, and queries need only a tokenizer and a weight look-up, with no model inference. Removing online neural network inference from retrieval reduces computational cost while maintaining reasonable throughput and latency; client-side latency is only 1.1x that of BM25.
The model achieves state-of-the-art performance among inference-free sparse retrieval models on the BEIR benchmark, with an average NDCG@10 of 0.546 and an average FLOPS of 1.7. It outperforms the previous state-of-the-art inference-free sparse model by 3.3 NDCG@10 points (on a 0-100 scale). The associated paper has been accepted at SIGIR 2025.
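For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for a single query. It assumes linear gain and that the list passed in covers every judged document for the query (so the ideal ordering can be derived from it). The function names are illustrative and this is not the evaluation code behind the reported numbers.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Relevance labels of the returned documents, in ranked order (made-up example).
example_ndcg = ndcg_at_k([1, 0, 1])
```

A benchmark-level figure such as the BEIR average is obtained by averaging this per-query value within each dataset and then across datasets.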
Search Relevance on BEIR (NDCG@10)
| Dataset | Score |
|---|---|
| Average | 0.546 |
| Trec Covid | 0.734 |
| NFCorpus | 0.360 |
| NQ | 0.582 |
| HotpotQA | 0.716 |
| FiQA | 0.407 |
| ArguAna | 0.520 |
| Touche | 0.389 |
| DBPedia | 0.455 |
| SCIDOCS | 0.167 |
| FEVER | 0.860 |
| Climate FEVER | 0.312 |
| SciFact | 0.725 |
| Quora | 0.873 |
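As a quick arithmetic check, the reported average can be reproduced from the 13 per-dataset scores in the table. The sketch below assumes the average is an unweighted mean, which is consistent with the listed numbers.

```python
# Per-dataset NDCG@10 scores copied from the table above.
beir_scores = {
    "Trec Covid": 0.734, "NFCorpus": 0.360, "NQ": 0.582, "HotpotQA": 0.716,
    "FiQA": 0.407, "ArguAna": 0.520, "Touche": 0.389, "DBPedia": 0.455,
    "SCIDOCS": 0.167, "FEVER": 0.860, "Climate FEVER": 0.312,
    "SciFact": 0.725, "Quora": 0.873,
}

# Unweighted mean across datasets, rounded to the table's three decimals.
average = round(sum(beir_scores.values()) / len(beir_scores), 3)
```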
The model has 133M parameters and was trained on a diverse set of datasets: MS MARCO, eli5_question_answer, squad_pairs, WikiAnswers, yahoo_answers_title_question, gooaq_pairs, stackexchange_duplicate_questions_body_body, wikihow, S2ORC_title_abstract, stackexchange_duplicate_questions_title-body_title-body, yahoo_answers_question_answer, searchQA_top5_snippets, stackexchange_duplicate_questions_title_title, yahoo_answers_title_answer, fever, fiqa, hotpotqa, nfcorpus, and scifact. The released model is intended for production use and was trained on additional data beyond what was used in the zero-shot experiment reported in the paper.
best for
- Inference-free document retrieval with high search relevance
- Efficient real-time search with latency only 1.1x BM25
- Zero-shot retrieval on BEIR benchmark datasets
FAQ
It is best for inference-free learned sparse retrieval: documents are encoded offline, and queries need only a tokenizer and a token-weight (IDF) lookup, giving high search relevance at low latency.
It has 133M parameters, achieves an average NDCG@10 of 0.546 on BEIR, and has client-side latency only 1.1x that of BM25.
It is licensed under Apache 2.0.
Input is text (query or document). Output is a 30522-dimensional sparse vector where non-zero dimensions represent token IDs and their importance weights; similarity is computed via inner product.
Use the gigarouter OpenAI-compatible endpoint with your API key, sending the text input and receiving the sparse embedding in the response.
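A request to an OpenAI-compatible embeddings endpoint might be assembled as in the sketch below. Everything specific here is an assumption: the base URL is a placeholder, the `/embeddings` path and the `model`/`input` body follow the general OpenAI embeddings convention rather than any documented gigarouter behavior, and the shape of the sparse-vector response is not specified in this page, so the sketch stops before sending.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, NOT a real gigarouter URL
API_KEY = "YOUR_API_KEY"                 # placeholder credential
MODEL_ID = "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte"


def build_embedding_request(text):
    """Assemble (but do not send) a POST request in the OpenAI embeddings style."""
    payload = {"model": MODEL_ID, "input": text}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request("OpenSearch supports neural sparse search.")
# Sending is intentionally left out; urllib.request.urlopen(request) would
# perform the call once a real base URL and API key are filled in.
```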
Neural Sparse Encoding Doc V3 GTE is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.