Neural Sparse Encoding Doc V3 GTE

opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte

published Jun 2025 · updated Jul 2025

Neural Sparse Encoding Doc V3 GTE is a learned sparse retrieval model that encodes documents into 30,522-dimensional sparse vectors, enabling efficient search with inference-free query encoding.

est. price

~$0.008

/ 1M tokens · estimated, set at launch

downloads / mo

2.1K

license

apache-2.0

specs

Task	Learned Sparse Retrieval
Architecture	Transformer-based SparseEncoder (30522-dim sparse vectors)
Parameters	133M
License	Apache 2.0

about this model

opensearch-neural-sparse-encoding-doc-v3-gte is a learned sparse retrieval model that encodes documents into 30,522-dimensional sparse vectors, where each non-zero dimension corresponds to a token in the vocabulary and its weight indicates that token's importance. Queries are encoded without neural network inference: a tokenizer and a weight look-up table produce the query's sparse vector. Retrieval then ranks documents by the inner product between the query and document sparse vectors.
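
The query path described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the vocabulary, token IDs, and weights are toy values, and `toy_tokenize` stands in for the model's real tokenizer. Only the overall shape (tokenize, look up one weight per token, score by inner product) follows this page.

```python
import re

# Hypothetical stand-ins: the real model uses its own tokenizer (30,522-token
# vocabulary) and a learned per-token weight table. These toy values are
# for illustration only.
TOY_VOCAB = {"sparse": 101, "retrieval": 202, "model": 303, "fast": 404}
TOY_QUERY_WEIGHTS = {101: 2.1, 202: 1.7, 303: 0.4, 404: 0.9}


def toy_tokenize(text: str) -> list[int]:
    """Lower-case, split on non-letters, and keep in-vocabulary tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return [TOY_VOCAB[w] for w in words if w in TOY_VOCAB]


def encode_query(text: str) -> dict[int, float]:
    """Inference-free query encoding: tokenize, then look up each distinct
    token's weight. No neural network runs here."""
    return {tid: TOY_QUERY_WEIGHTS[tid] for tid in set(toy_tokenize(text))}


def score(query_vec: dict[int, float], doc_vec: dict[int, float]) -> float:
    """Inner product over the token IDs shared by query and document."""
    return sum(w * doc_vec[tid] for tid, w in query_vec.items() if tid in doc_vec)


# A document vector as the model might produce it at indexing time (toy values).
doc_vec = {101: 1.3, 202: 0.8, 505: 0.6}
q = encode_query("Fast sparse retrieval")
print(score(q, doc_vec))
```

Only tokens present in both vectors contribute, so the token "fast" (404) adds nothing to this document's score.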

This model is part of the v3 series, which delivers better search relevance, efficiency, and inference speed than the prior v1 and v2 series. It is an inference-free sparse retriever: documents are encoded by the model at indexing time, while queries need only a tokenizer and a weight look-up. Removing neural network inference from the online retrieval path lowers computational cost, and client-side latency is only 1.1x that of BM25.
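
To illustrate the split between indexing time and query time, the sketch below stores document vectors (assumed to have been computed once by the model) in an inverted index and answers a query by walking only the postings of the query's tokens. The document IDs and weights are hypothetical, and this is a toy, not OpenSearch's implementation.

```python
from collections import defaultdict

# Hypothetical document vectors standing in for model output produced once,
# at indexing time: {token_id: weight}.
DOC_VECS = {
    "doc-a": {101: 1.3, 202: 0.8, 505: 0.6},
    "doc-b": {202: 1.1, 303: 0.5},
    "doc-c": {404: 0.7, 505: 1.2},
}


def build_inverted_index(doc_vecs):
    """Offline step: map each token ID to a postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in doc_vecs.items():
        for tid, w in vec.items():
            index[tid].append((doc_id, w))
    return index


def search(index, query_vec, k=2):
    """Online step: accumulate inner-product scores by visiting only the
    postings of the query's tokens; no model inference happens here."""
    scores = defaultdict(float)
    for tid, q_w in query_vec.items():
        for doc_id, d_w in index.get(tid, []):
            scores[doc_id] += q_w * d_w
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


index = build_inverted_index(DOC_VECS)
query_vec = {101: 2.1, 202: 1.7}  # e.g. from a tokenizer + weight look-up
print(search(index, query_vec))
```

In this sketch the query side is a postings-list traversal of the same kind a BM25 engine performs; only the term weights differ.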

The model achieves state-of-the-art performance among inference-free sparse retrieval models on the BEIR benchmark, with an average NDCG@10 of 0.546 and average FLOPS of 1.7. It outperforms the previous state-of-the-art inference-free sparse model by 3.3 NDCG@10. The associated paper has been accepted at SIGIR 2025.
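
This page does not define FLOPS. In the learned-sparse-retrieval literature it usually denotes the expected number of multiply-adds in one query-document inner product, estimated as the sum over tokens t of p_q(t) · p_d(t), where p_q(t) and p_d(t) are the fractions of queries and documents with a non-zero weight on t. Assuming that definition (the page does not confirm how the 1.7 figure was computed), a sketch with toy vectors:

```python
from collections import Counter


def activation_probs(vectors):
    """Fraction of vectors in which each token ID has a non-zero weight."""
    counts = Counter(tid for vec in vectors for tid, w in vec.items() if w != 0)
    n = len(vectors)
    return {tid: c / n for tid, c in counts.items()}


def flops(query_vecs, doc_vecs):
    """Estimated multiply-adds per query-document inner product:
    sum over token IDs of p_q(token) * p_d(token)."""
    p_q = activation_probs(query_vecs)
    p_d = activation_probs(doc_vecs)
    return sum(p * p_d[tid] for tid, p in p_q.items() if tid in p_d)


# Toy sparse vectors, {token_id: weight}; not real model output.
queries = [{101: 2.1, 202: 1.7}, {202: 0.9}]
docs = [{101: 1.3, 202: 0.8}, {202: 1.1, 303: 0.5}, {404: 0.7}]
print(flops(queries, docs))
```

For these toy vectors the estimate is 5/6, which equals the average number of shared non-zero token IDs over the six query-document pairs: (2 + 1 + 0 + 1 + 1 + 0) / 6.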

Search Relevance on BEIR (NDCG@10)

Dataset	NDCG@10
Average	0.546
TREC-COVID	0.734
NFCorpus	0.360
NQ	0.582
HotpotQA	0.716
FiQA	0.407
ArguAna	0.520
Touché	0.389
DBPedia	0.455
SCIDOCS	0.167
FEVER	0.860
Climate-FEVER	0.312
SciFact	0.725
Quora	0.873
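
As a quick arithmetic check, the reported average matches an unweighted mean of the 13 per-dataset scores in the table:

```python
# Per-dataset NDCG@10 scores copied from the BEIR table on this page.
BEIR_NDCG10 = {
    "TREC-COVID": 0.734, "NFCorpus": 0.360, "NQ": 0.582, "HotpotQA": 0.716,
    "FiQA": 0.407, "ArguAna": 0.520, "Touché": 0.389, "DBPedia": 0.455,
    "SCIDOCS": 0.167, "FEVER": 0.860, "Climate-FEVER": 0.312,
    "SciFact": 0.725, "Quora": 0.873,
}

# Unweighted mean: every dataset counts equally, regardless of corpus size.
average = sum(BEIR_NDCG10.values()) / len(BEIR_NDCG10)
print(f"{len(BEIR_NDCG10)} datasets, mean NDCG@10 = {average:.3f}")
# prints "13 datasets, mean NDCG@10 = 0.546"
```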

The model has 133M parameters and was trained on a diverse set of datasets including MS MARCO, eli5_question_answer, squad_pairs, WikiAnswers, yahoo_answers_title_question, gooaq_pairs, stackexchange_duplicate_questions_body_body, wikihow, S2ORC_title_abstract, stackexchange_duplicate_questions_title-body_title-body, yahoo_answers_question_answer, searchQA_top5_snippets, stackexchange_duplicate_questions_title_title, yahoo_answers_title_answer, fever, fiqa, hotpotqa, nfcorpus, and scifact. For production use, the released model was trained on additional data beyond what was used in the zero-shot experiments reported in the paper.

best for

·Inference-free document retrieval with high search relevance
·Efficient real-time search with client-side latency only 1.1x that of BM25
·Retrieval across diverse domains, with strong results on the BEIR benchmark

FAQ

What is this model best used for?

It is best for inference-free learned sparse retrieval, where documents are encoded offline and queries use a tokenizer and IDF lookup, achieving high search relevance with low latency.

How does this model compare in size and speed to other sparse retrievers?

It has 133M parameters. On BEIR it reaches an average NDCG@10 of 0.546, outperforming the previous state-of-the-art inference-free sparse model by 3.3 NDCG@10, and its client-side latency is only 1.1x that of BM25.

What is the license for this model?

It is licensed under Apache 2.0.

What is the input and output format for this model?

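Input is text (a query or a document). Documents are encoded by the model, while queries are encoded with only the tokenizer and weight look-up. Output is a 30,522-dimensional sparse vector: each non-zero dimension's index is a vocabulary token ID, and its value is that token's importance weight. Similarity is computed as the inner product of the query and document vectors.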
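
The compact {token_id: weight} form and the full 30,522-dimensional vector are interchangeable and give the same inner product. In this minimal sketch the token IDs and weights are arbitrary illustrative values, not real model output.

```python
VOCAB_SIZE = 30522  # dimensionality stated on this page


def to_dense(sparse: dict[int, float]) -> list[float]:
    """Expand a {token_id: weight} sparse vector into a full 30,522-dim list."""
    dense = [0.0] * VOCAB_SIZE
    for tid, w in sparse.items():
        if not 0 <= tid < VOCAB_SIZE:
            raise ValueError(f"token id {tid} outside vocabulary")
        dense[tid] = w
    return dense


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Inner product on the sparse form: iterate over the smaller vector
    and probe the larger one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[tid] for tid, w in a.items() if tid in b)


query = {2054: 1.2, 7801: 0.5}
doc = {2054: 0.9, 3000: 0.4}
dense_dot = sum(x * y for x, y in zip(to_dense(query), to_dense(doc)))
print(sparse_dot(query, doc), dense_dot)
```

The sparse form touches only a handful of entries, while the dense form multiplies all 30,522 positions to reach the same result.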

How can I call this model via the gigarouter API?

Once the hosted API is live, use the gigarouter OpenAI-compatible endpoint with your API key: send the text input and receive the sparse embedding in the response.

not yet live

We're benchmarking and onboarding Neural Sparse Encoding Doc V3 GTE as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embedding models


nomic-embed-text-v1.5