Neural Sparse Encoding Doc V3 GTE

opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte

published Jun 2025 · updated Jul 2025

Neural Sparse Encoding Doc V3 GTE is a learned sparse retrieval model that encodes documents into 30522-dimensional sparse vectors for inference-free, efficient search.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)
API providers: 0
downloads / mo: 2.1K
license: apache-2.0

specs

Task: Learned Sparse Retrieval
Architecture: Transformer-based SparseEncoder (30522-dim sparse vectors)
Parameters: 133M
License: Apache 2.0

about this model

opensearch-neural-sparse-encoding-doc-v3-gte is a learned sparse retrieval model that encodes documents into 30,522-dimensional sparse vectors, where each non-zero dimension corresponds to a token in the vocabulary and its weight indicates token importance. For queries, the model uses a tokenizer and a weight look-up table to generate sparse vectors without neural network inference, enabling retrieval via inner product similarity between query and document sparse vectors.
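As a rough sketch of the query-side look-up and inner-product scoring described above, the example below uses a made-up four-token weight table and a whitespace tokenizer. These are hypothetical stand-ins: the real model ships its own tokenizer and weight table, and none of the numbers here come from it.

```python
# Toy illustration of inference-free sparse retrieval scoring.
# The weights and the whitespace tokenizer are hypothetical stand-ins,
# not the model's real tokenizer or weight look-up table.

# Hypothetical query-side weight table: token -> weight.
QUERY_TOKEN_WEIGHTS = {"what": 0.25, "is": 0.125, "sparse": 2.5, "retrieval": 2.0}


def encode_query(text: str) -> dict[str, float]:
    """Build a sparse query vector with a tokenizer and a table look-up only."""
    tokens = text.lower().split()  # stand-in for the real subword tokenizer
    return {t: QUERY_TOKEN_WEIGHTS[t] for t in tokens if t in QUERY_TOKEN_WEIGHTS}


def inner_product(query_vec: dict[str, float], doc_vec: dict[str, float]) -> float:
    """Sum weight products over the tokens both sparse vectors share."""
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)


# Hypothetical document vector, as if produced by the encoder at indexing time.
doc_vec = {"sparse": 1.0, "retrieval": 0.5, "index": 0.25}

query_vec = encode_query("What is sparse retrieval")
score = inner_product(query_vec, doc_vec)
print(score)  # 3.5  (2.5 * 1.0 + 2.0 * 0.5)
```

Only tokens present in both vectors contribute to the score, which is why the query side can skip neural inference: all the learned expansion lives in the document vectors built at indexing time.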

This model is part of the v3 series, which delivers better search relevance, efficiency, and inference speed than the earlier v1 and v2 series. It is an inference-free sparse retriever: documents are encoded once at indexing time, and queries need only a tokenizer and a weight look-up, so no neural network runs online during retrieval. This reduces computational cost while keeping throughput and latency reasonable; client-side latency is only 1.1x that of BM25.

The model achieves state-of-the-art performance among inference-free sparse retrieval models on the BEIR benchmark, with an average NDCG@10 of 0.546 and average FLOPS of 1.7. It outperforms the previous state-of-the-art inference-free sparse model by 3.3 points of NDCG@10 (on a 0 to 100 scale). The associated paper has been accepted at SIGIR 2025.
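For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes linear gain (the relevance label is used directly); some tools use an exponential gain instead, and this is not the benchmark's own evaluation code. The function names and the example labels are hypothetical.

```python
import math


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: list[float], judged: list[float], k: int = 10) -> float:
    """DCG of the returned ranking divided by the DCG of an ideal ranking."""
    ideal = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: two relevant documents exist; the system returns one of
# them at rank 2 and misses the other.
print(round(ndcg_at_k(ranked=[0, 1, 0], judged=[1, 1]), 3))  # 0.387
```

A benchmark-level figure such as the 0.546 above is the mean of per-query scores within each dataset, then averaged across datasets.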

Search Relevance on BEIR (NDCG@10)

Dataset          Score
Average          0.546
Trec Covid       0.734
NFCorpus         0.360
NQ               0.582
HotpotQA         0.716
FiQA             0.407
ArguAna          0.520
Touche           0.389
DBPedia          0.455
SCIDOCS          0.167
FEVER            0.860
Climate FEVER    0.312
SciFact          0.725
Quora            0.873
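
As a quick arithmetic check, the reported average can be reproduced as the unweighted mean of the 13 per-dataset scores listed above. The variable names are illustrative only.

```python
# Per-dataset NDCG@10 scores copied from the table above.
BEIR_NDCG_AT_10 = {
    "Trec Covid": 0.734,
    "NFCorpus": 0.360,
    "NQ": 0.582,
    "HotpotQA": 0.716,
    "FiQA": 0.407,
    "ArguAna": 0.520,
    "Touche": 0.389,
    "DBPedia": 0.455,
    "SCIDOCS": 0.167,
    "FEVER": 0.860,
    "Climate FEVER": 0.312,
    "SciFact": 0.725,
    "Quora": 0.873,
}

# Unweighted mean across datasets.
average = sum(BEIR_NDCG_AT_10.values()) / len(BEIR_NDCG_AT_10)
print(f"{average:.3f}")  # 0.546
```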

The model has 133M parameters and was trained on a diverse set of datasets including MS MARCO, eli5_question_answer, squad_pairs, WikiAnswers, yahoo_answers_title_question, gooaq_pairs, stackexchange_duplicate_questions_body_body, wikihow, S2ORC_title_abstract, stackexchange_duplicate_questions_title-body_title-body, yahoo_answers_question_answer, searchQA_top5_snippets, stackexchange_duplicate_questions_title_title, yahoo_answers_title_answer, fever, fiqa, hotpotqa, nfcorpus, and scifact. To make it suitable for production use, the released model was trained on additional data beyond the zero-shot experiment reported in the paper.

FAQ

What is this model best used for?

It is best suited to inference-free learned sparse retrieval: documents are encoded offline, and queries need only a tokenizer and a weight look-up table, which gives high search relevance at low latency.

How does this model compare in size and speed to other sparse retrievers?

It has 133M parameters, achieves an average NDCG@10 of 0.546 on BEIR, and has client-side latency only 1.1x that of BM25.

What is the license for this model?

It is licensed under Apache 2.0.

What is the input and output format for this model?

Input is text (query or document). Output is a 30522-dimensional sparse vector in which each non-zero dimension corresponds to a vocabulary token and its value is that token's importance weight; similarity is computed via inner product.
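
One way to picture this output format is as a mapping from token ID to weight that stores only the non-zero entries. The sketch below converts between that mapping and a full-length list; the token IDs, weights, and helper names are hypothetical and do not reflect any particular API response shape.

```python
VOCAB_SIZE = 30522  # output dimensionality stated above


def to_sparse(dense: list[float]) -> dict[int, float]:
    """Keep only non-zero entries as {token_id: weight}."""
    return {i: w for i, w in enumerate(dense) if w != 0.0}


def to_dense(sparse: dict[int, float], size: int = VOCAB_SIZE) -> list[float]:
    """Expand {token_id: weight} into a full-length list padded with zeros."""
    dense = [0.0] * size
    for token_id, weight in sparse.items():
        dense[token_id] = weight
    return dense


# Hypothetical document embedding with three active tokens.
sparse_doc = {2054: 0.5, 7592: 1.75, 30521: 0.25}
dense_doc = to_dense(sparse_doc)

print(len(dense_doc))  # 30522
print(to_sparse(dense_doc) == sparse_doc)  # True
```

Storing only the non-zero entries is what keeps indexing and scoring cheap, since most of the 30522 dimensions are zero for any given text.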

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it launches, you will call it through the gigarouter OpenAI-compatible endpoint with your API key, sending the text input and receiving the sparse embedding in the response.

not yet live

We're benchmarking and onboarding Neural Sparse Encoding Doc V3 GTE as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
