INF-Retriever V1
infly/inf-retriever-v1
published Dec 2024 · updated Feb 2026
INF-Retriever V1 is a dense retrieval model optimized for Chinese and English retrieval, built upon GTE-Qwen2-7B-Instruct and achieving top performance on the AIR-Bench leaderboard.
specs
| Spec | Value |
|---|---|
| Task | Embeddings (Dense Retrieval) |
| Architecture | Transformer (Qwen2-based) |
| Parameters | 7B |
| Embedding Dimension | 3584 |
| Max Input Tokens | 32768 |
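The embedding dimension drives the storage cost of a vector index. The sketch below estimates raw vector storage, assuming each value is stored as a 4-byte float32 (an assumption; the page does not state the output precision) and ignoring any index overhead. The function name `index_size_bytes` is illustrative only.

```python
EMBEDDING_DIM = 3584     # from the specs table
BYTES_PER_FLOAT32 = 4    # assumption: vectors are stored as float32


def index_size_bytes(num_vectors: int,
                     dim: int = EMBEDDING_DIM,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for num_vectors embeddings, excluding index overhead."""
    return num_vectors * dim * bytes_per_value


per_vector = index_size_bytes(1)            # 14336 bytes, about 14 KiB
one_million = index_size_bytes(1_000_000)   # 14_336_000_000 bytes, about 14.3 GB
print(per_vector, one_million / 1e9)
```

Under these assumptions, one million documents need roughly 14.3 GB for the vectors alone, which is worth weighing against smaller embedding models when sizing an index.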
about this model
INF-Retriever-v1 is an LLM-based dense retrieval model developed by INF TECH, built upon the gte-Qwen2-7B-instruct architecture and fine-tuned to excel in retrieval tasks, particularly for Chinese and English data. As of January 23, 2025, it ranks No.1 on both versions (24.04 and 24.05) of the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench), demonstrating cutting-edge performance in heterogeneous information retrieval across multiple domains and languages.
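In dense retrieval, the query and each document are embedded as vectors and documents are ranked by vector similarity. The sketch below illustrates the ranking step with cosine similarity, using toy 3-dimensional vectors in place of real 3584-dimensional embeddings. The vectors and the helper names `cosine_similarity` and `rank_documents` are illustrative assumptions, not output from the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])  # document indices, best match first
```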
Key Strengths
- Optimized for Chinese and English retrieval with retrieval-focused fine-tuning, achieving high accuracy and efficiency in a variety of retrieval scenarios.
- Top-tier performance on AIR-Bench, outperforming models such as GTE-Qwen2-7B-instruct, BGE-Multilingual-Gemma2, BGE-M3, and E5-mistral-7b-instruct on the bilingual benchmark.
- Despite being fine-tuned exclusively on English and Chinese, the model also achieves strong results on the multilingual AIR-Bench 24.05 (13 languages), securing the No.1 position overall.
Benchmark Performance
The following table shows results on AIR-Bench 24.04 (bilingual, English and Chinese). The Average column is the mean over the 13 domain-language datasets listed:
| Model | Average⬆️ | wiki_en | wiki_zh | web_en | web_zh | healthcare_en | healthcare_zh | law_en | arxiv_en | news_en | news_zh | finance_en | finance_zh | msmarco_en |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INF-Retriever-v1 | 52.56 | 65.25 | 68.44 | 52.13 | 56.6 | 56.96 | 42.03 | 34.51 | 50.62 | 53.32 | 50.02 | 58.34 | 35.42 | 59.64 |
| GTE-Qwen2-7B-instruct | 48.38 | 63.46 | 66.44 | 51.2 | 51.98 | 54.2 | 38.82 | 22.31 | 40.27 | 54.07 | 43.03 | 58.2 | 26.63 | 58.39 |
| BGE-Multilingual-Gemma2 | 46.83 | 63.71 | 67.3 | 50.38 | 53.24 | 47.24 | 42.13 | 22.58 | 23.28 | 50.91 | 44.02 | 49.3 | 31.6 | 63.14 |
| BGE-M3 | 46.65 | 60.49 | 62.36 | 47.35 | 50.38 | 49.1 | 42.38 | 26.68 | 40.76 | 48.04 | 40.75 | 51.52 | 32.18 | 54.4 |
| E5-mistral-7b-instruct | 45.26 | 61.67 | 55.97 | 44.41 | 45.96 | 56.32 | 35.79 | 19.32 | 44.78 | 48.18 | 35.99 | 54.79 | 26.11 | 59.03 |
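The Average column can be reproduced as the unweighted mean of the 13 per-dataset scores in each row. A quick check for the top two rows, with scores copied from the table above:

```python
from statistics import mean

# Per-dataset scores in table column order (wiki_en ... msmarco_en).
scores = {
    "INF-Retriever-v1": [65.25, 68.44, 52.13, 56.6, 56.96, 42.03, 34.51,
                         50.62, 53.32, 50.02, 58.34, 35.42, 59.64],
    "GTE-Qwen2-7B-instruct": [63.46, 66.44, 51.2, 51.98, 54.2, 38.82, 22.31,
                              40.27, 54.07, 43.03, 58.2, 26.63, 58.39],
}

# Unweighted mean per model, rounded to two decimals like the table.
averages = {model: round(mean(values), 2) for model, values in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Both values match the reported averages (52.56 and 48.38), a gap of about 4.2 points over the base model.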
On AIR-Bench 24.05 (multilingual, 13 languages), INF-Retriever-v1 achieves an average score of 54.47, placing it first on that leaderboard as of January 23, 2025 and demonstrating robust zero-shot transfer beyond its training languages.
best for
- Chinese-English cross-lingual document search
- Enterprise knowledge base retrieval
- Question-answering over heterogeneous data sources
FAQ
The embedding dimension is 3584.
The model supports up to 32768 tokens per input.
Use the OpenAI-compatible endpoint with your API key; send requests with the model name "infly/inf-retriever-v1" and input text.
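A minimal sketch of such a request, built with the Python standard library and following the OpenAI embeddings request shape (a POST to `/embeddings` with `model` and `input` fields and a bearer token). The base URL is a hypothetical placeholder, since this page does not give one, and the helper name `build_embedding_request` is illustrative.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical placeholder, not a real endpoint
MODEL = "infly/inf-retriever-v1"


def build_embedding_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embedding_request(["How do dense retrievers work?"], api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
# Sending is left out here; urllib.request.urlopen(req) would perform the call
# against a real base URL.
```

In the OpenAI response format, each embedding is returned under `data[i].embedding`; whether this endpoint mirrors that exactly should be confirmed against the provider's API reference.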
Queries should carry an instruction prefix: for best results, prepend each query with "Instruct: [task description]\nQuery: [query]", as shown in the model card.
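The prefix format can be applied with a small helper. This is a sketch: the function name `format_query` and the example task description are illustrative, so substitute a task description that fits your use case.

```python
def format_query(task_description: str, query: str) -> str:
    """Wrap a raw query in the instruction template quoted above."""
    return f"Instruct: {task_description}\nQuery: {query}"


# Example task description (illustrative; adapt it to your retrieval task).
task = "Given a web search query, retrieve relevant passages that answer the query"
formatted = format_query(task, "how much protein should a female eat")
print(formatted)
```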
INF-Retriever V1 is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.