INF-Retriever V1

infly/inf-retriever-v1

published Dec 2024 · updated Feb 2026

INF-Retriever V1 is a dense retrieval model optimized for Chinese and English, built on gte-Qwen2-7B-instruct. As of January 23, 2025, it ranked No.1 on the AIR-Bench leaderboard.

est. price

~$0.008

/ 1M tokens · estimated, set at launch
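
As a rough illustration of what the estimated rate means for a batch job, the sketch below multiplies a token count by the price quoted above. The corpus size is a made-up example, and the rate itself is only an estimate that may change at launch.

```python
# Estimated rate quoted on this page: about $0.008 per 1M tokens (not final).
PRICE_PER_MILLION_TOKENS_USD = 0.008


def estimate_cost_usd(total_tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Return the estimated cost in USD for embedding `total_tokens` tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus: 100,000 documents averaging 500 tokens each.
corpus_tokens = 100_000 * 500
print(f"{estimate_cost_usd(corpus_tokens):.2f} USD")  # prints "0.40 USD"
```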

API providers

downloads / mo

359

license

apache-2.0

specs

Task	Embeddings (Dense Retrieval)
Architecture	Transformer (Qwen2-based)
Parameters	7B
Embedding Dimension	3584
Max Input Tokens	32768

about this model

INF-Retriever-v1 is an LLM-based dense retrieval model developed by INF TECH. It is built on gte-Qwen2-7B-instruct and fine-tuned for retrieval tasks, particularly on Chinese and English data. As of January 23, 2025, it ranks No.1 on both versions (24.04 and 24.05) of the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench), which covers multiple domains and languages.
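
For readers new to dense retrieval, the following is a minimal sketch of the ranking step: queries and documents are mapped to vectors, and documents are ordered by similarity to the query. It assumes cosine similarity as the scoring function (a common choice, not confirmed by this page) and uses toy 3-dimensional vectors in place of real 3584-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for model embeddings (illustrative values only).
query = [1.0, 0.0, 1.0]
docs = [
    [0.9, 0.1, 1.1],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # partially aligned
]
ranking = rank_documents(query, docs)
print([index for index, _ in ranking])  # prints [0, 2, 1]
```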

Key Strengths

Fine-tuned specifically for retrieval on Chinese and English data.
Highest average score on the bilingual AIR-Bench 24.04, ahead of GTE-Qwen2-7B-instruct, BGE-Multilingual-Gemma2, BGE-M3, and E5-mistral-7b-instruct. Individual datasets vary; for example, BGE-Multilingual-Gemma2 scores higher on msmarco_en.
Despite being fine-tuned exclusively on English and Chinese, the model also achieves strong results on the multilingual AIR-Bench 24.05 (13 languages), securing the No.1 position overall.

Benchmark Performance

The following table shows results on AIR-Bench 24.04 (bilingual, English and Chinese). The Average column is the mean over the 13 domain-language datasets listed:

Model	Average⬆️	wiki_en	wiki_zh	web_en	web_zh	healthcare_en	healthcare_zh	law_en	arxiv_en	news_en	news_zh	finance_en	finance_zh	msmarco_en
INF-Retriever-v1	52.56	65.25	68.44	52.13	56.6	56.96	42.03	34.51	50.62	53.32	50.02	58.34	35.42	59.64
GTE-Qwen2-7B-instruct	48.38	63.46	66.44	51.2	51.98	54.2	38.82	22.31	40.27	54.07	43.03	58.2	26.63	58.39
BGE-Multilingual-Gemma2	46.83	63.71	67.3	50.38	53.24	47.24	42.13	22.58	23.28	50.91	44.02	49.3	31.6	63.14
BGE-M3	46.65	60.49	62.36	47.35	50.38	49.1	42.38	26.68	40.76	48.04	40.75	51.52	32.18	54.4
E5-mistral-7b-instruct	45.26	61.67	55.97	44.41	45.96	56.32	35.79	19.32	44.78	48.18	35.99	54.79	26.11	59.03
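
The Average column can be checked against the per-dataset scores. The sketch below copies the rows from the table and recomputes each mean over the 13 dataset columns; the 0.01 tolerance is an assumption to allow for rounding in the published figures.

```python
# Per-dataset scores copied from the table, in column order
# (wiki_en ... msmarco_en), with the reported Average kept separately.
scores = {
    "INF-Retriever-v1": [65.25, 68.44, 52.13, 56.6, 56.96, 42.03, 34.51,
                         50.62, 53.32, 50.02, 58.34, 35.42, 59.64],
    "GTE-Qwen2-7B-instruct": [63.46, 66.44, 51.2, 51.98, 54.2, 38.82, 22.31,
                              40.27, 54.07, 43.03, 58.2, 26.63, 58.39],
    "BGE-Multilingual-Gemma2": [63.71, 67.3, 50.38, 53.24, 47.24, 42.13, 22.58,
                                23.28, 50.91, 44.02, 49.3, 31.6, 63.14],
    "BGE-M3": [60.49, 62.36, 47.35, 50.38, 49.1, 42.38, 26.68,
               40.76, 48.04, 40.75, 51.52, 32.18, 54.4],
    "E5-mistral-7b-instruct": [61.67, 55.97, 44.41, 45.96, 56.32, 35.79, 19.32,
                               44.78, 48.18, 35.99, 54.79, 26.11, 59.03],
}
reported_average = {
    "INF-Retriever-v1": 52.56,
    "GTE-Qwen2-7B-instruct": 48.38,
    "BGE-Multilingual-Gemma2": 46.83,
    "BGE-M3": 46.65,
    "E5-mistral-7b-instruct": 45.26,
}

recomputed = {name: sum(vals) / len(vals) for name, vals in scores.items()}

for name, mean in recomputed.items():
    matches = abs(mean - reported_average[name]) < 0.01
    print(f"{name}: recomputed {mean:.2f}, reported {reported_average[name]}, "
          f"match={matches}")
```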

On AIR-Bench 24.05 (multilingual, 13 languages), INF-Retriever-v1 achieves an average score of 54.47, the highest on that leaderboard. This suggests that its retrieval quality transfers to languages it was not fine-tuned on.

best for

· Chinese-English cross-lingual document search
· Enterprise knowledge base retrieval
· Question-answering over heterogeneous data sources

FAQ

What is the embedding dimension of INF-Retriever V1?

The embedding dimension is 3584.
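
The dimension matters mainly for index size. As a back-of-the-envelope sketch, assuming vectors are stored uncompressed as 32-bit floats (an assumption; quantized or half-precision storage would be smaller), the raw footprint can be estimated like this:

```python
EMBEDDING_DIM = 3584  # from the specs on this page


def index_size_bytes(num_vectors: int,
                     dim: int = EMBEDDING_DIM,
                     bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` embeddings, ignoring index overhead."""
    return num_vectors * dim * bytes_per_value


per_vector = index_size_bytes(1)
print(per_vector)  # prints 14336 (bytes per float32 vector)

# Hypothetical corpus of one million documents, one vector each.
one_million_gib = index_size_bytes(1_000_000) / 1024 ** 3
print(f"{one_million_gib:.1f} GiB")
```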

What is the maximum input token length?

The model supports up to 32768 tokens per input.

How do I use INF-Retriever V1 via the gigarouter API?

Once the model is live, send requests to the OpenAI-compatible embeddings endpoint with your API key, setting the model name to "infly/inf-retriever-v1" and passing the input text.
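
A minimal sketch of such a request using only the Python standard library is shown below. The base URL is a hypothetical placeholder, not a real endpoint from this page, and the request and response shapes assume the service follows the standard OpenAI embeddings format (a `model` and `input` field in, a `data` list of `embedding` entries out).

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the provider's actual base URL.
BASE_URL = "https://api.example.com/v1"
MODEL_NAME = "infly/inf-retriever-v1"


def build_embedding_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = {"model": MODEL_NAME, "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding_response(body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, api_key, base_url=BASE_URL):
    """Send the request and return one vector per input text."""
    request = build_embedding_request(texts, api_key, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embedding_response(json.load(response))
```

Calling `embed(["some text"], api_key="...")` would perform the network call; the builder and parser are kept separate so they can be checked without a live endpoint.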

Does INF-Retriever V1 require instruction prefixes for queries?

Yes, for best results, prepend each query with "Instruct: [task description]\nQuery: [query]" as shown in the model card.
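
The prefix format above can be applied with a small helper. This is a sketch only: the function name and the example task description are illustrative choices, not values prescribed by this page.

```python
def format_query(task_description: str, query: str) -> str:
    """Wrap a raw query in the 'Instruct: ...\\nQuery: ...' template."""
    return f"Instruct: {task_description}\nQuery: {query}"


# Example task description (hypothetical wording; adapt it to your use case).
task = "Given a web search query, retrieve relevant passages that answer the query"
prompt = format_query(task, "how many tokens can the model accept?")
print(prompt)
```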

not yet live

We're benchmarking and onboarding INF-Retriever V1 as a hosted, OpenAI-compatible API.
