gigarouter
models / embeddings · coming soon

INF-Retriever V1

infly/inf-retriever-v1

published Dec 2024 · updated Feb 2026

INF-Retriever V1 is a dense retrieval model optimized for Chinese and English retrieval, built upon GTE-Qwen2-7B-Instruct and achieving top performance on the AIR-Bench leaderboard.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)
API providers: 0
downloads / mo: 359
license: apache-2.0
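For rough budgeting, the estimated rate scales linearly with token count. The sketch below assumes the listed ~$0.008 per 1M tokens estimate holds at launch; `estimate_cost_usd` is a hypothetical helper, not part of any gigarouter SDK.

```python
# Assumption: the ~$0.008 per 1M tokens rate is this page's pre-launch estimate
# and may differ from the final price.
EST_PRICE_PER_MILLION_TOKENS_USD = 0.008


def estimate_cost_usd(total_tokens: int,
                      price_per_million: float = EST_PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Hypothetical helper: estimated embedding cost in USD for a token count."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * price_per_million


# Example: a corpus of 250 million tokens at the estimated rate.
print(f"${estimate_cost_usd(250_000_000):.2f}")  # prints $2.00
```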

specs

Task: Embeddings (Dense Retrieval)
Architecture: Transformer (Qwen2-based)
Parameters: 7B
Embedding Dimension: 3584
Max Input Tokens: 32768
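The 3584-dimensional output largely determines storage cost in a vector index. A back-of-the-envelope sketch, assuming vectors are stored uncompressed as float32 or float16 (a storage choice on your side, not something this page specifies); `raw_vector_bytes` is a hypothetical helper.

```python
EMBEDDING_DIM = 3584  # embedding dimension from the specs list
BYTES_PER_VALUE = {"float32": 4, "float16": 2}


def raw_vector_bytes(num_vectors: int, dtype: str = "float32",
                     dim: int = EMBEDDING_DIM) -> int:
    """Bytes for the raw vectors only; excludes any index or metadata overhead."""
    return num_vectors * dim * BYTES_PER_VALUE[dtype]


for dtype in ("float32", "float16"):
    gib = raw_vector_bytes(1_000_000, dtype) / 2**30
    print(f"1M vectors as {dtype}: {gib:.2f} GiB")
```

Each float32 vector takes 14,336 bytes, so one million vectors need roughly 13.35 GiB before any index overhead, or about 6.68 GiB at float16.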

about this model

INF-Retriever-v1 is an LLM-based dense retrieval model developed by INF TECH. It is built on the gte-Qwen2-7B-instruct architecture and fine-tuned for retrieval tasks, particularly on Chinese and English data. As of January 23, 2025, it ranked No. 1 on both versions (24.04 and 24.05) of the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench), which evaluates retrieval across multiple domains and languages.
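In general, a dense retriever embeds queries and documents into the same vector space and ranks documents by vector similarity. The minimal sketch below assumes cosine similarity as the scoring function and uses toy 3-dimensional vectors in place of real 3584-dimensional embeddings; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real query/document embeddings.
query = [0.9, 0.1, 0.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.1], [0.5, 0.5, 0.0]]
print(rank_documents(query, docs))  # [1, 2, 0]
```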

Key Strengths

  • Optimized for Chinese and English retrieval with retrieval-focused fine-tuning, achieving high accuracy and efficiency in a variety of retrieval scenarios.
  • Top-tier performance on AIR-Bench, outperforming models such as GTE-Qwen2-7B-instruct, BGE-Multilingual-Gemma2, BGE-M3, and E5-mistral-7b-instruct on the bilingual benchmark.
  • Despite being fine-tuned exclusively on English and Chinese, the model also achieves strong results on the multilingual AIR-Bench 24.05 (13 languages), securing the No. 1 position overall.

Benchmark Performance

The following table shows results on AIR-Bench 24.04 (bilingual, English and Chinese) across 13 domain and language splits; the Average column is the unweighted mean of those 13 scores:

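| Model | Average | wiki_en | wiki_zh | web_en | web_zh | healthcare_en | healthcare_zh | law_en | arxiv_en | news_en | news_zh | finance_en | finance_zh | msmarco_en |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INF-Retriever-v1 | 52.56 | 65.25 | 68.44 | 52.13 | 56.6 | 56.96 | 42.03 | 34.51 | 50.62 | 53.32 | 50.02 | 58.34 | 35.42 | 59.64 |
| GTE-Qwen2-7B-instruct | 48.38 | 63.46 | 66.44 | 51.2 | 51.98 | 54.2 | 38.82 | 22.31 | 40.27 | 54.07 | 43.03 | 58.2 | 26.63 | 58.39 |
| BGE-Multilingual-Gemma2 | 46.83 | 63.71 | 67.3 | 50.38 | 53.24 | 47.24 | 42.13 | 22.58 | 23.28 | 50.91 | 44.02 | 49.3 | 31.6 | 63.14 |
| BGE-M3 | 46.65 | 60.49 | 62.36 | 47.35 | 50.38 | 49.1 | 42.38 | 26.68 | 40.76 | 48.04 | 40.75 | 51.52 | 32.18 | 54.4 |
| E5-mistral-7b-instruct | 45.26 | 61.67 | 55.97 | 44.41 | 45.96 | 56.32 | 35.79 | 19.32 | 44.78 | 48.18 | 35.99 | 54.79 | 26.11 | 59.03 |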
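The Average column can be reproduced as the plain mean of the 13 per-split scores. A quick check for the INF-Retriever-v1 row, with values transcribed from the AIR-Bench 24.04 results table; the variable names are illustrative.

```python
from statistics import mean

# INF-Retriever-v1 per-split scores on AIR-Bench 24.04 (transcribed from the table).
inf_retriever_v1 = {
    "wiki_en": 65.25, "wiki_zh": 68.44, "web_en": 52.13, "web_zh": 56.6,
    "healthcare_en": 56.96, "healthcare_zh": 42.03, "law_en": 34.51,
    "arxiv_en": 50.62, "news_en": 53.32, "news_zh": 50.02,
    "finance_en": 58.34, "finance_zh": 35.42, "msmarco_en": 59.64,
}

# Unweighted mean: every split counts equally, regardless of dataset size.
average = mean(inf_retriever_v1.values())
print(len(inf_retriever_v1), round(average, 2))  # 13 52.56
```

The same calculation reproduces the reported averages for the other four rows as well.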

On AIR-Bench 24.05 (multilingual, 13 languages), INF-Retriever-v1 achieves an average score of 54.47, surpassing all competitors and demonstrating robust zero-shot transfer beyond its training languages.

FAQ

What is the embedding dimension of INF-Retriever V1?

The embedding dimension is 3584.

What is the maximum input token length?

The model supports up to 32768 tokens per input.

How do I use INF-Retriever V1 via the gigarouter API?

Once the model is live, call the OpenAI-compatible embeddings endpoint with your API key, passing the model name "infly/inf-retriever-v1" and your input text.
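Because the hosted endpoint is not live yet, the sketch below only assumes the standard OpenAI embeddings request shape: POST to `/embeddings` with a JSON body containing `model` and `input`, using bearer-token auth. The base URL is a placeholder on the reserved `.example` domain, and `build_embeddings_request` and `embed` are hypothetical helpers built on the Python standard library.

```python
import json
import urllib.request
from typing import List

# Placeholder: this page does not give the real base URL.
BASE_URL = "https://api.gigarouter.example/v1"  # hypothetical
MODEL = "infly/inf-retriever-v1"


def build_embeddings_request(api_key: str, texts: List[str],
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(api_key: str, texts: List[str], base_url: str = BASE_URL) -> List[List[float]]:
    """Send the request and return one vector per input, in input order."""
    req = build_embeddings_request(api_key, texts, base_url)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses list vectors under "data", each tagged with "index".
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Building the request separately from sending it lets you inspect or log the payload before the service is available.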

Does INF-Retriever V1 require instruction prefixes for queries?

Yes, for best results, prepend each query with "Instruct: [task description]\nQuery: [query]" as shown in the model card.
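A minimal sketch of the query formatting. The task description is an example, and `with_instruction` is a hypothetical helper. Leaving documents unprefixed is an assumption based on the convention of the gte-Qwen2 family this model builds on; this page only mentions the prefix for queries.

```python
# Example task description (an assumption; pick one that fits your task).
TASK = "Given a web search query, retrieve relevant passages that answer the query"


def with_instruction(task_description: str, query: str) -> str:
    """Wrap a query in the Instruct/Query template."""
    return f"Instruct: {task_description}\nQuery: {query}"


queries = [with_instruction(TASK, "what does photosynthesis produce")]
# Assumption: documents are embedded as plain text, without any prefix.
documents = ["Photosynthesis converts light energy into chemical energy stored in glucose."]
print(queries[0])
```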

not yet live

We're benchmarking and onboarding INF-Retriever V1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
