models / embeddings · coming soon

F2LLM V2 1.7B

codefuse-ai/F2LLM-v2-1.7B

published Mar 2026 · updated May 2026

F2LLM V2 1.7B is a multilingual embedding model that supports over 200 languages and is optimized for retrieval, semantic search, and classification.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)
API providers: 0
downloads / mo: 3.8K
license: apache-2.0
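Because pricing is per token, cost scales linearly with the number of tokens embedded. The sketch below is a rough calculator that assumes the ~$0.008 / 1M tokens estimate holds at launch and ignores any minimums or billing rules, which this page does not describe. The function name `estimate_cost_usd` is hypothetical.

```python
# Rough cost estimate at the page's *estimated* launch price.
# 0.008 USD per 1M tokens is the estimate shown above; the real price is set at launch.
EST_PRICE_PER_MILLION_TOKENS = 0.008  # USD, estimated


def estimate_cost_usd(num_tokens: int,
                      price_per_million: float = EST_PRICE_PER_MILLION_TOKENS) -> float:
    """Return the approximate USD cost of embedding `num_tokens` tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million


# Example: 10,000 documents averaging 500 tokens each = 5,000,000 tokens.
corpus_tokens = 10_000 * 500
print(f"~${estimate_cost_usd(corpus_tokens):.3f}")
```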

specs

Task: Embedding
Architecture: Qwen3
Parameters: 1.7B
License: Apache 2.0

about this model

F2LLM-v2-1.7B is a general-purpose, multilingual embedding model that converts text into dense vector representations for information retrieval, semantic search, and text classification. It is the 1.7-billion-parameter instruct variant of the F2LLM-v2 family, fine-tuned from the codefuse-ai/F2LLM-v2-1.7B-Preview base model and released under the Apache 2.0 license.

Capabilities and Training

Trained on a curated composite of 60 million high-quality samples from the codefuse-ai/F2LLM-v2 dataset, the model supports over 200 languages, with a particular focus on mid- and low-resource languages. According to the research paper (arXiv:2603.19223), the training corpus covers 282 natural languages and more than 40 programming languages. The model is built on the Qwen3 architecture and outputs 2048-dimensional embeddings.

Optimization Techniques

The model's training combines a two-stage LLM-based embedding training pipeline with Matryoshka Representation Learning (MRL), model pruning, and knowledge distillation. Together, these techniques aim to keep inference efficient while maintaining competitive retrieval performance.
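Matryoshka Representation Learning trains an embedding so that its leading components also work as a shorter embedding. A minimal client-side sketch of how that property is typically used: keep the first `dim` values and L2-renormalize so dot products remain cosine similarities. This assumes the 2048-dimensional output can be truncated; the page does not say which prefix sizes were trained or evaluated, so check the model card before relying on a specific size. `truncate_embedding` is a hypothetical helper name.

```python
import math


def truncate_embedding(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of an MRL-style embedding and
    L2-renormalize the prefix so cosine/dot-product scores stay comparable."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in 1..{len(embedding)}")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # all-zero prefix: nothing to normalize
    return [x / norm for x in prefix]


# Toy 8-dimensional vector standing in for a real 2048-dimensional embedding.
full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
print(truncate_embedding(full, 4))  # [0.5, 0.5, 0.5, 0.5]
```

Shorter vectors cut storage and search cost roughly in proportion to `dim`, at some loss in retrieval quality that depends on how the model was trained.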

Benchmark Performance

While specific MTEB scores for the 1.7B variant are not published, the family’s largest model, F2LLM-v2-14B, ranks first on 11 MTEB benchmarks. The 1.7B model inherits the training methodology and is designed to deliver strong embedding quality at a compact size suitable for resource-constrained applications.

Model Family Overview

Size | Base Model | Instruct Model
80M  | -          | F2LLM-v2-80M
160M | -          | F2LLM-v2-160M
330M | -          | F2LLM-v2-330M
0.6B | Preview    | F2LLM-v2-0.6B
1.7B | Preview    | F2LLM-v2-1.7B (this model)
4B   | Preview    | F2LLM-v2-4B
8B   | Preview    | F2LLM-v2-8B
14B  | Preview    | F2LLM-v2-14B

All models, training data, code, and intermediate checkpoints are publicly available. Intermediate checkpoints for this model are provided in the intermediate_checkpoints branch on Hugging Face.


FAQ

What is F2LLM V2 1.7B best used for?

It excels at multilingual retrieval, semantic search, text classification, and clustering, especially for mid- and low-resource languages.
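Retrieval and semantic search with an embedding model usually come down to ranking documents by similarity to the query vector. The sketch below does a brute-force cosine-similarity ranking over toy 3-dimensional vectors; it assumes you already have embeddings from the model, and the helper names are hypothetical. At scale, a vector index would replace the linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]],
                   top_k: int = 3) -> list[tuple[int, float]]:
    """Return (document index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real query/document embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
for idx, score in rank_documents(query, docs, top_k=2):
    print(idx, round(score, 3))
```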

What is the model's output embedding dimension?

The embedding dimension is 2048.

What license is this model released under?

It is released under the Apache 2.0 license.

How do I call this model via the gigarouter API?

Once the model is live, call it through the gigarouter OpenAI-compatible endpoint with your API key; see the gigarouter documentation for endpoint details.
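As a sketch of what an OpenAI-compatible call might look like, the code below assumes gigarouter follows the standard OpenAI embeddings request and response shape (`POST .../embeddings` with `model` and `input`; a `data` list whose items carry `embedding` and `index`). The base URL and model ID are placeholders, since this page gives neither, and `build_embeddings_request` / `parse_embeddings` are hypothetical helpers. It uses only the Python standard library and builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical values: the real base URL and model identifier are not on this page.
BASE_URL = "https://api.gigarouter.example/v1"  # placeholder, not a real endpoint
MODEL_ID = "codefuse-ai/F2LLM-v2-1.7B"           # assumed model name


def build_embeddings_request(texts: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /embeddings endpoint."""
    body = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_json: dict) -> list[list[float]]:
    """Extract vectors from an OpenAI-style response, ordered by each item's 'index'."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Sending requires a live endpoint and a real key:
# req = build_embeddings_request(["hello world"], api_key="YOUR_KEY")
# with urllib.request.urlopen(req) as resp:
#     vectors = parse_embeddings(json.load(resp))
```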

Does the model require a special prompt format?

Yes. For asymmetric retrieval, prefix each query with 'Instruct: your_instruction\nQuery: ' followed by the query text; documents are embedded without a prompt.
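A small helper makes the asymmetric format explicit. The template comes from the FAQ answer above; the function names and the example instruction text are hypothetical, so substitute the task instructions recommended on the model card.

```python
def format_query(query: str, instruction: str) -> str:
    """Prefix a retrieval query with the 'Instruct: ...\\nQuery: ' template."""
    return f"Instruct: {instruction}\nQuery: {query}"


def format_document(document: str) -> str:
    """Documents are embedded as-is, with no prompt."""
    return document


# Hypothetical task instruction; use the one suited to your retrieval task.
task = "Given a web search query, retrieve relevant passages that answer the query"
print(format_query("what is matryoshka representation learning", task))
```

Only the query side carries the instruction, so the same document embeddings can be reused across different query tasks.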

not yet live

We're benchmarking and onboarding F2LLM V2 1.7B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
