F2LLM-v2-4B

codefuse-ai/F2LLM-v2-4B

published Mar 2026 · updated May 2026

F2LLM-v2-4B is a multilingual embedding model that generates high-quality text embeddings for information retrieval, semantic search, and classification across 200+ languages.

est. price	~$0.008 / 1M tokens (estimated, set at launch)
downloads / mo	41.1K
license	apache-2.0

specs

Task	Embedding
Architecture	Qwen3
Parameters	4B
License	CC BY-NC-ND 4.0

about this model

F2LLM-v2-4B is a general-purpose, multilingual embedding model that converts text into dense vector representations for retrieval, semantic search, classification, and clustering tasks. It is part of the F2LLM-v2 family, which includes 8 model sizes from 80M to 14B parameters, trained on a curated composite of 60 million publicly available high-quality data samples.

The model supports more than 200 natural languages, with particular emphasis on mid- and low-resource languages, and also covers over 40 programming languages. It integrates a two-stage LLM-based embedding training pipeline with Matryoshka Representation Learning (MRL), model pruning, and knowledge distillation techniques to balance performance with computational efficiency.

Key capabilities

Produces 2560-dimensional embeddings with support for Matryoshka-style dimensionality reduction
Supports custom instruction prompts for asymmetric retrieval tasks (query vs. document encoding)
Handles symmetric tasks such as semantic textual similarity, clustering, and bitext mining with or without prompts
Based on the Qwen3 architecture and optimized for use with Sentence Transformers and Transformers libraries
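
As a rough illustration of Matryoshka-style reduction, the sketch below keeps the leading components of an embedding and rescales the result to unit length before comparing vectors with cosine similarity. The helper names and the toy 8-dimensional vectors are hypothetical stand-ins for real 2560-dimensional outputs, and the set of truncation sizes the model was actually trained for is not stated here, so treat any particular `dim` as an assumption.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding length")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for full model outputs (illustrative only).
full_a = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.01]
full_b = [0.8, 0.2, 0.4, 0.1, 0.01, 0.03, 0.02, 0.02]

# Compare at full size and at a reduced size of 4 components.
full_score = cosine_similarity(full_a, full_b)
small_score = cosine_similarity(
    truncate_and_normalize(full_a, 4),
    truncate_and_normalize(full_b, 4),
)
```

Truncated vectors need renormalizing because a prefix of a unit vector is generally shorter than unit length; without that step, dot-product scores from different truncation sizes are not comparable.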

Performance

The larger F2LLM-v2-14B model ranks first on 11 MTEB benchmarks. The 4B variant is trained with the same pipeline and data as the larger models and delivers competitive performance for its size. The family's smaller models set a new state of the art for resource-constrained applications.

Training and techniques

Two-stage LLM-based embedding training pipeline
Matryoshka Representation Learning for flexible embedding dimensionality
Model pruning and knowledge distillation for efficiency
Training data covers 282 natural languages and over 40 programming languages

License

CC BY-NC-ND 4.0.

best for

·Multilingual semantic search and retrieval
·Cross-lingual text classification and clustering
·Information retrieval for low-resource languages

FAQ

What is the main use case for F2LLM-v2-4B?

It is designed for multilingual embedding tasks including retrieval, semantic search, classification, and clustering, with strong support for mid- and low-resource languages.

How does the model compare in size and speed to other embedding models?

At 4B parameters, it is a mid-sized option in the F2LLM-v2 family, balancing performance and efficiency. Smaller models are available down to 80M parameters, and the largest has 14B.

What license does this model use?

It is released under the CC BY-NC-ND 4.0 license (non-commercial, no derivatives).

How should I structure input for retrieval tasks?

For queries, prepend the prompt "Instruct: Given a question, retrieve passages that can help answer the question.\nQuery: "; for documents, use no prompt. The model expects text and returns normalized embeddings of dimension 2560.
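
A minimal sketch of that asymmetric setup, assuming the repository `codefuse-ai/F2LLM-v2-4B` loads directly through Sentence Transformers. The helper names `build_retrieval_inputs` and `score_documents` are hypothetical; only the prompt string comes from the answer above.

```python
QUERY_PROMPT = (
    "Instruct: Given a question, retrieve passages that can help answer "
    "the question.\nQuery: "
)


def build_retrieval_inputs(queries, documents):
    """Prefix each query with the instruction prompt; leave documents as-is."""
    query_texts = [QUERY_PROMPT + query for query in queries]
    return query_texts, list(documents)


def score_documents(queries, documents, model_name="codefuse-ai/F2LLM-v2-4B"):
    """Return a (num_queries x num_documents) matrix of cosine similarities.

    Assumption: the model id above can be loaded by SentenceTransformer.
    The import is lazy so the formatting helper works without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    query_texts, doc_texts = build_retrieval_inputs(queries, documents)
    # normalize_embeddings=True yields unit vectors, so the dot product
    # below equals cosine similarity.
    query_vecs = model.encode(query_texts, normalize_embeddings=True)
    doc_vecs = model.encode(doc_texts, normalize_embeddings=True)
    return query_vecs @ doc_vecs.T
```

Calling `score_documents(["What is Matryoshka Representation Learning?"], passages)` downloads the model weights on first use, so a GPU or a machine with ample memory is advisable for the 4B variant.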

How can I call this model via the gigarouter API?

Once the model is live, send a request to the OpenAI-compatible embeddings endpoint with your API key, the model name, and the input text. The response contains one embedding per input.
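
The request shape can be sketched as follows, assuming the endpoint follows the standard OpenAI embeddings format. The base URL is a placeholder, not the real service address, and using the repository name as the model identifier is also an assumption. The code only builds the request and parses a response body; it does not send anything.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the provider's documented base URL.
BASE_URL = "https://api.example.com/v1"


def build_embeddings_request(texts, api_key,
                             model="codefuse-ai/F2LLM-v2-4B",
                             base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style POST request for embeddings."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body):
    """Extract vectors from an OpenAI-style response body, ordered by index."""
    items = json.loads(body)["data"]
    ordered = sorted(items, key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

To send the request, pass the object to `urllib.request.urlopen` and feed the response bytes to `parse_embeddings_response`.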

not yet live

F2LLM-v2-4B is being benchmarked and onboarded as a hosted, OpenAI-compatible API.
