F2LLM-v2-4B
codefuse-ai/F2LLM-v2-4B
published Mar 2026 · updated May 2026
F2LLM-v2-4B is a multilingual embedding model that generates high-quality text embeddings for information retrieval, semantic search, and classification across 200+ languages.
specs
| Spec | Value |
| --- | --- |
| Task | Embedding |
| Architecture | Qwen3 |
| Parameters | 4B |
| License | CC BY-NC-ND 4.0 |
about this model
F2LLM-v2-4B is a general-purpose, multilingual embedding model that converts text into dense vector representations for retrieval, semantic search, classification, and clustering tasks. It is part of the F2LLM-v2 family, which includes 8 model sizes from 80M to 14B parameters, trained on a curated composite of 60 million publicly available high-quality data samples.
The model supports more than 200 natural languages, with particular emphasis on mid- and low-resource languages, and also covers over 40 programming languages. It integrates a two-stage LLM-based embedding training pipeline with Matryoshka Representation Learning (MRL), model pruning, and knowledge distillation techniques to balance performance with computational efficiency.
Key capabilities
- Produces 2560-dimensional embeddings with support for Matryoshka-style dimensionality reduction
- Supports custom instruction prompts for asymmetric retrieval tasks (query vs. document encoding)
- Handles symmetric tasks such as semantic textual similarity, clustering, and bitext mining with or without prompts
- Based on the Qwen3 architecture and optimized for use with Sentence Transformers and Transformers libraries
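Matryoshka-style reduction usually means keeping the leading components of the full vector and re-normalizing the result. The sketch below illustrates that idea on plain Python lists. The helper names `truncate_embedding` and `cosine` are hypothetical, and the set of truncated sizes the model was actually trained for is not stated here, so treat any target dimension as an assumption to validate on your own data.

```python
import math

FULL_DIM = 2560  # full embedding size stated in the capabilities list


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length.

    Hypothetical helper illustrating Matryoshka-style truncation; it is not
    part of any library.
    """
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors stand in for real 2560-dimensional model outputs.
query = truncate_embedding([3.0, 4.0, 12.0], 2)
passage = truncate_embedding([4.0, 3.0, -1.0], 2)
similarity = cosine(query, passage)
```

Truncated vectors are only comparable with other vectors truncated to the same size, so a stored index and its incoming queries need to use the same dimension.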
Performance
The largest model in the family, F2LLM-v2-14B, ranks first on 11 MTEB benchmarks. The 4B variant is trained with the same pipeline and data as the larger models and is positioned as competitive for its size. The family is also reported to set a new state of the art for resource-constrained applications.
Training and techniques
- Two-stage LLM-based embedding training pipeline
- Matryoshka Representation Learning for flexible embedding dimensionality
- Model pruning and knowledge distillation for efficiency
- Training data covers 282 natural languages and over 40 programming languages
License
CC BY-NC-ND 4.0.
best for
- Multilingual semantic search and retrieval
- Cross-lingual text classification and clustering
- Information retrieval for low-resource languages
FAQ
F2LLM-v2-4B is designed for multilingual embedding tasks including retrieval, semantic search, classification, and clustering, with strong support for mid- and low-resource languages.
At 4B parameters, F2LLM-v2-4B is a medium-sized option in the F2LLM-v2 family, balancing performance and efficiency. Smaller models are available down to 80M parameters, and the largest has 14B.
F2LLM-v2-4B is released under the CC BY-NC-ND 4.0 license (non-commercial, no derivatives).
For queries, prepend the prompt "Instruct: Given a question, retrieve passages that can help answer the question.\nQuery: "; for documents, use no prompt. The model expects text and returns normalized embeddings of dimension 2560.
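The asymmetric formatting described above can be sketched as two small helpers. This is a minimal illustration: `format_query` and `format_document` are hypothetical names, and the sketch assumes the `\n` in the prompt denotes a real newline character.

```python
# Query-side instruction prompt quoted from the FAQ answer above.
# Assumption: "\n" stands for an actual newline, not a literal backslash-n.
QUERY_PROMPT = (
    "Instruct: Given a question, retrieve passages that can help answer "
    "the question.\nQuery: "
)


def format_query(question: str) -> str:
    """Prepend the retrieval instruction to a query before embedding it."""
    return QUERY_PROMPT + question


def format_document(passage: str) -> str:
    """Documents are embedded as-is, with no prompt."""
    return passage


query_text = format_query("Which languages does the model cover?")
doc_text = format_document("The model supports more than 200 natural languages.")
```

Libraries that accept a prompt argument at encode time can take `QUERY_PROMPT` directly instead of a pre-concatenated string; either way, only the query side carries the instruction.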
F2LLM-v2-4B is being benchmarked and onboarded as a hosted, OpenAI-compatible API. Once it is available, send a request to the embeddings endpoint with your API key, the model name, and the input text to get embeddings back.
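A request to an OpenAI-compatible embeddings endpoint might be assembled as in the sketch below, using only the Python standard library. The base URL is a placeholder, and the model identifier accepted by the hosted API is an assumption taken from the repository name. The helpers `build_embeddings_request` and `parse_embeddings_response` are hypothetical, and nothing is sent over the network here.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, not a real endpoint
MODEL = "codefuse-ai/F2LLM-v2-4B"        # assumed hosted model identifier


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL):
    """Build, without sending, a POST request in the OpenAI embeddings format."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(body):
    """Return embeddings in input order from an OpenAI-style response body."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]


request = build_embeddings_request(["hello world"], api_key="YOUR_API_KEY")
# To send it once the endpoint exists:
#     with urllib.request.urlopen(request) as response:
#         vectors = parse_embeddings_response(response.read())
```

Sorting by `index` keeps each returned vector aligned with its input text even if the server returns items out of order.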