Qwen3-VL Embedding 8B
Qwen/Qwen3-VL-Embedding-8B
published Jan 2026 · updated Apr 2026
Qwen3-VL Embedding 8B is a multimodal embedding model that generates high-dimensional vectors from text, images, screenshots, and videos for retrieval and clustering.
specs
| Spec | Value |
|---|---|
| Task | Multimodal Embedding |
| Architecture | Qwen3-VL (transformer) |
| Parameters | 8B |
| License | Apache 2.0 |
| Context Length | 32K tokens |
| Embedding Dimension | Up to 4096 (custom 64 to 4096 via MRL) |
about this model
Built on the Qwen3-VL foundation model, Qwen3-VL-Embedding-8B embeds text, images, screenshots, and video inputs, supporting over 30 languages and a context length of 32K tokens. It is trained in multiple stages, progressing from large-scale contrastive pre-training to distillation from a reranking model.
Key capabilities
- Flexible dimensions: Output embeddings from 64 to 4096 via Matryoshka Representation Learning (MRL), with quantization support.
- Instruction-aware: Custom instructions improve downstream task performance by 1–5%.
- Unified representation: Maps text, images, document images, and video into a shared semantic space for efficient retrieval, clustering, and cross-modal matching.
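The flexible-dimension and shared-space capabilities above can be illustrated with a minimal sketch. It assumes the usual MRL convention that a shorter embedding is the leading slice of the full vector, rescaled to unit length. The helper names and the toy 8-dimensional vectors are hypothetical stand-ins for real 4096-dimensional outputs, not part of any Qwen or gigarouter API.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumption): pretend one came from a caption and one from an image.
text_vec = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.05]
image_vec = [0.8, 0.2, 0.4, 0.1, 0.0, 0.3, 0.1, 0.0]

# Compare the cross-modal similarity at full size and at half size.
full_score = cosine(text_vec, image_vec)
short_score = cosine(
    truncate_and_normalize(text_vec, 4),
    truncate_and_normalize(image_vec, 4),
)
print(f"full: {full_score:.3f}  truncated: {short_score:.3f}")
```

Truncated vectors cut index size roughly in proportion to the dimension, at some cost in ranking quality. How large that cost is for this model at a given dimension is something to measure on your own data rather than assume.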
Benchmark performance
On the MMEB-V2 benchmark (78 datasets), Qwen3-VL-Embedding-8B achieves an overall score of 77.9, ranking first among all models as of January 8, 2026. The table below shows its performance across image, video, and visual document tasks.
| Model | Image | Video | VisDoc | All |
|---|---|---|---|---|
| Qwen3-VL-Embedding-8B | 80.1 | 66.1 | 83.3 | 77.9 |
On the MMTEB benchmark (multilingual text tasks), the model achieves a mean task score of 67.88 and a mean type score of 58.88, with strong results across retrieval, classification, clustering, and STS.
Qwen3-VL-Embedding-8B is designed for retrieval-augmented generation pipelines and can be paired with the Qwen3-VL-Reranker model for a two-stage multimodal search system. The model is licensed under Apache 2.0 and is available as a managed API on gigarouter.
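The two-stage pattern mentioned above can be sketched as follows: embedding similarity selects a small candidate set cheaply, and a slower reranker reorders only those candidates. The function `two_stage_search`, the toy vectors, and the lookup table standing in for a reranker are all hypothetical. A real system would call the embedding and reranker models instead.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query_vec, doc_vecs, rerank_score, top_k=2):
    """Return document indices: retrieve top_k by embedding, then rerank them."""
    # Stage 1: rank every document by embedding similarity and keep top_k.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: dot(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    # Stage 2: reorder only the surviving candidates with the reranker score.
    return sorted(candidates, key=rerank_score, reverse=True)

# Toy data (assumption): 2-dimensional vectors instead of real embeddings.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]

# Stand-in for a reranker: a fixed relevance score per document index.
toy_rerank_scores = {0: 0.2, 1: 0.9, 2: 1.0, 3: 0.5}

result = two_stage_search(query, docs, toy_rerank_scores.get, top_k=2)
print(result)
```

In this toy data, document 2 has the highest reranker score but never reaches stage two because its embedding similarity is low. The reranker can only reorder what retrieval returns, so `top_k` should be large enough to give it useful candidates.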
best for
- Image-text retrieval
- Video-text matching
- Multimodal content clustering
FAQ
The context length is 32K tokens.
To call the model, use the gigarouter OpenAI-compatible endpoint with an API key.
It supports over 30 languages (33 specifically), including English, Chinese, and Spanish.
Output dimensions are configurable: with Matryoshka Representation Learning (MRL), you can set any dimension from 64 to 4096.
The license is Apache 2.0.
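As a rough illustration of the OpenAI-compatible endpoint mentioned in the FAQ, the sketch below builds a request in the shape of the OpenAI embeddings API (`POST /embeddings` with `model`, `input`, and optional `dimensions`) without sending it. The base URL is a placeholder, and three things are assumptions rather than documented gigarouter behavior: the model identifier string, support for the `dimensions` field, and the way image or video inputs are passed. This example covers text input only.

```python
import json
import urllib.request

# Placeholder (assumption): the real gigarouter base URL is not given here.
BASE_URL = "https://api.example.com/v1"
# Assumption: the hosted model id matches the repository name.
MODEL_ID = "Qwen/Qwen3-VL-Embedding-8B"

def build_embedding_request(api_key, inputs, dimensions=None):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": MODEL_ID, "input": inputs}
    if dimensions is not None:
        # The model card states a supported range of 64 to 4096.
        if not 64 <= dimensions <= 4096:
            raise ValueError("dimensions must be between 64 and 4096")
        payload["dimensions"] = dimensions
    return urllib.request.Request(
        BASE_URL + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embedding_request("YOUR_API_KEY", ["a red bicycle"], dimensions=256)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send the request, pass `req` to `urllib.request.urlopen` once the real base URL and key are in place. In the OpenAI response format, the vectors appear under `data[i].embedding`.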
We're benchmarking and onboarding Qwen3-VL Embedding 8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.