Qwen3-VL Embedding 8B

Qwen/Qwen3-VL-Embedding-8B

published Jan 2026 · updated Apr 2026

Qwen3-VL Embedding 8B is a multimodal embedding model that generates high-dimensional vectors from text, images, screenshots, and videos for retrieval and clustering.

est. price

~$0.008

/ 1M tokens · estimated, set at launch

downloads / mo

1.1M

license

apache-2.0

specs

Task	Multimodal Embedding
Architecture	Qwen3-VL (transformer)
Parameters	8B
License	Apache 2.0
Context Length	32K tokens
Embedding Dimension	Up to 4096 (custom 64-4096 via MRL)

about this model

Qwen3-VL-Embedding-8B is a multimodal embedding model that generates high-dimensional vectors from text, images, screenshots, and video inputs, supporting over 30 languages and a context length of 32K tokens. Built on the Qwen3-VL foundation, it employs a multi-stage training paradigm progressing from large-scale contrastive pre-training to reranking model distillation.

Key capabilities

Flexible dimensions: Output embeddings from 64 to 4096 via Matryoshka Representation Learning (MRL), with quantization support.
Instruction-aware: Custom instructions improve downstream task performance by 1–5%.
Unified representation: Maps text, images, document images, and video into a shared semantic space for efficient retrieval, clustering, and cross-modal matching.
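
The flexible-dimension capability can be illustrated with a minimal sketch. It assumes the common MRL convention that a shorter embedding is obtained by keeping the leading components of the full vector and re-normalizing; the helper name truncate_embedding and the toy 8-dimensional vector are illustrative only and do not come from the model's own tooling.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize the result.

    Hypothetical helper: assumes MRL-style embeddings whose leading
    components carry the coarse-grained information.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a real 4096-dimensional embedding.
full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.1, 0.0, 0.2]
short = truncate_embedding(full, 4)
short_norm = math.sqrt(sum(x * x for x in short))
```

Re-normalizing after truncation keeps cosine and dot-product similarity interchangeable, which matters if the vector index assumes unit-length vectors.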

Benchmark performance

On the MMEB-V2 benchmark (78 datasets), Qwen3-VL-Embedding-8B achieves an overall score of 77.9, ranking first among all models as of January 8, 2026. The table below shows its scores across image, video, and visual document tasks.

Model	Image	Video	VisDoc	All
Qwen3-VL-Embedding-8B	80.1	66.1	83.3	77.9
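
The "All" column is not the plain average of the three group scores. The sketch below shows one way to reconcile the numbers, assuming the overall score is a mean over all 78 datasets. The per-group dataset counts (36 image, 18 video, 24 visual document) are an assumption not stated on this page.

```python
# Group scores copied from the table above.
scores = {"Image": 80.1, "Video": 66.1, "VisDoc": 83.3}

# ASSUMPTION: per-group dataset counts; not given on this page. They sum to 78.
counts = {"Image": 36, "Video": 18, "VisDoc": 24}

def dataset_weighted_mean(group_scores, group_counts):
    """Average the group scores, weighting each by its number of datasets."""
    total = sum(group_counts.values())
    return sum(group_scores[g] * group_counts[g] for g in group_scores) / total

simple_mean = sum(scores.values()) / len(scores)   # unweighted average of groups
overall = dataset_weighted_mean(scores, counts)    # per-dataset average

print(round(simple_mean, 1), round(overall, 1))
```

Under these assumed counts the weighted mean rounds to the reported 77.9, while the unweighted mean of the three columns would be 76.5.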

On the MMTEB benchmark (multilingual text tasks), the model achieves a mean task score of 67.88 and a mean type score of 58.88, with strong results across retrieval, classification, clustering, and STS.

Qwen3-VL-Embedding-8B is designed for retrieval-augmented generation pipelines and can be paired with the Qwen3-VL-Reranker model for a two-stage multimodal search system. The model is licensed under Apache 2.0 and is being onboarded as a managed API on gigarouter.
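
The two-stage pattern can be sketched as follows: a cheap cosine-similarity pass over embeddings recalls candidates, then a more expensive reranker reorders them. Everything here is illustrative: the toy 2-dimensional vectors, the document ids, and the rerank_score callable (a stand-in for a call to a reranker model) are assumptions, not the actual interface of either model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def two_stage_search(query_vec, corpus, rerank_score, recall_k=3, final_k=2):
    """corpus: list of (doc_id, vector); rerank_score: callable(doc_id) -> float."""
    # Stage 1: recall the recall_k nearest items by embedding similarity.
    recalled = sorted(
        corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )[:recall_k]
    # Stage 2: reorder only the recalled items with the (hypothetical) reranker.
    reranked = sorted(recalled, key=lambda item: rerank_score(item[0]), reverse=True)
    return [doc_id for doc_id, _ in reranked[:final_k]]

# Toy corpus mixing modalities; vectors are made up for illustration.
corpus = [
    ("cat.jpg", [1.0, 0.0]),
    ("dog.mp4", [0.8, 0.6]),
    ("invoice.png", [0.0, 1.0]),
    ("kitten.txt", [0.9, 0.1]),
]
# Stand-in reranker scores; a real system would score (query, document) pairs.
fake_scores = {"cat.jpg": 0.7, "kitten.txt": 0.9, "dog.mp4": 0.2, "invoice.png": 0.99}

results = two_stage_search([1.0, 0.0], corpus, fake_scores.get)
```

Note that invoice.png never reaches the reranker despite its high stand-in score, because stage 1 filters it out. Choosing recall_k is therefore a trade-off between reranking cost and the risk of dropping a relevant item early.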

best for

·Image-text retrieval
·Video-text matching
·Multimodal content clustering

FAQ

What is the maximum input length for Qwen3-VL Embedding 8B?

32K tokens.

How do I call this model via the gigarouter API?

Once the model is live, call it through gigarouter's OpenAI-compatible endpoint with an API key.
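
As a rough sketch, a request in the OpenAI embeddings format (POST to an /embeddings path with a bearer token) could be assembled as shown below. The base URL is a placeholder rather than gigarouter's real endpoint, and both the model id string and support for the optional dimensions field are assumptions until the hosted API is documented. The example only builds the request and does not send it.

```python
import json
import urllib.request

# ASSUMPTION: placeholder base URL, not gigarouter's actual endpoint.
BASE_URL = "https://api.example.com/v1"
# ASSUMPTION: the hosted model id matches the repository name.
MODEL_ID = "Qwen/Qwen3-VL-Embedding-8B"

def build_embedding_request(api_key, inputs, dimensions=None, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": MODEL_ID, "input": inputs}
    if dimensions is not None:
        # Optional field in the OpenAI embeddings format; provider support is assumed.
        payload["dimensions"] = dimensions
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embedding_request(
    "YOUR_API_KEY", ["a photo of a red bicycle"], dimensions=1024
)
body = json.loads(req.data)
# To send it: urllib.request.urlopen(req), then parse the JSON response.
```

This covers text input only. How image and video inputs are passed through an OpenAI-compatible embeddings endpoint is provider-specific and is not specified here.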

What languages does it support?

33 languages, including English, Chinese, and Spanish.

Can I customize the output embedding dimension?

Yes. Because the model is trained with Matryoshka Representation Learning (MRL), you can set any output dimension from 64 to 4096.

What is the license for Qwen3-VL Embedding 8B?

Apache 2.0.

not yet live

We're benchmarking and onboarding Qwen3-VL Embedding 8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
