Qwen3.5 0.8B

Qwen/Qwen3.5-0.8B

published Feb 2026 · updated Mar 2026

Qwen3.5 0.8B is a vlm model that integrates early fusion of vision and language with a hybrid Gated DeltaNet and attention architecture for efficient multimodal understanding and reasoning.

est. price

~$0.235

/ 1k images · estimated, set at launch

API providers

downloads / mo

2.5M

license

apache-2.0

specs

Task	Vision-Language Model (VLM)
Architecture	Causal Language Model with Vision Encoder, Gated DeltaNet + Attention + FFN
Parameters	0.8B
Context Length	262,144 tokens

about this model

Qwen3.5-0.8B is a vision-language model (VLM) that integrates a causal language model with a vision encoder, designed for multimodal understanding and reasoning tasks. It is the smallest variant in the Qwen3.5 family, released on 2026-03-02, and is built on a unified vision-language foundation with early fusion training on multimodal tokens, achieving cross-generational parity with prior Qwen3 models across reasoning, coding, agents, and visual understanding benchmarks.

Architecture and Key Strengths

The model employs an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts, enabling high-throughput inference with minimal latency and cost overhead. It supports a native context length of 262,144 tokens. Reinforcement learning was scaled across million-agent environments with progressively complex task distributions, enhancing real-world adaptability. The model also expands global linguistic coverage to 201 languages and dialects.

Benchmark Results

In non-thinking mode, Qwen3.5-0.8B achieves 29.7 on MMLU-Pro, 48.5 on MMLU-Redux, 46.4 on C-Eval, 16.9 on SuperGPQA, 52.1 on IFEval, and 34.1 on MMMLU. In thinking mode, results improve to 42.3 on MMLU-Pro, 59.5 on MMLU-Redux, 50.5 on C-Eval, 21.3 on SuperGPQA, 11.9 on GPQA, 44.0 on IFEval, and 21.0 on IFBench. These figures are compared against larger models in the Qwen family, demonstrating competitive performance at a smaller parameter scale.

Model Details

Parameters: 0.8B
Hidden dimension: 1024
Layers: 24
Context length: 262,144 tokens natively
Architecture: Gated DeltaNet and Gated Attention with Feed-Forward Networks

best for

·Prototyping and task-specific fine-tuning
·Research and development in multimodal AI
·Multilingual and cross-cultural applications
·Lightweight on-device or edge deployment

FAQ

What is Qwen3.5 0.8B best used for?

It is designed for prototyping, task-specific fine-tuning, and research or development purposes, especially in multimodal and multilingual scenarios.

How does the 0.8B model compare to other Qwen3.5 sizes?

It is the smallest model in the Qwen3.5 family, trading some benchmark performance for lower compute and memory requirements.

What is the context length of Qwen3.5 0.8B?

It supports a native context length of 262,144 tokens.

How can I call Qwen3.5 0.8B via the API on gigarouter?

Use the OpenAI-compatible endpoint with your gigarouter API key, sending text and image inputs to generate responses.

What input formats does the model accept?

It accepts text and images as input, producing text output. The model uses a vision encoder for image understanding.

not yet live

We're benchmarking and onboarding Qwen3.5 0.8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit