Qwen3.5 0.8B
Qwen/Qwen3.5-0.8B
published Feb 2026 · updated Mar 2026
Qwen3.5 0.8B is a vlm model that integrates early fusion of vision and language with a hybrid Gated DeltaNet and attention architecture for efficient multimodal understanding and reasoning.
specs
| Task | Vision-Language Model (VLM) |
| Architecture | Causal Language Model with Vision Encoder, Gated DeltaNet + Attention + FFN |
| Parameters | 0.8B |
| Context Length | 262,144 tokens |
about this model
Architecture and Key Strengths
The model employs an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts, enabling high-throughput inference with minimal latency and cost overhead. It supports a native context length of 262,144 tokens. Reinforcement learning was scaled across million-agent environments with progressively complex task distributions, enhancing real-world adaptability. The model also expands global linguistic coverage to 201 languages and dialects.
Benchmark Results
In non-thinking mode, Qwen3.5-0.8B achieves 29.7 on MMLU-Pro, 48.5 on MMLU-Redux, 46.4 on C-Eval, 16.9 on SuperGPQA, 52.1 on IFEval, and 34.1 on MMMLU. In thinking mode, results improve to 42.3 on MMLU-Pro, 59.5 on MMLU-Redux, 50.5 on C-Eval, 21.3 on SuperGPQA, 11.9 on GPQA, 44.0 on IFEval, and 21.0 on IFBench. These figures are compared against larger models in the Qwen family, demonstrating competitive performance at a smaller parameter scale.
Model Details
- Parameters: 0.8B
- Hidden dimension: 1024
- Layers: 24
- Context length: 262,144 tokens natively
- Architecture: Gated DeltaNet and Gated Attention with Feed-Forward Networks

best for
- ·Prototyping and task-specific fine-tuning
- ·Research and development in multimodal AI
- ·Multilingual and cross-cultural applications
- ·Lightweight on-device or edge deployment
FAQ
It is designed for prototyping, task-specific fine-tuning, and research or development purposes, especially in multimodal and multilingual scenarios.
It is the smallest model in the Qwen3.5 family, trading some benchmark performance for lower compute and memory requirements.
It supports a native context length of 262,144 tokens.
Use the OpenAI-compatible endpoint with your gigarouter API key, sending text and image inputs to generate responses.
It accepts text and images as input, producing text output. The model uses a vision encoder for image understanding.
We're benchmarking and onboarding Qwen3.5 0.8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.