Qwen3 0.6B
Qwen/Qwen3-0.6B
published Apr 2025 · updated Jul 2025
Qwen3 0.6B is a chat model that supports seamless switching between thinking mode for complex reasoning and non-thinking mode for efficient dialogue.
specs
| Property | Value |
| --- | --- |
| Task | Causal Language Modeling |
| Architecture | Dense Transformer with Grouped-Query Attention (16 Q heads, 8 KV heads) |
| Parameters | 0.6B (0.44B non-embedding) |
| Context Length | 32,768 tokens |
| License | Apache 2.0 |
about this model
Qwen3-0.6B is a causal language model for chat that supports seamless switching between thinking mode (for complex multi-step reasoning) and non-thinking mode (for efficient general-purpose dialogue) within a single model, eliminating the need to use separate chat and reasoning models.
Architecture and Capabilities
The model has 0.6B parameters (0.44B non-embedding), 28 layers, and a context length of 32,768 tokens. It uses Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads, so every two query heads share one key-value head. It is the smallest dense variant in the Qwen3 series and covers 119 languages and dialects. Key capabilities include:
- Reasoning – Improved performance on mathematics, code generation, and commonsense logical reasoning compared with the earlier QwQ and Qwen2.5 instruct models.
- Alignment – Superior performance in creative writing, role-playing, multi-turn dialogues, and instruction following.
- Agent capabilities – Precise tool integration in both thinking and non-thinking modes.
- Thinking budget – Adaptive allocation of computational resources during inference to balance latency and task complexity.
The model is released under the Apache 2.0 license and uses tied embeddings, meaning the input embedding matrix and the output projection share the same weights.
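To make the attention and embedding figures above more concrete, here is a back-of-the-envelope sketch that estimates the key-value cache size and the embedding parameter count. The layer count, head counts, and context length come from this page. `HEAD_DIM`, `HIDDEN_SIZE`, `VOCAB_SIZE`, and the 2-byte element size are assumptions that are not stated here and should be checked against the model's `config.json`.

```python
# Values stated on this page.
NUM_LAYERS = 28
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
MAX_CONTEXT = 32_768

# Assumptions (not stated on this page); verify against config.json.
HEAD_DIM = 128
HIDDEN_SIZE = 1024
VOCAB_SIZE = 151_936
BYTES_PER_VALUE = 2  # bf16 or fp16 cache entries


def kv_cache_bytes(num_tokens: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers one key tensor and one value tensor per layer.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


def embedding_params() -> int:
    """Parameters in the embedding matrix (counted once when tied)."""
    return VOCAB_SIZE * HIDDEN_SIZE


if __name__ == "__main__":
    gqa = kv_cache_bytes(MAX_CONTEXT)
    # Hypothetical comparison: one KV head per query head (no grouping).
    ungrouped = kv_cache_bytes(MAX_CONTEXT, num_kv_heads=NUM_Q_HEADS)
    print(f"query heads per KV head: {NUM_Q_HEADS // NUM_KV_HEADS}")
    print(f"KV cache at full context (GQA): {gqa / 2**30:.2f} GiB")
    print(f"KV cache without grouping: {ungrouped / 2**30:.2f} GiB")
    print(f"embedding parameters: {embedding_params() / 1e9:.3f}B")
```

Under these assumptions, the cache for a single full-length sequence is about 3.5 GiB, half of what it would be with one key-value head per query head. The embedding estimate of roughly 0.156B is consistent with the gap between the 0.6B total and the 0.44B non-embedding count.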
best for
- Complex reasoning tasks (math, code, logic) using thinking mode
- Efficient general-purpose dialogue using non-thinking mode
- Multilingual instruction following and translation
- Tool calling and agent-based workflows
FAQ
Thinking mode generates a reasoning chain before the final answer, improving complex tasks. Non-thinking mode skips reasoning for faster, direct responses.
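When thinking mode is on, the reasoning chain and the final answer arrive in the same completion, so client code usually needs to separate them. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags ahead of the answer; that tag format is an assumption not stated on this page, and `split_thinking` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is emitted as <think>...</think> before the answer.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no block is present."""
    match = _THINK_RE.search(text)
    if match is None:
        # Non-thinking output: the whole text is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

For example, `split_thinking("<think>\n2+2 is 4\n</think>\n\nThe answer is 4.")` returns `("2+2 is 4", "The answer is 4.")`, and plain text without a reasoning block comes back unchanged as the answer.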
Pass `enable_thinking=True` or `enable_thinking=False` when applying the chat template. When thinking mode is enabled, you can also switch per turn by adding `/think` or `/no_think` to a user message.
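Both switches can be sketched in a few lines. `render_prompt` assumes `tokenizer` is a Hugging Face tokenizer loaded for `Qwen/Qwen3-0.6B`, whose chat template accepts the `enable_thinking` keyword. `with_soft_switch` is a hypothetical helper that appends the per-turn tag to the latest user message. Treat this as an illustration, not the only way to drive the template.

```python
from typing import Any


def with_soft_switch(
    messages: list[dict[str, str]], thinking: bool
) -> list[dict[str, str]]:
    """Return a copy with /think or /no_think appended to the last user turn."""
    tag = "/think" if thinking else "/no_think"
    updated = [dict(m) for m in messages]  # copy so the input is not mutated
    for message in reversed(updated):
        if message["role"] == "user":
            message["content"] = f"{message['content']} {tag}"
            break
    return updated


def render_prompt(
    tokenizer: Any, messages: list[dict[str, str]], enable_thinking: bool
) -> str:
    """Render the prompt string with the hard thinking switch applied."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,  # forwarded to the chat template
    )
```

A typical call would load the tokenizer with `AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")` from the `transformers` package and then call `render_prompt(tokenizer, with_soft_switch(messages, thinking=False), enable_thinking=True)` to keep thinking available while turning it off for one turn.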
It supports a maximum context length of 32,768 tokens.
Qwen3 0.6B is released under the Apache 2.0 license.
Use the gigarouter OpenAI-compatible endpoint with your API key. Set the model name to Qwen3-0.6B and pass messages in chat format.
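As a rough sketch of such a call, the snippet below builds a chat-completions request using only the Python standard library. `BASE_URL` is a hypothetical placeholder because the actual endpoint address is not given on this page. The `/chat/completions` path and the bearer-token header follow the usual OpenAI-compatible convention and are assumed to apply here.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real endpoint base URL.
BASE_URL = "https://api.example.com/v1"


def build_chat_request(
    api_key: str,
    messages: list[dict[str, str]],
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": "Qwen3-0.6B", "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes it easy to inspect the payload or swap in another HTTP client before any network call is made.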
We're benchmarking and onboarding Qwen3 0.6B as a hosted, OpenAI-compatible API.