Qwen3 0.6B

Qwen/Qwen3-0.6B

published Apr 2025 · updated Jul 2025

Qwen3 0.6B is a chat model that supports seamless switching between thinking mode for complex reasoning and non-thinking mode for efficient dialogue.

status

coming soon

specs

Task	Causal Language Modeling
Architecture	Dense Transformer with Grouped-Query Attention (16 Q heads, 8 KV heads)
Parameters	0.6B (0.44B non-embedding)
Context Length	32,768 tokens
License	Apache 2.0

about this model

Qwen3-0.6B is a causal language model for chat that supports seamless switching between thinking mode (for complex multi-step reasoning) and non-thinking mode (for efficient general-purpose dialogue) within a single model, eliminating the need to use separate chat and reasoning models.

Architecture and Capabilities

The model has 0.6B parameters (0.44B non-embedding), 28 layers, a context length of 32,768 tokens, and uses Grouped Query Attention with 16 query heads and 8 key-value heads. It is the smallest dense variant in the Qwen3 series and covers 119 languages and dialects. Key capabilities include:

Reasoning – Enhanced performance on mathematics, code generation, and commonsense logical reasoning over QwQ and Qwen2.5 instruct models.
Alignment – Superior performance in creative writing, role-playing, multi-turn dialogues, and instruction following.
Agent capabilities – Precise tool integration in both thinking and non-thinking modes.
Thinking budget – Adaptive allocation of computational resources during inference to balance latency and task complexity.

The model is released under the Apache 2.0 license and uses tie embeddings.

best for

·Complex reasoning tasks (math, code, logic) using thinking mode
·Efficient general-purpose dialogue using non-thinking mode
·Multilingual instruction following and translation
·Tool calling and agent-based workflows

FAQ

What is the difference between thinking and non-thinking modes?

Thinking mode generates a reasoning chain before the final answer, improving complex tasks. Non-thinking mode skips reasoning for faster, direct responses.

How do I switch between thinking and non-thinking modes?

Set `enable_thinking=True` or `False` in the chat template. You can also use `/think` or `/no_think` in user messages when thinking mode is enabled.

What is the context length of Qwen3 0.6B?

It supports a maximum context length of 32,768 tokens.

Under what license is this model released?

Qwen3 0.6B is released under the Apache 2.0 license.

How can I call this model via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key. Set the model name to Qwen3-0.6B and pass messages in chat format.

not yet live

We're benchmarking and onboarding Qwen3 0.6B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text-gen models

compare all →

Qwen2.5-0.5B-Instruct

text-gen