Qwen2.5 0.5B Instruct

Qwen/Qwen2.5-0.5B-Instruct

published Sep 2024 · updated Sep 2024

Qwen2.5 0.5B Instruct is a lightweight causal language model instruction-tuned for chat, coding, and mathematics, supporting up to 128K tokens of context.

est. price

~$0.12

/ 1M tokens · estimated, set at launch

specs

Task	Chat / Instruction Following
Architecture	Transformer with RoPE, SwiGLU, RMSNorm, GQA
Parameters	0.49B total (0.36B non-embedding)
License	Apache 2.0
Context Length	32,768 tokens input, 8,192 tokens generation
Multilingual	29+ languages including Chinese, English, French, Spanish, etc.

about this model

Qwen2.5-0.5B-Instruct is a causal language model for chat, instruction-tuned from the Qwen2.5 series. It is designed for general-purpose conversational AI, demonstrating strong capabilities in instruction following, generating long texts (up to 8,192 tokens), understanding structured data such as tables, and producing structured outputs like JSON. The model supports a context length of up to 32,768 tokens and is resilient to diverse system prompts, making it suitable for role-play and conditional chatbots. Multilingual support covers over 29 languages including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic. Key improvements over the previous Qwen2 generation include enhanced knowledge depth, significantly better performance in coding and mathematics, and improved long-context handling. The model achieves these gains through specialized expert training in reasoning and structured generation tasks. Architectural specifics: 0.49 billion total parameters (0.36B non-embedding), 24 layers, grouped-query attention with 14 query heads and 2 key/value heads, RoPE, SwiGLU activation, RMSNorm, and tied word embeddings. Context length is 32,768 tokens; maximum generation length is 8,192 tokens. Licensed under Apache 2.0. The model is part of the Qwen2.5 lineup, which builds on the Qwen2 foundation and open-weight philosophy, offering a compact yet capable choice for developers requiring efficient, instruction-tuned chat performance via API.

best for

·Lightweight chatbot for general conversation
·Code generation and explanation
·Math problem solving with step-by-step reasoning
·Multilingual assistant for non-English languages

FAQ

What input format does Qwen2.5 0.5B Instruct expect?

It uses a chat template with system, user, and assistant roles. Apply the tokenizer's apply_chat_template method.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, specifying the model name as "Qwen2.5-0.5B-Instruct".

What is the license for Qwen2.5 0.5B Instruct?

It is licensed under Apache 2.0, allowing free use, modification, and distribution.

What is the maximum context length and output length?

The model supports up to 32,768 tokens of input context and can generate up to 8,192 tokens.

How does this model compare to larger Qwen2.5 models?

It is much smaller (0.49B parameters) and faster, suitable for resource-constrained environments, but with lower overall capability in complex tasks.

not yet live

We're benchmarking and onboarding Qwen2.5 0.5B Instruct as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text-gen models

compare all →

Qwen3-0.6B

text-gen