Qwen 3.6 35B A3B

unsloth/Qwen3.6-35B-A3B-GGUF

published Apr 2026 · updated Apr 2026

Qwen 3.6 35B A3B is a vision-language model specialized for agentic coding, tool calling, and reasoning preservation with a Mixture of Experts architecture.

status

coming soon

API providers

downloads / mo

874.6K

license

apache-2.0

specs

Task	Vision-Language Modeling & Agentic Coding
Architecture	Mixture of Experts (256 experts, 8 routed + 1 shared) with Gated Attention and Gated DeltaNet
Parameters	35B total, 3B activated
Context Length	262,144 tokens native, extensible to 1,010,000 tokens

about this model

Qwen3.6-35B-A3B is a vision-language model combining a causal language model with a vision encoder, optimized for agentic coding and real-world utility. With 35B total parameters and 3B activated per token across 256 experts, it achieves competitive performance while remaining computationally efficient. The model natively supports 262,144-token context, extendable to over 1M tokens.

Key strengths include improved handling of frontend workflows and repository-level reasoning, plus a new thinking preservation option that retains reasoning context across iterations for streamlined development. Multi-token prediction (MTP) enables 1.4–2.2x faster inference without accuracy loss.

Benchmark results (from the model card) demonstrate strong coding agent performance:

Benchmark	Score
SWE-bench Verified	73.4
SWE-bench Multilingual	67.2
SWE-bench Pro	49.5
Terminal-Bench 2.0	51.5
Claw-Eval (Avg)	68.7
SkillsBench (Avg5)	28.7
QwenWebBench	1397

Bar chart comparing Qwen3.6-35B-A3B against Qwen3.5-27B, Gemma4-31B, Qwen3.5-35BA3B, and Gemma4-26BA4B on coding agent benchmarks.

This model is available as a hosted API on gigarouter, providing OpenAI-compatible access without local infrastructure overhead.

best for

·Building automated software engineering agents (e.g., SWE-bench, terminal tasks)
·Frontend and repository-level code generation and debugging
·Iterative development with preserved reasoning context across messages

FAQ

What is the memory requirement for the 4-bit GGUF quant?

Approximately 23 GB of VRAM for the 4-bit quantized version.

What inference settings are recommended for precise coding tasks?

Use thinking mode with temperature=0.6, top_p=0.95, and top_k=20.

How can I avoid gibberish outputs when running this model?

Ensure context length is not set too low and use CUDA version below 13.2 or 13.3; avoid CUDA 13.2.

How do I access this model via an API?

Use the gigarouter OpenAI-compatible endpoint with your API key; pass the model name and your input.

Can this model process images?

Yes, it includes a vision encoder and supports vision-language inputs for tasks like frontend screenshot understanding.

not yet live

We're benchmarking and onboarding Qwen 3.6 35B A3B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit