skip to content
gigarouter gigarouter
models / vision-language · coming soon

Qwen 3.6 35B A3B

unsloth/Qwen3.6-35B-A3B-GGUF

published Apr 2026 · updated Apr 2026

Qwen 3.6 35B A3B is a vision-language model specialized for agentic coding, tool calling, and reasoning preservation with a Mixture of Experts architecture.

status
coming soon
API providers
0
downloads / mo
874.6K
license
apache-2.0

specs

TaskVision-Language Modeling & Agentic Coding
ArchitectureMixture of Experts (256 experts, 8 routed + 1 shared) with Gated Attention and Gated DeltaNet
Parameters35B total, 3B activated
Context Length262,144 tokens native, extensible to 1,010,000 tokens

about this model

Qwen3.6-35B-A3B is a vision-language model combining a causal language model with a vision encoder, optimized for agentic coding and real-world utility. With 35B total parameters and 3B activated per token across 256 experts, it achieves competitive performance while remaining computationally efficient. The model natively supports 262,144-token context, extendable to over 1M tokens.

Key strengths include improved handling of frontend workflows and repository-level reasoning, plus a new thinking preservation option that retains reasoning context across iterations for streamlined development. Multi-token prediction (MTP) enables 1.4–2.2x faster inference without accuracy loss.

Benchmark results (from the model card) demonstrate strong coding agent performance:

BenchmarkScore
SWE-bench Verified73.4
SWE-bench Multilingual67.2
SWE-bench Pro49.5
Terminal-Bench 2.051.5
Claw-Eval (Avg)68.7
SkillsBench (Avg5)28.7
QwenWebBench1397
Bar chart comparing Qwen3.6-35B-A3B against Qwen3.5-27B, Gemma4-31B, Qwen3.5-35BA3B, and Gemma4-26BA4B on coding agent benchmarks.

This model is available as a hosted API on gigarouter, providing OpenAI-compatible access without local infrastructure overhead.

best for

FAQ

What is the memory requirement for the 4-bit GGUF quant?

Approximately 23 GB of VRAM for the 4-bit quantized version.

What inference settings are recommended for precise coding tasks?

Use thinking mode with temperature=0.6, top_p=0.95, and top_k=20.

How can I avoid gibberish outputs when running this model?

Ensure context length is not set too low and use CUDA version below 13.2 or 13.3; avoid CUDA 13.2.

How do I access this model via an API?

Use the gigarouter OpenAI-compatible endpoint with your API key; pass the model name and your input.

Can this model process images?

Yes, it includes a vision encoder and supports vision-language inputs for tasks like frontend screenshot understanding.

not yet live

We're benchmarking and onboarding Qwen 3.6 35B A3B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →