Qwopus 3.6-35B-A3B Coder
Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF
published Jun 2026 · updated Jun 2026
Qwopus 3.6-35B-A3B Coder is a VLM that provides a token-efficient coding agent fine-tuned for fast tool-calling and stable multi-turn code workflows.
specs
| Task | Image-text-to-text / coding agent |
| Architecture | Hybrid sparse Mixture of Experts (MoE) |
| Parameters | 35B total, ~3B active per token |
| License | Apache-2.0 |
about this model
Qwopus-3.6-35B-A3B-Coder-MTP-GGUF is a vision-language model (VLM) fine-tuned for token-efficient, agentic coding workflows on a 35B total parameter, approximately 3B active parameter Mixture-of-Experts (MoE) architecture. It is designed for real agentic coding loops where the model repeatedly reads files, chooses tools, edits code, runs tests, reacts to errors, and summarizes work — without forcing every step into a long, token-expensive reasoning mode.
Core Capabilities
The model optimizes for execution efficiency across multi-turn tool calls. Its key strengths are faster next-step decisions, lower token waste, improved workflow stability across file edits and retries, and suitability for local or self-hosted inference stacks. It targets Codex-style, OpenHands-style, Claude Code-style, and OpenCode-style agent harnesses.
Architecture and Training
Built on the Qwopus3.6-35B-A3B-v1 line (itself derived from Qwen3.6-35B-A3B), the model uses a hybrid sparse MoE with 35B total parameters and approximately 3B active parameters per token. It supports multimodal input (image-text-to-text) via the Qwen3.6 chat template with tool-calling XML format. The GGUF variant is available under the Apache-2.0 license.
| Attribute | Specifications |
|---|---|
| Architecture | Hybrid sparse MoE, 35B total / ~3B active parameters per token |
| Base Developer | Alibaba Cloud / Qwen family (via Qwen3.6-35B-A3B) |
| License | Apache-2.0 |
| Inference Support | llama.cpp, vLLM, SGLang, Unsloth, Transformers |
Model Status
This is an experimental community model intended for research and local coding-agent evaluation. It has not undergone complete safety evaluation or broad general-domain benchmarking. As of release, the model has 44.8k downloads on its GGUF variant and 121 likes on the Hugging Face repository.
best for
- ·Automated multi-turn code debugging and patching
- ·High-frequency tool calling in agent harnesses
- ·Low-latency local or self-hosted coding workflows
FAQ
It is best for agentic coding workflows where the model repeatedly reads files, calls tools, edits code, runs tests, and reacts to errors without long reasoning traces.
It has 35B total parameters but only ~3B active per token, making it efficient for high-throughput local deployment and low token waste.
The model is released under Apache-2.0 license.
It supports text and image inputs in the Qwen3.6 chat template with tool-calling XML format; outputs include text and tool call commands.
Use the gigarouter OpenAI-compatible endpoint with your API key; the model supports llama.cpp, vLLM, SGLang, Unsloth, and Transformers.
We're benchmarking and onboarding Qwopus 3.6-35B-A3B Coder as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.