Qwopus 3.6-35B-A3B Coder

Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF

published Jun 2026 · updated Jun 2026

Qwopus 3.6-35B-A3B Coder is a VLM that provides a token-efficient coding agent fine-tuned for fast tool-calling and stable multi-turn code workflows.

status

coming soon

API providers

downloads / mo

44.8K

license

apache-2.0

specs

Task	Image-text-to-text / coding agent
Architecture	Hybrid sparse Mixture of Experts (MoE)
Parameters	35B total, ~3B active per token
License	Apache-2.0

about this model

Qwopus-3.6-35B-A3B-Coder-MTP-GGUF is a vision-language model (VLM) fine-tuned for token-efficient, agentic coding workflows on a 35B total parameter, approximately 3B active parameter Mixture-of-Experts (MoE) architecture. It is designed for real agentic coding loops where the model repeatedly reads files, chooses tools, edits code, runs tests, reacts to errors, and summarizes work — without forcing every step into a long, token-expensive reasoning mode.

Core Capabilities

The model optimizes for execution efficiency across multi-turn tool calls. Its key strengths are faster next-step decisions, lower token waste, improved workflow stability across file edits and retries, and suitability for local or self-hosted inference stacks. It targets Codex-style, OpenHands-style, Claude Code-style, and OpenCode-style agent harnesses.

Architecture and Training

Built on the Qwopus3.6-35B-A3B-v1 line (itself derived from Qwen3.6-35B-A3B), the model uses a hybrid sparse MoE with 35B total parameters and approximately 3B active parameters per token. It supports multimodal input (image-text-to-text) via the Qwen3.6 chat template with tool-calling XML format. The GGUF variant is available under the Apache-2.0 license.

Attribute	Specifications
Architecture	Hybrid sparse MoE, 35B total / ~3B active parameters per token
Base Developer	Alibaba Cloud / Qwen family (via Qwen3.6-35B-A3B)
License	Apache-2.0
Inference Support	llama.cpp, vLLM, SGLang, Unsloth, Transformers

Model Status

This is an experimental community model intended for research and local coding-agent evaluation. It has not undergone complete safety evaluation or broad general-domain benchmarking. As of release, the model has 44.8k downloads on its GGUF variant and 121 likes on the Hugging Face repository.

best for

·Automated multi-turn code debugging and patching
·High-frequency tool calling in agent harnesses
·Low-latency local or self-hosted coding workflows

FAQ

What is this model best used for?

It is best for agentic coding workflows where the model repeatedly reads files, calls tools, edits code, runs tests, and reacts to errors without long reasoning traces.

What is the model's size and speed profile?

It has 35B total parameters but only ~3B active per token, making it efficient for high-throughput local deployment and low token waste.

What license does this model use?

The model is released under Apache-2.0 license.

What input and output formats does it support?

It supports text and image inputs in the Qwen3.6 chat template with tool-calling XML format; outputs include text and tool call commands.

How can I call this model via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key; the model supports llama.cpp, vLLM, SGLang, Unsloth, and Transformers.

not yet live

We're benchmarking and onboarding Qwopus 3.6-35B-A3B Coder as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit