skip to content
gigarouter gigarouter
models / vision-language · coming soon

Qwen3.6 27B

unsloth/Qwen3.6-27B-MTP-GGUF

published May 2026 · updated May 2026

Qwen3.6 27B is a vision-language model that excels at agentic coding, repository-level reasoning, and preserving thinking context across iterative development.

status
coming soon
API providers
0
downloads / mo
1.8M
license
apache-2.0

specs

TaskVision-Language Model (VLM)
ArchitectureCausal Language Model with Vision Encoder, Gated DeltaNet + Gated Attention
Parameters27B
LicenseSee model card

about this model

Qwen3.6-27B-MTP-GGUF is a vision-language model (VLM) that extends the Qwen3.5 series with a causal language model and vision encoder, optimized for agentic coding, reasoning preservation, and tool-calling reliability. The model has 27 billion parameters, 64 layers, a hidden dimension of 5120, and a native context length of 262,144 tokens (extendable to approximately 1 million tokens via YaRN). It employs a hybrid architecture combining Gated DeltaNet and Gated Attention layers with rotary position embeddings.

Key capabilities

  • Multi-Token Prediction (MTP): Enables 1.4–2.2x faster inference with no accuracy loss when used with compatible inference engines.
  • Thinking Preservation: Retains reasoning context from historical messages, reducing overhead in iterative development workflows.
  • Agentic coding: Improved frontend workflow handling and repository-level reasoning.
  • Tool calling: Enhanced parsing of nested objects for more reliable function calling.

Deployment via Gigarouter

Gigarouter hosts this model as a managed, OpenAI-compatible API. Below are memory requirements for available GGUF quantizations (Unsloth Dynamic 2.0 calibration dataset with over 1.5M hand-curated tokens):

QuantizationMinimum VRAM
3-bit15 GB
4-bit18 GB
6-bit24 GB
8-bit30 GB
BF1655 GB

Recommended sampling parameters

  • Thinking mode (general): temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
  • Thinking mode (coding): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
  • Instruct (non-thinking) mode: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
Qwen3.6-27B model overview graphic Benchmark results for Qwen3.6

The model is available as an optimized GGUF through Gigarouter’s hosted API, eliminating the need for local infrastructure management.

best for

FAQ

What is the context length of Qwen3.6 27B?

It supports 262,144 tokens natively, extendable up to 1,010,000 tokens.

How does MTP (Multi-Token Prediction) improve inference speed?

MTP enables approximately 1.5-2x faster inference with no accuracy loss.

What are the recommended sampling parameters for coding tasks?

For thinking mode coding: temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0.

How can I call Qwen3.6 27B via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending chat completion requests to the hosted model.

What is the memory requirement for running a 4-bit GGUF quant of this model?

A 4-bit quant requires approximately 18 GB of memory.

not yet live

We're benchmarking and onboarding Qwen3.6 27B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →