Gemma 4 26B A4B QAT Uncensored Balanced

HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP

published Jun 2026 · updated Jun 2026

Gemma 4 26B A4B QAT Uncensored Balanced is a vision-language model that provides unrestricted responses with no refusals, based on Google's Gemma 4, optimized for agentic coding, reasoning, and creative writing, with a multi-token-prediction draft head for faster generation.

status

coming soon

API providers

downloads / mo

44.5K

license

gemma

specs

Task	Vision-Language (VLM) - image and text input to text output
Architecture	Mixture of Experts (MoE) - 128 experts, 8 active per token, plus 1 shared expert
Parameters	25.2B total, 3.8B active
Context Length	262,144 tokens (256K)
License	Apache 2.0

about this model

HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP is a vision-language model based on Google DeepMind's Gemma 4 26B-A4B, fine-tuned to eliminate refusals while preserving full functionality for agentic coding, reasoning, and creative writing tasks. It achieved 0/465 refusals in automated and manual testing, with no obstructive deflections reported in standard use. Built from official quantization-aware training (QAT) weights, the 4-bit (Q4_K_M) quantization retains near‑full‑precision quality.

Performance and Architecture

The model incorporates an MTP (multi‑token prediction) speculative decoding draft head, delivering approximately 35% faster generation with identical output quality. Architecture includes a mixture of experts with 25.2B total parameters (3.8B active per token), consisting of 128 experts (8 active plus 1 shared expert). It uses hybrid attention combining a 1024‑token sliding window with global attention using proportional RoPE, across 30 layers and a 256K context window. The vision encoder adds approximately 550M parameters for image input via an mmproj module. Multilingual support covers over 140 languages, and the model natively supports the system role and configurable thinking modes.

Sampling Configuration

Recommended sampling parameters tuned for this build: temperature 0.6, top_k 64, top_p 0.9, min_p 0.05, and repeat_penalty 1.1.

best for

·Agentic coding assistants with reasoning
·Creative writing and long-form content generation
·Vision tasks such as image captioning and visual Q&A
·Reliability-critical applications requiring no refusals

FAQ

What is the primary difference between this model and the original Gemma 4?

This model is uncensored with 0/465 refusals while retaining all original capabilities, unlike the base Gemma 4 which may refuse certain prompts.

How does the MTP head speed up generation?

The MTP (multi-token-prediction) draft head enables speculative decoding, achieving roughly 35% faster generation with identical output quality because the model verifies every drafted token.

What input format does the model accept?

The model accepts text and optional image inputs. Use the gigarouter OpenAI-compatible API endpoint with an API key; images are passed via the vision mmproj component.

What license applies to this model?

The model is licensed under Apache 2.0, as per the original Gemma 4 license from Google DeepMind.

What are the recommended sampling parameters?

Temperature 0.6, top_k 64, top_p 0.9, min_p 0.05, repeat_penalty 1.1 – these are tuned specifically for this HauhauCS build.

not yet live

We're benchmarking and onboarding Gemma 4 26B A4B QAT Uncensored Balanced as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit