Gemma 4 26B A4B QAT Uncensored Balanced
HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP
published Jun 2026 · updated Jun 2026
Gemma 4 26B A4B QAT Uncensored Balanced is a vision-language model that provides unrestricted responses with no refusals, based on Google's Gemma 4, optimized for agentic coding, reasoning, and creative writing, with a multi-token-prediction draft head for faster generation.
specs
| Task | Vision-Language (VLM) - image and text input to text output |
| Architecture | Mixture of Experts (MoE) - 128 experts, 8 active per token, plus 1 shared expert |
| Parameters | 25.2B total, 3.8B active |
| Context Length | 262,144 tokens (256K) |
| License | Apache 2.0 |
about this model
HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP is a vision-language model based on Google DeepMind's Gemma 4 26B-A4B, fine-tuned to eliminate refusals while preserving full functionality for agentic coding, reasoning, and creative writing tasks. It achieved 0/465 refusals in automated and manual testing, with no obstructive deflections reported in standard use. Built from official quantization-aware training (QAT) weights, the 4-bit (Q4_K_M) quantization retains near‑full‑precision quality.
Performance and Architecture
The model incorporates an MTP (multi‑token prediction) speculative decoding draft head, delivering approximately 35% faster generation with identical output quality. Architecture includes a mixture of experts with 25.2B total parameters (3.8B active per token), consisting of 128 experts (8 active plus 1 shared expert). It uses hybrid attention combining a 1024‑token sliding window with global attention using proportional RoPE, across 30 layers and a 256K context window. The vision encoder adds approximately 550M parameters for image input via an mmproj module. Multilingual support covers over 140 languages, and the model natively supports the system role and configurable thinking modes.
Sampling Configuration
Recommended sampling parameters tuned for this build: temperature 0.6, top_k 64, top_p 0.9, min_p 0.05, and repeat_penalty 1.1.
best for
- ·Agentic coding assistants with reasoning
- ·Creative writing and long-form content generation
- ·Vision tasks such as image captioning and visual Q&A
- ·Reliability-critical applications requiring no refusals
FAQ
This model is uncensored with 0/465 refusals while retaining all original capabilities, unlike the base Gemma 4 which may refuse certain prompts.
The MTP (multi-token-prediction) draft head enables speculative decoding, achieving roughly 35% faster generation with identical output quality because the model verifies every drafted token.
The model accepts text and optional image inputs. Use the gigarouter OpenAI-compatible API endpoint with an API key; images are passed via the vision mmproj component.
The model is licensed under Apache 2.0, as per the original Gemma 4 license from Google DeepMind.
Temperature 0.6, top_k 64, top_p 0.9, min_p 0.05, repeat_penalty 1.1 – these are tuned specifically for this HauhauCS build.
We're benchmarking and onboarding Gemma 4 26B A4B QAT Uncensored Balanced as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.