Gemma 4 26B A4B
cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit
published Apr 2026 · updated Jul 2026
Gemma 4 26B A4B is a Vision-Language Model (VLM) with a Mixture-of-Experts architecture that processes text and image inputs and generates text output, supporting a 256K token context window and built-in reasoning.
specs
| Task | Vision-Language Model (Text & Image Input, Text Output) |
| Architecture | Mixture-of-Experts (MoE) – 8 active experts out of 128 total plus 1 shared expert |
| Parameters | 25.2B total, 3.8B active |
| Context Length | 256K tokens |
| License | Apache 2.0 |
about this model
cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit is a vision-language model that processes text and image input to generate text output, based on Google DeepMind's Gemma 4 26B A4B Mixture-of-Experts architecture and quantized to 4-bit AWQ for efficient inference through gigarouter's API.
The model employs a MoE design with 25.2B total parameters but only 3.8B active per token, routing through 8 active experts out of 128 total plus one shared expert. This yields inference speeds comparable to a 4B-parameter model while retaining the capacity of a much larger network. It supports a 256K-token context window with hybrid attention (local sliding window interleaved with global attention) for long-context tasks.
Key capabilities
- Multimodal input: Text and images at variable aspect ratios and resolutions; interleaved multimodal prompts.
- Reasoning: Configurable thinking mode for step-by-step reasoning before answering.
- Function calling: Native structured tool use for agentic workflows.
- Coding: Code generation, completion, and correction.
- Multilingual: Supports 35+ languages out of the box, pre-trained on 140+ languages.
Benchmark results
| Benchmark | Score |
|---|---|
| MMLU Pro | 82.6% |
| AIME 2026 (no tools) | 88.3% |
| LiveCodeBench v6 | 77.1% |
| Codeforces ELO | 1718 |
| GPQA Diamond | 82.3% |
| MMMU Pro (vision) | 73.8% |
| MATH-Vision | 82.4% |
| OmniDocBench 1.5 (edit distance, lower is better) | 0.149 |

best for
- ·Document & PDF parsing with OCR and handwriting recognition
- ·Chart, diagram, and UI understanding
- ·Step-by-step reasoning and math problem solving
- ·Code generation and agentic function-calling workflows
FAQ
It supports text and image input (including variable aspect ratios and resolutions).
256K tokens.
Apache 2.0.
Use the gigarouter OpenAI-compatible endpoint with a valid API key.
25.2 billion total parameters, with 3.8 billion active parameters per inference.
We're benchmarking and onboarding Gemma 4 26B A4B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.