Gemma 4 26B A4B

unsloth/gemma-4-26B-A4B-it-GGUF

published Apr 2026 · updated Jun 2026

Gemma 4 26B A4B is a multimodal vision-language model that processes text and image input and generates text output, using a Mixture-of-Experts architecture with 3.8B active parameters.

status

coming soon

API providers

downloads / mo

1.5M

license

apache-2.0

specs

Task	Visual Language Model (text + image input, text output)
Architecture	Mixture-of-Experts (MoE) with hybrid attention
Parameters	25.2B total / 3.8B active
License	Apache 2.0

about this model

Gemma 4 26B A4B IT (GGUF) is a vision-language model (VLM) that processes text and image inputs and generates text output. It is the instruction-tuned, Mixture-of-Experts variant of Google DeepMind's Gemma 4 family, quantized to GGUF format by Unsloth for efficient deployment.

Architecture

The model uses 25.2B total parameters with only 3.8B active per token, achieved via 8 active experts out of 128 total plus 1 shared expert. This design enables inference speeds comparable to a 4B-parameter model while retaining the capability of a much larger model. It supports a context window of 256K tokens and a vocabulary of 262K tokens. The vision encoder adds approximately 550M parameters. A hybrid attention mechanism interleaves local sliding window attention (1024 tokens) with full global attention, and the final layer is always global.

Benchmark Results

On the instruction-tuned variant, as reported by Google DeepMind:

Benchmark	Score
MMLU Pro	82.6%
AIME 2026 (no tools)	88.3%
LiveCodeBench v6	77.1%
Codeforces ELO	1718
GPQA Diamond	82.3%
MMMU (Multimodal)	86.3%
MMMU Pro (Vision)	73.8%
MATH-Vision	82.4%
OmniDocBench 1.5 (edit distance, lower is better)	0.149
MRCR v2 8 needle 128k (long context)	44.1%

Key Capabilities

Multimodal understanding: Accepts interleaved text and images; supports variable aspect ratios and resolutions. Capable of OCR, document parsing, chart comprehension, and handwriting recognition.
Reasoning: Built-in configurable thinking mode for step-by-step reasoning before answering.
Coding & agentic tasks: Native function-calling support; strong results on coding benchmarks (LiveCodeBench, Codeforces).
Long context: 256K token context window with efficient memory handling via unified KV and proportional RoPE.
Multilingual: Pre-trained on 140+ languages; supports 35+ languages out of the box.

best for

·Reasoning and step-by-step thinking tasks
·Coding and agentic workflows with function calling
·Document/PDF parsing and OCR

FAQ

What makes Gemma 4 26B A4B different from a dense model?

It uses a Mixture-of-Experts architecture with 128 total experts, activating only 8 per token, so it runs almost as fast as a 4B-parameter model while having 25.2B total parameters.

What input modalities does it support?

It accepts text and image inputs (including documents, charts, screenshots) and outputs text. It does not support audio input.

What is the context window length?

It supports up to 256K tokens of context.

What license is Gemma 4 26B A4B released under?

It is released under the Apache 2.0 license.

How can I call this model via API?

Use the gigarouter OpenAI-compatible endpoint with your API key; it supports chat completions and function calling.

not yet live

We're benchmarking and onboarding Gemma 4 26B A4B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit