skip to content
gigarouter gigarouter
models / vision-language · coming soon

Gemma 4 26B A4B

unsloth/gemma-4-26B-A4B-it-GGUF

published Apr 2026 · updated Jun 2026

Gemma 4 26B A4B is a multimodal vision-language model that processes text and image input and generates text output, using a Mixture-of-Experts architecture with 3.8B active parameters.

status
coming soon
API providers
0
downloads / mo
1.5M
license
apache-2.0

specs

TaskVisual Language Model (text + image input, text output)
ArchitectureMixture-of-Experts (MoE) with hybrid attention
Parameters25.2B total / 3.8B active
LicenseApache 2.0

about this model

Gemma 4 26B A4B IT (GGUF) is a vision-language model (VLM) that processes text and image inputs and generates text output. It is the instruction-tuned, Mixture-of-Experts variant of Google DeepMind's Gemma 4 family, quantized to GGUF format by Unsloth for efficient deployment.

Architecture

The model uses 25.2B total parameters with only 3.8B active per token, achieved via 8 active experts out of 128 total plus 1 shared expert. This design enables inference speeds comparable to a 4B-parameter model while retaining the capability of a much larger model. It supports a context window of 256K tokens and a vocabulary of 262K tokens. The vision encoder adds approximately 550M parameters. A hybrid attention mechanism interleaves local sliding window attention (1024 tokens) with full global attention, and the final layer is always global.

Benchmark Results

On the instruction-tuned variant, as reported by Google DeepMind:

BenchmarkScore
MMLU Pro82.6%
AIME 2026 (no tools)88.3%
LiveCodeBench v677.1%
Codeforces ELO1718
GPQA Diamond82.3%
MMMU (Multimodal)86.3%
MMMU Pro (Vision)73.8%
MATH-Vision82.4%
OmniDocBench 1.5 (edit distance, lower is better)0.149
MRCR v2 8 needle 128k (long context)44.1%

Key Capabilities

  • Multimodal understanding: Accepts interleaved text and images; supports variable aspect ratios and resolutions. Capable of OCR, document parsing, chart comprehension, and handwriting recognition.
  • Reasoning: Built-in configurable thinking mode for step-by-step reasoning before answering.
  • Coding & agentic tasks: Native function-calling support; strong results on coding benchmarks (LiveCodeBench, Codeforces).
  • Long context: 256K token context window with efficient memory handling via unified KV and proportional RoPE.
  • Multilingual: Pre-trained on 140+ languages; supports 35+ languages out of the box.

best for

FAQ

What makes Gemma 4 26B A4B different from a dense model?

It uses a Mixture-of-Experts architecture with 128 total experts, activating only 8 per token, so it runs almost as fast as a 4B-parameter model while having 25.2B total parameters.

What input modalities does it support?

It accepts text and image inputs (including documents, charts, screenshots) and outputs text. It does not support audio input.

What is the context window length?

It supports up to 256K tokens of context.

What license is Gemma 4 26B A4B released under?

It is released under the Apache 2.0 license.

How can I call this model via API?

Use the gigarouter OpenAI-compatible endpoint with your API key; it supports chat completions and function calling.

not yet live

We're benchmarking and onboarding Gemma 4 26B A4B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →