skip to content
gigarouter gigarouter
models / vision-language · coming soon

Granite Vision 4.1 4B

ibm-granite/granite-vision-4.1-4b

published Apr 2026 · updated May 2026

Granite Vision 4.1 4B is a vision-language model that delivers frontier-level performance on structured document extraction tasks in a compact 4B parameter footprint.

est. price
~$0.626
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
111K
license
apache-2.0

specs

TaskStructured document extraction (charts, tables, key-value pairs)
ArchitectureGranite-4.1-3B LLM (3.4B) + 0.6B Vision Encoder and Projectors
Parameters4B
LicenseApache 2.0

about this model

Granite Vision 4.1 4B is a vision-language model (VLM) that delivers frontier-level performance on structured document extraction tasks — chart extraction, table extraction, and semantic key-value pair extraction — in a compact 4B parameter footprint. It is finetuned on top of Granite-4.1-3B, with a 3.4B LLM and 0.6B vision encoder and projectors. The model is developed by IBM Research and released under the Apache 2.0 license.

Supported Extraction Tasks

The model supports specialized extraction tasks activated by simple task tags in the user message, which the chat template automatically expands into full prompts:

  • Chart extraction: <chart2csv> (CSV table), <chart2code> (Python code), <chart2summary> (natural-language description)
  • Table extraction: <tables_json> (structured JSON), <tables_html> (HTML markup), <tables_otsl> (OTSL markup with cell/merge tags)
  • Key-Value Pair (KVP) extraction: Schema-based extraction returning JSON with nested dictionaries and arrays

Benchmark Performance

Granite Vision 4.1 4B provides a lightweight alternative to frontier models on structured document extraction benchmarks, delivering comparable performance at a fraction of the parameter count.

Chart comparing Granite Vision 4.1 4B performance to frontier models on document extraction benchmarks

Chart Extraction

Evaluated on the human-verified test set from ChartNet (1.5 million chart samples, 24 chart types, 6 plotting libraries), using LLM-as-a-judge (GPT-4o) scoring on Chart2CSV and Chart2Summary tasks.

Chart2CSV benchmark scores comparing Granite Vision 4.1 4B to other models Chart2Summary benchmark scores comparing Granite Vision 4.1 4B to other models

Table Extraction

Evaluated on a unified suite spanning TableVQA-Extract, OmniDocBench-tables, and PubTablesV2, using TEDS (Tree-Edit Distance-based Similarity) for structural and content similarity. Results are reported separately for cropped-table and full-page settings.

Table extraction TEDS scores on cropped-table benchmarks Table extraction TEDS scores on full-page benchmarks Additional table extraction benchmark results

Key-Value Pair Extraction

On the VAREX benchmark (1,777 U.S. government forms, 21,084 evaluation fields), Granite Vision 4.1 4B achieves 94.2% exact-match accuracy (zero-shot, image modality), competitive with much larger frontier models.

Methodology

The model is trained on ChartNet, a million-scale multimodal dataset described in the paper ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding (accepted at CVPR 2026). ChartNet uses a code-guided synthesis pipeline to generate 1.5 million chart samples with five aligned components: plotting code, rendered chart image, data table, natural language summary, and QA with reasoning. The dataset includes specialized subsets for human-annotated data, real-world data, safety, and grounding, with a rigorous quality-filtering pipeline ensuring visual fidelity and semantic accuracy.

The model integrates seamlessly with Docling for enhanced document processing pipelines with deep visual understanding capabilities.

best for

FAQ

What tasks does Granite Vision 4.1 4B support?

It supports chart extraction (CSV, code, summary), table extraction (JSON, HTML, OTSL), and semantic key-value pair extraction, all via simple task tags.

How do I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, sending a chat completion request with an image and a task tag prompt.

What is the license for Granite Vision 4.1 4B?

It is released under Apache 2.0, allowing free use, modification, and distribution.

How does this 4B parameter model compare to larger frontier models?

It delivers comparable performance on structured document extraction benchmarks while being much smaller and faster, making it a lightweight alternative.

Can Granite Vision 4.1 4B be integrated with Docling?

Yes, it integrates seamlessly with Docling to enhance document processing pipelines with deep visual understanding capabilities.

not yet live

We're benchmarking and onboarding Granite Vision 4.1 4B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →