Granite Vision 4.1 4B
ibm-granite/granite-vision-4.1-4b
published Apr 2026 · updated May 2026
Granite Vision 4.1 4B is a vision-language model that delivers frontier-level performance on structured document extraction tasks in a compact 4B parameter footprint.
specs
| Task | Structured document extraction (charts, tables, key-value pairs) |
| Architecture | Granite-4.1-3B LLM (3.4B) + 0.6B Vision Encoder and Projectors |
| Parameters | 4B |
| License | Apache 2.0 |
about this model
Granite Vision 4.1 4B is a vision-language model (VLM) that delivers frontier-level performance on structured document extraction tasks — chart extraction, table extraction, and semantic key-value pair extraction — in a compact 4B parameter footprint. It is finetuned on top of Granite-4.1-3B, with a 3.4B LLM and 0.6B vision encoder and projectors. The model is developed by IBM Research and released under the Apache 2.0 license.
Supported Extraction Tasks
The model supports specialized extraction tasks activated by simple task tags in the user message, which the chat template automatically expands into full prompts:
- Chart extraction:
<chart2csv>(CSV table),<chart2code> (Python code),<chart2summary> (natural-language description) - Table extraction:
<tables_json>(structured JSON),<tables_html>(HTML markup),<tables_otsl>(OTSL markup with cell/merge tags) - Key-Value Pair (KVP) extraction: Schema-based extraction returning JSON with nested dictionaries and arrays
Benchmark Performance
Granite Vision 4.1 4B provides a lightweight alternative to frontier models on structured document extraction benchmarks, delivering comparable performance at a fraction of the parameter count.
Chart Extraction
Evaluated on the human-verified test set from ChartNet (1.5 million chart samples, 24 chart types, 6 plotting libraries), using LLM-as-a-judge (GPT-4o) scoring on Chart2CSV and Chart2Summary tasks.
Table Extraction
Evaluated on a unified suite spanning TableVQA-Extract, OmniDocBench-tables, and PubTablesV2, using TEDS (Tree-Edit Distance-based Similarity) for structural and content similarity. Results are reported separately for cropped-table and full-page settings.
Key-Value Pair Extraction
On the VAREX benchmark (1,777 U.S. government forms, 21,084 evaluation fields), Granite Vision 4.1 4B achieves 94.2% exact-match accuracy (zero-shot, image modality), competitive with much larger frontier models.
Methodology
The model is trained on ChartNet, a million-scale multimodal dataset described in the paper ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding (accepted at CVPR 2026). ChartNet uses a code-guided synthesis pipeline to generate 1.5 million chart samples with five aligned components: plotting code, rendered chart image, data table, natural language summary, and QA with reasoning. The dataset includes specialized subsets for human-annotated data, real-world data, safety, and grounding, with a rigorous quality-filtering pipeline ensuring visual fidelity and semantic accuracy.
The model integrates seamlessly with Docling for enhanced document processing pipelines with deep visual understanding capabilities.
best for
- ·Converting charts to CSV, Python code, or natural-language summaries
- ·Extracting tables from document images into JSON, HTML, or OTSL
- ·Extracting semantic key-value pairs (e.g., invoice fields) using a JSON schema
FAQ
It supports chart extraction (CSV, code, summary), table extraction (JSON, HTML, OTSL), and semantic key-value pair extraction, all via simple task tags.
Use the gigarouter OpenAI-compatible endpoint with your API key, sending a chat completion request with an image and a task tag prompt.
It is released under Apache 2.0, allowing free use, modification, and distribution.
It delivers comparable performance on structured document extraction benchmarks while being much smaller and faster, making it a lightweight alternative.
Yes, it integrates seamlessly with Docling to enhance document processing pipelines with deep visual understanding capabilities.
We're benchmarking and onboarding Granite Vision 4.1 4B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.