Infinity Parser 7B

infly/Infinity-Parser-7B

published Oct 2025 · updated Feb 2026

Infinity Parser 7B is a vision-language model that parses scanned documents into structured text using reinforcement learning with layout-aware rewards.

status

coming soon

API providers

downloads / mo

4.2K

specs

Task	Scanned document parsing (OCR, table and formula extraction, reading order detection)
Architecture	Vision-Language Model (VLM) based on Qwen2.5-VL-7B
Parameters	7B
License	Apache 2.0

about this model

Infinity-Parser-7B is a vision-language model for end-to-end scanned document parsing, trained with reinforcement learning to preserve layout and content fidelity.

The model uses the LayoutRL framework, optimizing a composite reward of normalized edit distance, paragraph count accuracy, and reading order preservation. It was trained on Infinity-Doc-400K, a dataset of 400K scanned documents with rich layout variations and structural annotations.

On benchmarks including OmniDocBench, olmOCR-Bench, PubTabNet, and FinTabNet, Infinity-Parser-7B achieves state-of-the-art performance across diverse document types, languages (English and Chinese), and structural complexities. It substantially outperforms both specialized document parsing pipelines and general-purpose vision-language models while retaining near-equivalent general multimodal understanding (base model: Qwen2.5-VL-7B, evaluated via LMMS-Eval).

Architecture

The training framework combines reinforcement fine-tuning with edit distance, layout, and order-based rewards.

Benchmark Results

Table recognition performance on PubTabNet and FinTabNet

General multimodal capability evaluation compared to Qwen2.5-VL-7B

Visual Comparison

Example comparisons showing Infinity-Parser output vs. other methods

Limitations

Does not provide layout or bounding box information, limiting support for downstream structured reconstruction.
Lacks perception and structured extraction for charts and figures.

The model was accepted as a Findings paper at ACL 2026. Licensed under Apache 2.0. For further details, see the arXiv paper and GitHub repository.

best for

·Extracting text and tables from scanned documents with high structural fidelity
·Parsing multi-lingual documents (English and Chinese) for downstream data processing
·Reading order detection and formula extraction from academic papers and reports

FAQ

What is the main task of Infinity Parser 7B?

It is a vision-language model for end-to-end scanned document parsing, including OCR, table and formula extraction, and reading order detection.

What base model is Infinity Parser 7B built on?

It is built on Qwen2.5-VL-7B.

What is the license for Infinity Parser 7B?

It is licensed under Apache 2.0.

Does Infinity Parser 7B output bounding boxes or layout information?

No, the current model does not provide layout or bounding box information.

How can I call Infinity Parser 7B via the API?

Use the gigarouter OpenAI-compatible endpoint with an API key.

not yet live

We're benchmarking and onboarding Infinity Parser 7B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related specialist model models

compare all →

electra-base-discriminator

wespeaker-voxceleb-resnet34-LM

6.8M dl/mo

unidepth-v2-vitl14

6.3M dl/mo