Infinity Parser2 Flash

infly/Infinity-Parser2-Flash

published Feb 2026 · updated May 2026

A popular open vision-language model, with 16.6K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$0.626

/ 1k images · estimated, set at launch

API providers

downloads / mo

16.6K

license

apache-2.0

about this model

Infinity-Parser2-Flash is a vision-language model (VLM) specialized for high-speed document parsing. It extracts layout elements, text, tables, formulas, charts, and chemical structures from images, and also supports document visual question answering (VQA) and general multimodal understanding. The model is built on an upgraded synthetic data engine covering nearly 5 million diverse document samples and a multi-task reinforcement learning framework with joint verification rewards, enabling robust zero-shot performance across real-world business scenarios.

Key Strengths

Inference speed: Delivers 1,624 tokens/sec throughput — a 3.68× speedup over Infinity-Parser-7B — reducing latency and deployment cost.
Document parsing accuracy: Scores 86.0% on olmOCR-Bench and 72.2% on ParseBench, outperforming PaddleOCR-VL-1.5, DeepSeek-OCR-2, and MinerU-2.5.
Element parsing: Achieves 96.5% on UniMERNet and 92.41% on PubTabNet. On OmniDocBench-v1.6, the model scores 91.98%.
Document VQA: Reaches 93.16% on DocVQA and 75.94% on InfoVQA.
General understanding: Scores 81.60% on OCRBench and 77.92% on MMBench-EN.

Full benchmark comparisons with leading models are shown below.

Performance comparison chart for Infinity-Parser2-Pro and Flash across document parsing benchmarks.

Inference throughput comparison showing Flash's 3.68x speedup.

Benchmark	Infinity-Parser2-Flash
olmOCR-Bench	86.0
ParseBench	72.2
OmniDocBench-v1.6	91.98
PubTabNet (val)	92.41
UniMERNet	96.5
DocVQA (val)	93.16
OCRBench	81.60

Infinity-Parser2-Flash is hosted on Gigarouter as a managed, OpenAI-compatible API — no local installation required.

not yet live

We're benchmarking and onboarding Infinity Parser2 Flash as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit