skip to content
gigarouter gigarouter
models / vision-language · coming soon

MinerU 2.5 Pro

opendatalab/MinerU2.5-Pro-2605-1.2B

published May 2026 · updated Jun 2026

MinerU 2.5 Pro is a VLM model that converts PDF pages to Markdown with layout detection, table, formula, and image/chart analysis.

est. price
~$0.235
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
48.8K
license
apache-2.0

specs

TaskDocument Parsing (PDF-to-Markdown)
ArchitectureQwen2VL (1.2B parameters)
Parameters1.2B
LicenseApache-2.0

about this model

MinerU2.5-Pro-2605 is a vision-language model for document parsing that converts PDF pages to structured Markdown, combining layout detection, table and formula recognition, and image/chart analysis in a single 1.2B-parameter architecture.

MinerU logo or banner

Key Strengths

The 2605 version improves upon the previous 2604 release by substantially reducing category misclassification in layout detection—especially for the image_block category—and by enhancing recognition of charts, flowcharts, and seals through a large-scale training dataset. These refinements focus on real-world usability while maintaining competitive benchmark performance.

Benchmark Results

On OmniDocBench v1.6full, MinerU2.5-Pro-2605 achieves an overall score of 95.72. The table below compares it to the prior version:

Model VersionOverall↑Text↓Formula↑Table↑Table↑Read Order↓
MinerU2.5-Pro-260595.720.03697.1593.6296.010.123
MinerU2.5-Pro-260495.690.03697.2993.4295.920.120

Additional modality-specific results include a formula CDM of 97.15 and a text Edit Distance of 0.036, both at state-of-the-art levels.

Performance comparison chart or model illustration

Technical Approach

Performance gains come from data engineering rather than architectural changes. The training corpus was expanded from under 10M to 65.5M pages using Diversity-and-Difficulty-Aware Sampling, and annotation quality was improved via Cross-Model Consistency Verification and a Judge-and-Refine pipeline. A three-stage training strategy (large-scale pre-training, hard-sample fine-tuning, GRPO alignment) maximizes data utility. When served with a vllm-async-engine on an A100, the model achieves an inference speed of 2.12 fps.

best for

FAQ

What input formats does MinerU 2.5 Pro support?

It accepts page images (e.g., PNG) and outputs structured ContentBlocks with bounding boxes, types (text, table, equation, image), and recognized content.

How fast is inference?

Using the vllm-async-engine on an A100, it achieves 2.12 frames per second (concurrent inference speed).

What license is this model released under?

Apache-2.0.

How does it compare to larger models?

It achieves SOTA on OmniDocBench v1.6 with a 95.69 overall score, outperforming models with over 200x more parameters.

How can I use it via the gigarouter API?

Use the OpenAI-compatible endpoint with your gigarouter API key and send a page image for parsing.

not yet live

We're benchmarking and onboarding MinerU 2.5 Pro as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →