skip to content
gigarouter gigarouter
models / image-to-text · coming soon

PP-OCRv6 Medium Det

PaddlePaddle/PP-OCRv6_medium_det

published Jun 2026 · updated Jun 2026

PP-OCRv6 Medium Det is an image-to-text model that detects and localizes text regions in images using a lightweight MetaFormer-based architecture.

status
coming soon
API providers
0
downloads / mo
89K
license
apache-2.0

specs

TaskText Detection
ArchitectureMetaFormer (LCNetV4 backbone + RepLKFPN neck)
Parameters15.5M
LicenseApache 2.0

about this model

PP-OCRv6_medium_det is a lightweight text detection model from the PP-OCRv6 family, designed for accurate and efficient localization of text in images. It is built on a unified MetaFormer-style architecture with a LCNetV4 backbone and RepLKFPN feature pyramid neck, using structural reparameterization to decouple spatial and channel mixing. With 15.5 million parameters, it targets server and cloud deployment scenarios.

Benchmark Performance

On the PP-OCRv6 in-house benchmark, the medium tier achieves an average detection Hmean of 86.2%, surpassing PP-OCRv5_server (81.6%) by +4.6% and PP-OCRv5_mobile (75.2%) by +11.0%. The model dramatically outperforms billion-scale vision-language models on the same detection benchmark: Gemini-3.1-Pro (46.8%), GPT-5.5 (45.6%), Qwen3-VL-235B (38.3%), Kimi-K2.6 (12.8%), and MiniMax-M3 (12.0%). Recognition accuracy for the full OCR system is 83.2%.

Per-Category Detection Hmean

CategoryHmean (%)
Handwritten CN83.7
Handwritten EN84.0
Printed CN95.1
Printed EN93.7
Traditional Chinese86.3
Ancient Text80.2
Japanese84.3
Blur94.1
Emoji99.6
Warp88.6
Pinyin74.0
Artistic69.0
Table96.8
Rotation93.8
Industrial73.3
General82.8

Architecture and Deployment

The model shares block primitives across medium, small, and tiny tiers, with task-specific stride configurations. The tiny tier achieves 3.9× faster inference than PP-OCRv5_mobile on Intel Xeon CPU while maintaining comparable accuracy. The medium_det variant is licensed under Apache 2.0. A detailed paper is available at arXiv:2606.13108.

best for

FAQ

What is PP-OCRv6 Medium Det best used for?

It excels at bounding-box-level text detection in images, supporting printed, handwritten, and multi-lingual text across diverse scenarios.

How does it compare to billion-scale VLMs on detection tasks?

PP-OCRv6 Medium Det (86.2% Hmean) dramatically surpasses Gemini-3.1-Pro (46.8%), GPT-5.5 (45.6%), and Qwen3-VL-235B (38.3%) with only 15.5M parameters.

What are the input and output formats?

Input: image (JPEG/PNG). Output: JSON array of bounding boxes with confidence scores and text (if combined with recognition).

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key; pass the image as a base64-encoded string or URL in the request.

What is the license for PP-OCRv6 Medium Det?

It is released under the Apache 2.0 license, allowing commercial use, modification, and distribution with attribution.

not yet live

We're benchmarking and onboarding PP-OCRv6 Medium Det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →