PP-OCRv6 Medium Det
PaddlePaddle/PP-OCRv6_medium_det
published Jun 2026 · updated Jun 2026
PP-OCRv6 Medium Det is an image-to-text model that detects and localizes text regions in images using a lightweight MetaFormer-based architecture.
specs
| Task | Text Detection |
| Architecture | MetaFormer (LCNetV4 backbone + RepLKFPN neck) |
| Parameters | 15.5M |
| License | Apache 2.0 |
about this model
PP-OCRv6_medium_det is a lightweight text detection model from the PP-OCRv6 family, designed for accurate and efficient localization of text in images. It is built on a unified MetaFormer-style architecture with a LCNetV4 backbone and RepLKFPN feature pyramid neck, using structural reparameterization to decouple spatial and channel mixing. With 15.5 million parameters, it targets server and cloud deployment scenarios.
Benchmark Performance
On the PP-OCRv6 in-house benchmark, the medium tier achieves an average detection Hmean of 86.2%, surpassing PP-OCRv5_server (81.6%) by +4.6% and PP-OCRv5_mobile (75.2%) by +11.0%. The model dramatically outperforms billion-scale vision-language models on the same detection benchmark: Gemini-3.1-Pro (46.8%), GPT-5.5 (45.6%), Qwen3-VL-235B (38.3%), Kimi-K2.6 (12.8%), and MiniMax-M3 (12.0%). Recognition accuracy for the full OCR system is 83.2%.
Per-Category Detection Hmean
| Category | Hmean (%) |
|---|---|
| Handwritten CN | 83.7 |
| Handwritten EN | 84.0 |
| Printed CN | 95.1 |
| Printed EN | 93.7 |
| Traditional Chinese | 86.3 |
| Ancient Text | 80.2 |
| Japanese | 84.3 |
| Blur | 94.1 |
| Emoji | 99.6 |
| Warp | 88.6 |
| Pinyin | 74.0 |
| Artistic | 69.0 |
| Table | 96.8 |
| Rotation | 93.8 |
| Industrial | 73.3 |
| General | 82.8 |
Architecture and Deployment
The model shares block primitives across medium, small, and tiny tiers, with task-specific stride configurations. The tiny tier achieves 3.9× faster inference than PP-OCRv5_mobile on Intel Xeon CPU while maintaining comparable accuracy. The medium_det variant is licensed under Apache 2.0. A detailed paper is available at arXiv:2606.13108.
best for
- ·Extracting text from scanned documents
- ·Detecting text in natural scene images
- ·OCR for multi-lingual documents (Chinese, English, Japanese, etc.)
FAQ
It excels at bounding-box-level text detection in images, supporting printed, handwritten, and multi-lingual text across diverse scenarios.
PP-OCRv6 Medium Det (86.2% Hmean) dramatically surpasses Gemini-3.1-Pro (46.8%), GPT-5.5 (45.6%), and Qwen3-VL-235B (38.3%) with only 15.5M parameters.
Input: image (JPEG/PNG). Output: JSON array of bounding boxes with confidence scores and text (if combined with recognition).
Use the OpenAI-compatible endpoint with your API key; pass the image as a base64-encoded string or URL in the request.
It is released under the Apache 2.0 license, allowing commercial use, modification, and distribution with attribution.
We're benchmarking and onboarding PP-OCRv6 Medium Det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.