PP-OCRv6 Medium Rec

PaddlePaddle/PP-OCRv6_medium_rec

published Jun 2026 · updated Jun 2026

PP-OCRv6 Medium Rec is a lightweight image-to-text model for optical character recognition (OCR) that achieves 83.2% recognition accuracy with only 19M parameters.

status

coming soon

API providers

downloads / mo

79.9K

license

apache-2.0

specs

Task	image-to-text (OCR recognition)
Architecture	MetaFormer-style backbone with structural reparameterization, decoupled spatial and channel mixing
Parameters	19M
License	Apache 2.0

about this model

PP-OCRv6_medium_rec is a lightweight image-to-text recognition model (19M parameters) that converts printed or handwritten text from images into machine-readable strings, designed for server-side OCR deployment.

Built on a unified MetaFormer-style backbone with structural reparameterization, it decouples spatial token mixing from channel mixing. The model achieves 83.2% weighted-average recognition accuracy on in-house benchmarks, outperforming the previous PP-OCRv5_server by +5.1% and surpassing billion-scale vision-language models such as Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro on dedicated OCR tasks—despite orders of magnitude fewer parameters.

Per-category recognition accuracy (PP-OCRv6_medium) on in-house test sets:

Category	Accuracy (%)
Handwritten CN	62.1
Handwritten EN	67.8
Printed CN	91.5
Printed EN	94.1
TC	78.6
Ancient	72.4
JP	90.5
Confusable	64.9
Special	61.7
General	87.5
Pinyin	78.1
Artistic	71.2
Industrial	77.4
Screen	82.5
Card	88.1

Hosted on gigarouter as a managed API, the model is available via an OpenAI-compatible endpoint—no infrastructure setup required.

best for

·Extracting text from scanned documents
·Recognizing handwritten text in forms
·Real-time OCR on mobile or edge devices

FAQ

What is PP-OCRv6 Medium Rec best used for?

It is best for high-accuracy OCR on printed and handwritten text in images, especially when low latency or small model size is required.

How does it compare in size and speed to other models?

With 19M parameters, it is much smaller than billion-scale VLMs yet surpasses them on OCR benchmarks. Its tiny tier runs 3.9x faster than PP-OCRv5_mobile on CPU.

What are the license terms?

It is released under the Apache 2.0 license, allowing free use, modification, and distribution.

What input and output format does it expect?

Input: an image containing a single text line (e.g., cropped from a detector). Output: the recognized text string.

How can I call it via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key. Send an image URL or base64-encoded image in the request.

not yet live

We're benchmarking and onboarding PP-OCRv6 Medium Rec as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo