PP-OCRv5 Server Det

PaddlePaddle/PP-OCRv5_server_det

published Jun 2025 · updated Jul 2025

PP-OCRv5 Server Det is a text detection model that identifies text regions in images, supporting multiple languages and complex layouts including handwriting, vertical, rotated, and curved text.

status

coming soon

API providers

downloads / mo

587.3K

license

apache-2.0

specs

Task	Text Detection
Architecture	PP-OCRv5 Detection Network
Supported Languages	Simplified Chinese, Traditional Chinese, English, Japanese

about this model

PP-OCRv5_server_det is a text detection model from the PaddleOCR team, designed for high-performance detection of text in images. It supports multiple languages including Simplified Chinese, Traditional Chinese, English, and Japanese, and can handle challenging scenarios such as handwriting, vertical text, rotated text, curved text, and complex layouts. The model is optimized for applications like document analysis, license plate recognition, and scene text detection.

Accuracy is evaluated across 12 diverse categories, with an overall average of 0.827 (higher is better). Detailed per‑category detection accuracy is shown below:

Handwritten Chinese	Handwritten English	Printed Chinese	Printed English	Traditional Chinese	Ancient Text	Japanese	General Scenario	Pinyin	Rotation	Distortion	Artistic Text	Average
0.803	0.841	0.945	0.917	0.815	0.676	0.772	0.797	0.671	0.800	0.876	0.673	0.827

The model can be used standalone for text detection or composed into a full OCR pipeline (PP‑OCRv5) that includes text recognition, optional image orientation classification, and text line orientation modules. A sample detection result is visualized below:

Visualized text detection output showing detected polygon bounding boxes around text regions in a document

As a hosted API on gigarouter, PP-OCRv5_server_det provides an OpenAI‑compatible endpoint for image‑to‑text workflows, requiring no local installation or model management.

best for

·Document text extraction in scanned documents
·License plate recognition in traffic scenarios
·Scene text detection in natural images with complex backgrounds

FAQ

What is the input format for the model?

The model accepts an image as input, typically in PNG or JPEG format.

What is the output format?

The output is a set of detection polygons (coordinates of bounding boxes) with confidence scores for each detected text region.

What languages does PP-OCRv5 Server Det support?

It supports Simplified Chinese, Traditional Chinese, English, and Japanese text detection.

What is the average detection accuracy of this model?

According to the model card, the average accuracy across all tested scenarios is 0.827.

How can I use this model via the gigarouter API?

You can call the model via the gigarouter OpenAI-compatible endpoint using an API key. Send an image and receive detection results in the response.

not yet live

We're benchmarking and onboarding PP-OCRv5 Server Det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo

PP-LCNet_x1_0_doc_ori

445.3K dl/mo