skip to content
gigarouter gigarouter
models / image-to-text · coming soon

PP-OCRv5 Server Det

PaddlePaddle/PP-OCRv5_server_det

published Jun 2025 · updated Jul 2025

PP-OCRv5 Server Det is a text detection model that identifies text regions in images, supporting multiple languages and complex layouts including handwriting, vertical, rotated, and curved text.

status
coming soon
API providers
0
downloads / mo
587.3K
license
apache-2.0

specs

TaskText Detection
ArchitecturePP-OCRv5 Detection Network
Supported LanguagesSimplified Chinese, Traditional Chinese, English, Japanese

about this model

PP-OCRv5_server_det is a text detection model from the PaddleOCR team, designed for high-performance detection of text in images. It supports multiple languages including Simplified Chinese, Traditional Chinese, English, and Japanese, and can handle challenging scenarios such as handwriting, vertical text, rotated text, curved text, and complex layouts. The model is optimized for applications like document analysis, license plate recognition, and scene text detection.

Accuracy is evaluated across 12 diverse categories, with an overall average of 0.827 (higher is better). Detailed per‑category detection accuracy is shown below:

Handwritten Chinese Handwritten English Printed Chinese Printed English Traditional Chinese Ancient Text Japanese General Scenario Pinyin Rotation Distortion Artistic Text Average
0.803 0.841 0.945 0.917 0.815 0.676 0.772 0.797 0.671 0.800 0.876 0.673 0.827

The model can be used standalone for text detection or composed into a full OCR pipeline (PP‑OCRv5) that includes text recognition, optional image orientation classification, and text line orientation modules. A sample detection result is visualized below:

Visualized text detection output showing detected polygon bounding boxes around text regions in a document

As a hosted API on gigarouter, PP-OCRv5_server_det provides an OpenAI‑compatible endpoint for image‑to‑text workflows, requiring no local installation or model management.

best for

FAQ

What is the input format for the model?

The model accepts an image as input, typically in PNG or JPEG format.

What is the output format?

The output is a set of detection polygons (coordinates of bounding boxes) with confidence scores for each detected text region.

What languages does PP-OCRv5 Server Det support?

It supports Simplified Chinese, Traditional Chinese, English, and Japanese text detection.

What is the average detection accuracy of this model?

According to the model card, the average accuracy across all tested scenarios is 0.827.

How can I use this model via the gigarouter API?

You can call the model via the gigarouter OpenAI-compatible endpoint using an API key. Send an image and receive detection results in the response.

not yet live

We're benchmarking and onboarding PP-OCRv5 Server Det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →