PP-OCRv5 Server Det
PaddlePaddle/PP-OCRv5_server_det
published Jun 2025 · updated Jul 2025
PP-OCRv5 Server Det is a text detection model that identifies text regions in images, supporting multiple languages and complex layouts including handwriting, vertical, rotated, and curved text.
specs
| Task | Text Detection |
| Architecture | PP-OCRv5 Detection Network |
| Supported Languages | Simplified Chinese, Traditional Chinese, English, Japanese |
about this model
PP-OCRv5_server_det is a text detection model from the PaddleOCR team, designed for high-performance detection of text in images. It supports multiple languages including Simplified Chinese, Traditional Chinese, English, and Japanese, and can handle challenging scenarios such as handwriting, vertical text, rotated text, curved text, and complex layouts. The model is optimized for applications like document analysis, license plate recognition, and scene text detection.
Accuracy is evaluated across 12 diverse categories, with an overall average of 0.827 (higher is better). Detailed per‑category detection accuracy is shown below:
| Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.803 | 0.841 | 0.945 | 0.917 | 0.815 | 0.676 | 0.772 | 0.797 | 0.671 | 0.800 | 0.876 | 0.673 | 0.827 |
The model can be used standalone for text detection or composed into a full OCR pipeline (PP‑OCRv5) that includes text recognition, optional image orientation classification, and text line orientation modules. A sample detection result is visualized below:
As a hosted API on gigarouter, PP-OCRv5_server_det provides an OpenAI‑compatible endpoint for image‑to‑text workflows, requiring no local installation or model management.
best for
- ·Document text extraction in scanned documents
- ·License plate recognition in traffic scenarios
- ·Scene text detection in natural images with complex backgrounds
FAQ
The model accepts an image as input, typically in PNG or JPEG format.
The output is a set of detection polygons (coordinates of bounding boxes) with confidence scores for each detected text region.
It supports Simplified Chinese, Traditional Chinese, English, and Japanese text detection.
According to the model card, the average accuracy across all tested scenarios is 0.827.
You can call the model via the gigarouter OpenAI-compatible endpoint using an API key. Send an image and receive detection results in the response.
We're benchmarking and onboarding PP-OCRv5 Server Det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.