PP-OCRv4 Mobile Det

PaddlePaddle/PP-OCRv4_mobile_det

published Jun 2025 · updated Jul 2025

PP-OCRv4 Mobile Det is an image-to-text model that detects text regions in images, outputting bounding polygons with confidence scores, optimized for mobile and edge devices.

status

coming soon

API providers

downloads / mo

20.1K

license

apache-2.0

specs

Task	Text Detection
Architecture	Mobile-optimized convolutional neural network
License	Apache 2.0
Accuracy (Average)	0.624
Output Format	Polygons and confidence scores

about this model

PP-OCRv4_mobile_det is a text detection model that localizes text regions in images, optimized for mobile and edge deployment. Developed by the PaddleOCR team as part of the PP-OCRv4_det series, it balances high accuracy with efficient inference, making it suitable for resource-constrained environments. The model outputs polygon coordinates and confidence scores for detected text boxes.

Benchmark Performance

The model's accuracy (measured as F-score) across diverse text scenarios is as follows:

Scenario	H-Chinese	H-English	P-Chinese	P-English	T-Chinese	Ancient	Japanese	General	Pinyin	Rotation	Distortion	Artistic	Average
F-score	0.583	0.369	0.872	0.773	0.663	0.231	0.634	0.710	0.430	0.299	0.715	0.549	0.624

H = Handwritten, P = Printed, T = Traditional. Average F-score across all scenarios is 0.624.

Output Example

The following image shows a typical detection result with detected polygons overlaid on the input:

Example of text detection output with bounding polygons

PP-OCRv4_mobile_det is licensed under Apache 2.0 and is available as a hosted API on gigarouter, requiring no local installation. It integrates seamlessly with the PP-OCRv4 recognition pipeline for end-to-end OCR workflows.

best for

·Real-time OCR on mobile devices
·Document scanning and digitization
·Extracting text from images on edge devices

FAQ

What is the primary use of PP-OCRv4 Mobile Det?

It detects text regions in images, typically used as part of an OCR pipeline to extract text.

What license does this model use?

Apache 2.0.

What are the input and output formats?

Input: image (URL or file). Output: polygon coordinates and confidence scores for each detected text region.

How can I call this model via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key, specifying the model name "PP-OCRv4 Mobile Det".

Is this model suitable for real-time applications?

Yes, it is mobile-optimized for efficient inference on edge devices.

not yet live

We're benchmarking and onboarding PP-OCRv4 Mobile Det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo