models / image-to-text · coming soon

UVDoc

PaddlePaddle/UVDoc

published Jun 2025 · updated Jul 2025

UVDoc is an image-to-text model that corrects geometric distortions in document images to improve OCR accuracy.

status

coming soon

API providers

downloads / mo

512.8K

license

apache-2.0

specs

Task	Document Image Unwarping
Architecture	UVDoc
Framework	PaddlePaddle
Benchmark CER (DocUNet)	0.179

about this model

UVDoc is an image-to-text model that performs document image unwarping — correcting geometric distortions such as curling, tilting, and perspective deformation in photographed documents to improve downstream OCR accuracy. It is part of the PaddleOCR toolkit and is hosted by gigarouter as a managed, OpenAI-compatible API. The model is designed to preprocess document images before text recognition, addressing common real-world issues where a document is photographed at an angle or on a curved surface. By applying a learned geometric transformation, UVDoc produces a flat, rectified version of the input image, which can then be passed to an OCR engine for more reliable text extraction. On the DocUNet benchmark dataset, UVDoc achieves a Character Error Rate (CER) of 0.179, indicating strong correction performance. The model is also available as an optional preprocessing module within the PP-StructureV3 pipeline, which combines layout detection, OCR, table recognition, seal recognition, and formula recognition for comprehensive document understanding. Example of a document image before and after unwarping by UVDoc

Example of a document image before and after unwarping by UVDoc

For developers evaluating this model through gigarouter's API, UVDoc provides a single-call solution for document rectification, with no need to manage dependencies or hardware. It is suitable for any application where document images are captured under uncontrolled conditions and require normalization before text extraction.

best for

·Preprocessing distorted document images for OCR pipelines
·Correcting perspective and curling in scanned or photographed documents
·Improving text recognition accuracy in document AI systems

FAQ

What is UVDoc?

UVDoc is an image-to-text model that geometrically corrects distorted document images to reduce character error rate in OCR.

How does UVDoc improve OCR?

It applies geometric transformations to fix inclination, perspective, and curling, making subsequent text recognition more accurate.

What are the input and output formats?

Input: a distorted document image (JPEG/PNG). Output: an unwarped document image, which can be used directly or passed to an OCR engine.

How can I call UVDoc via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, specifying the model as UVDoc and providing the image URL or base64 data.

What is the reported CER on the DocUNet benchmark?

The CER is 0.179 on the DocUNet benchmark dataset.

not yet live

We're benchmarking and onboarding UVDoc as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo

PP-LCNet_x1_0_doc_ori

445.3K dl/mo