PP-DocBlockLayout

PaddlePaddle/PP-DocBlockLayout

published Jun 2025 · updated Jul 2025

PP-DocBlockLayout is an image-to-text model that detects layout block regions in document images using the RT-DETR-L architecture.

status

coming soon

API providers

downloads / mo

18.6K

license

apache-2.0

specs

Task	Image-to-text (layout block detection)
Architecture	RT-DETR-L
License	Apache 2.0
mAP (0.5)	95.9%

about this model

PP-DocBlockLayout is a document layout detection model that identifies and localizes region blocks in document images, outputting bounding boxes with confidence scores. It is built on the RT-DETR-L architecture and trained on a self-built dataset covering Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient texts, and research reports. The model detects a single class, Region, and produces structured JSON output with coordinates and label information.

Performance

Metric	Value
[email protected]	95.9%

The evaluation set comprises 1,000 document images spanning the same document types used in training.

Integration

Through gigarouter, PP-DocBlockLayout is available as a hosted, OpenAI-compatible API. Developers can send document images and receive detection results without managing infrastructure. The model is released under the Apache 2.0 license and supports CPU, GPU, XPU, and NPU hardware. It is compatible with Python 3.8–3.12 on Linux, Windows, and macOS.

Below is an example visualization of the model’s output on a sample document:

Visualization of PP-DocBlockLayout detection results showing bounding boxes over a document page.

best for

·Detecting layout regions in scanned documents, research papers, and magazines
·Preprocessing document images for OCR or document understanding pipelines
·Analyzing layout structure of contracts, books, and exam papers

FAQ

What is the input format for PP-DocBlockLayout?

The model accepts document images (e.g., PNG, JPG) and outputs bounding boxes with labels and confidence scores for each detected region.

What is the output format?

The output is a JSON object containing a list of detected boxes, each with a label (Region), confidence score, and coordinate array.

What license is PP-DocBlockLayout released under?

It is released under the Apache 2.0 license.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, passing an image URL or base64-encoded image as input.

What hardware does this model support?

It supports CPU, GPU, XPU, and NPU, and runs on Linux, Windows, and macOS.

not yet live

We're benchmarking and onboarding PP-DocBlockLayout as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo