PP-DocBlockLayout
PaddlePaddle/PP-DocBlockLayout
published Jun 2025 · updated Jul 2025
PP-DocBlockLayout is an image-to-text model that detects layout block regions in document images using the RT-DETR-L architecture.
specs
| Task | Image-to-text (layout block detection) |
| Architecture | RT-DETR-L |
| License | Apache 2.0 |
| mAP (0.5) | 95.9% |
about this model
PP-DocBlockLayout is a document layout detection model that identifies and localizes region blocks in document images, outputting bounding boxes with confidence scores. It is built on the RT-DETR-L architecture and trained on a self-built dataset covering Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient texts, and research reports. The model detects a single class, Region, and produces structured JSON output with coordinates and label information.
Performance
| Metric | Value |
|---|---|
| [email protected] | 95.9% |
The evaluation set comprises 1,000 document images spanning the same document types used in training.
Integration
Through gigarouter, PP-DocBlockLayout is available as a hosted, OpenAI-compatible API. Developers can send document images and receive detection results without managing infrastructure. The model is released under the Apache 2.0 license and supports CPU, GPU, XPU, and NPU hardware. It is compatible with Python 3.8–3.12 on Linux, Windows, and macOS.
Below is an example visualization of the model’s output on a sample document:

best for
- ·Detecting layout regions in scanned documents, research papers, and magazines
- ·Preprocessing document images for OCR or document understanding pipelines
- ·Analyzing layout structure of contracts, books, and exam papers
FAQ
The model accepts document images (e.g., PNG, JPG) and outputs bounding boxes with labels and confidence scores for each detected region.
The output is a JSON object containing a list of detected boxes, each with a label (Region), confidence score, and coordinate array.
It is released under the Apache 2.0 license.
Use the gigarouter OpenAI-compatible endpoint with your API key, passing an image URL or base64-encoded image as input.
It supports CPU, GPU, XPU, and NPU, and runs on Linux, Windows, and macOS.
We're benchmarking and onboarding PP-DocBlockLayout as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.