YOLO11 Document Layout

Armaggheddon/yolo11-document-layout

published Sep 2025 · updated Mar 2026

YOLO11 Document Layout is a detection model that identifies and classifies 11 document layout elements (text, tables, figures, titles, etc.) using the YOLOv11 architecture fine-tuned on the DocLayNet dataset.

status

coming soon

downloads / mo

4.2K

license

mit

specs

Task	Document Layout Analysis (Object Detection)
Architecture	YOLOv11 (Nano, Small, Medium variants)
Parameters	Nano: 2.6M, Small: 9.4M, Medium: 20.1M
Training Dataset	DocLayNet (11 classes)
License	MIT

about this model

Armaggheddon/yolo11-document-layout is a detection model fine-tuned for document layout analysis, trained on the DocLayNet v1.2 dataset. It detects and classifies 11 layout elements (text, title, section-header, table, picture, caption, list-item, formula, page-header, page-footer, footnote) in document images. Three variants are available: nano, small, and medium, all trained at 1280x1280 resolution. The DocLayNet dataset comprises 80,863 unique pages from six document categories, including financial reports, scientific articles, and patents.
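As a rough illustration of how one of these variants might be run locally, the sketch below uses the Ultralytics Python package. It assumes the checkpoints are standard Ultralytics `.pt` files; the weights filename `yolo11n_doc_layout.pt` and the helper name `detect_layout` are hypothetical, so substitute whichever file you actually downloaded from the repository.

```python
from pathlib import Path


def detect_layout(image_path, weights_path="yolo11n_doc_layout.pt", conf=0.25):
    """Run layout detection on one page image and return plain dicts.

    weights_path is a hypothetical local filename (an assumption, not a
    name taken from the model repository).
    """
    if not Path(weights_path).is_file():
        raise FileNotFoundError(f"weights not found: {weights_path}")

    # Imported lazily so this module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    # imgsz=1280 matches the training resolution stated above.
    results = model.predict(source=str(image_path), imgsz=1280, conf=conf)
    result = results[0]  # one Results object per input image

    detections = []
    for xyxy, cls_id, score in zip(
        result.boxes.xyxy.tolist(),
        result.boxes.cls.tolist(),
        result.boxes.conf.tolist(),
    ):
        detections.append(
            {
                "label": result.names[int(cls_id)],  # class name from the checkpoint
                "confidence": float(score),
                "box": tuple(xyxy),  # (x1, y1, x2, y2) in original image pixels
            }
        )
    return detections
```

Calling `detect_layout("page.png", weights_path=...)` would return one dictionary per detected layout element. The confidence threshold of 0.25 is only a starting point and is worth tuning per document type.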

Performance Benchmarks

Variant	Parameters	mAP50-95	mAP50
Nano	2.6M	0.732	0.841
Small	9.4M	0.771	0.871
Medium	20.1M	0.796	0.887
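To make the size-accuracy trade-off in the table concrete, the short sketch below computes how much mAP50-95 each step up in model size adds per extra million parameters. The figures are copied from the table; the helper name `step_gains` is illustrative. Parameter count is only a proxy for inference cost, so treat the result as a rough guide rather than a speed measurement.

```python
# (variant, parameters in millions, mAP50-95, mAP50), copied from the table above.
VARIANTS = [
    ("Nano", 2.6, 0.732, 0.841),
    ("Small", 9.4, 0.771, 0.871),
    ("Medium", 20.1, 0.796, 0.887),
]


def step_gains(variants):
    """For each step to the next larger variant, return
    (from, to, extra params in M, mAP50-95 gain, gain per extra M params)."""
    rows = []
    for (name0, params0, map0, _), (name1, params1, map1, _) in zip(variants, variants[1:]):
        extra = params1 - params0
        gain = map1 - map0
        rows.append((name0, name1, round(extra, 1), round(gain, 3), round(gain / extra, 4)))
    return rows


for row in step_gains(VARIANTS):
    print(row)
# ('Nano', 'Small', 6.8, 0.039, 0.0057)
# ('Small', 'Medium', 10.7, 0.025, 0.0023)
```

By this measure, each step up buys less accuracy per added parameter: Nano to Small adds 0.039 mAP50-95 for 6.8M more parameters, while Small to Medium adds 0.025 for 10.7M more.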

The following charts compare per-class performance across all three variants.

[Figures: per-class comparison of the nano, small, and medium models on mAP@50-95, mAP@50, precision, and recall.]

The nano variant (train4) is recommended for production use because it offers the best speed-accuracy trade-off. Compared with the other nano candidates, it improves precision on the title class by 9.0% and mAP50 on the difficult footnote class by 2.4%. Its higher bounding-box precision makes it suitable for applications that require accurate object boundaries.

best for

· Automated document digitization and OCR preprocessing
· Intelligent document parsing for information extraction
· Layout-aware document classification and indexing
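For the OCR preprocessing use case, a minimal sketch of one possible post-processing step follows: keep only confident, text-bearing detections and order them top to bottom before handing the crops to an OCR engine. The `Detection` type, the `ocr_regions` helper, the choice of text-bearing labels, and the 0.5 threshold are all assumptions made for illustration. The simple sort also assumes a single-column page; multi-column layouts need a proper reading-order step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # class name as reported by the model
    confidence: float  # detection score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in image pixels


# Classes assumed to contain running text worth sending to OCR (illustrative choice).
TEXT_LABELS = {"text", "title", "section-header", "list-item", "caption", "footnote"}


def ocr_regions(detections, min_confidence=0.5, labels=TEXT_LABELS):
    """Filter to confident text-bearing regions, sorted top-to-bottom, then left-to-right."""
    kept = [
        d for d in detections
        if d.label.lower() in labels and d.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda d: (d.box[1], d.box[0]))


page = [
    Detection("Table", 0.90, (0, 50, 100, 150)),
    Detection("Text", 0.80, (0, 200, 100, 260)),
    Detection("Title", 0.95, (0, 10, 100, 40)),
    Detection("Text", 0.30, (0, 300, 100, 350)),  # dropped: below threshold
]
for region in ocr_regions(page):
    print(region.label, region.box)
```

Tables, pictures, and formulas are excluded here on purpose, since they usually need a dedicated extractor rather than plain OCR.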

FAQ

What are the available model sizes and their trade-offs?

Three sizes: nano (2.6M parameters, fastest), small (9.4M), and medium (20.1M, highest accuracy). The nano variant is recommended for production because it offers the best speed-accuracy balance.

What is the license of this model?

MIT license.

How do I call this model via the gigarouter API?

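The hosted API is not yet live (status: coming soon). Once it is available, call the model through the gigarouter OpenAI-compatible endpoint with your API key. Refer to the gigarouter documentation for the endpoint format.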

What document layout classes does it detect?

11 classes: Text, Title, Section-header, Table, Picture, Caption, List-item, Formula, Page-header, Page-footer, Footnote.

What input format does the model expect?

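Document page images, preferably PNG or JPEG. The models were trained at 1280x1280 resolution, so running inference at that image size matches the training setup. The model outputs bounding boxes with class labels.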
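Page scans are rarely square, so fitting them into a 1280x1280 input usually means resizing with the aspect ratio preserved and padding the remainder ("letterboxing"). The sketch below shows the arithmetic for that common YOLO-style preprocessing and for mapping a predicted box back to the original page. It is an assumption about the pipeline, not a documented requirement of this model; many inference frameworks do this internally and already return boxes in original-image coordinates. The function names are illustrative.

```python
def letterbox_params(width, height, target=1280):
    """Return (scale, new_w, new_h, pad_x, pad_y) for fitting an image
    into a target x target square without changing its aspect ratio."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2  # padding added on the left (and right)
    pad_y = (target - new_h) / 2  # padding added on the top (and bottom)
    return scale, new_w, new_h, pad_x, pad_y


def to_original(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from letterboxed space back to the original image."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A 1000x2000 portrait page is scaled by 0.64 to 640x1280,
# then padded by 320 px on the left and right.
scale, new_w, new_h, pad_x, pad_y = letterbox_params(1000, 2000)
print(scale, new_w, new_h, pad_x, pad_y)
```

A box covering the whole resized page, `(320, 0, 960, 1280)` in letterboxed space, maps back to approximately `(0, 0, 1000, 2000)` on the original page.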

