gigarouter

YOLO11 Document Layout

Armaggheddon/yolo11-document-layout

published Sep 2025 · updated Mar 2026

YOLO11 Document Layout is an object detection model, built on the YOLOv11 architecture and fine-tuned on the DocLayNet dataset, that identifies and classifies 11 document layout elements (text, tables, pictures, titles, and more).

Status: coming soon
API providers: 0
Downloads per month: 4.2K
License: MIT

specs

Task: Document Layout Analysis (Object Detection)
Architecture: YOLOv11 (Nano, Small, Medium variants)
Parameters: Nano 2.6M, Small 9.4M, Medium 20.1M
Training dataset: DocLayNet (11 classes)
License: MIT

about this model

Armaggheddon/yolo11-document-layout is a detection model fine-tuned for document layout analysis, trained on the DocLayNet v1.2 dataset. It detects and classifies 11 layout elements (text, title, section-header, table, picture, caption, list-item, formula, page-header, page-footer, footnote) in document images. Three variants are available: nano, small, and medium, all trained at 1280x1280 resolution. The DocLayNet dataset comprises 80,863 unique pages from six document categories, including financial reports, scientific articles, and patents.
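A minimal inference sketch, assuming the checkpoints are run with the `ultralytics` package and that one variant's weights have already been downloaded locally. The default file name `yolo11n-doclaynet.pt` is hypothetical; check the model repository for the actual weight file names. `format_detections` and `detect_layout` are hypothetical helpers for illustration, not part of the model release.

```python
from typing import Any


def format_detections(
    boxes: list[list[float]],
    class_ids: list[float],
    scores: list[float],
    names: dict[int, str],
    min_score: float = 0.25,
) -> list[dict[str, Any]]:
    """Turn parallel box/class/score lists into records, dropping low scores."""
    records = []
    for box, cls_id, score in zip(boxes, class_ids, scores):
        if score < min_score:
            continue
        records.append(
            {
                "label": names[int(cls_id)],  # class ids come back as floats
                "score": round(float(score), 3),
                "box": [round(float(v), 1) for v in box],  # x1, y1, x2, y2 in pixels
            }
        )
    return records


def detect_layout(
    image_path: str, weights_path: str = "yolo11n-doclaynet.pt"
) -> list[dict[str, Any]]:
    """Detect layout elements on one page image.

    The default weights_path is a hypothetical file name; point it at the
    checkpoint you downloaded from the model repository.
    """
    from ultralytics import YOLO  # lazy import: format_detections works without it

    model = YOLO(weights_path)
    # imgsz=1280 matches the resolution all three variants were trained at.
    result = model.predict(source=image_path, imgsz=1280, verbose=False)[0]
    return format_detections(
        result.boxes.xyxy.tolist(),
        result.boxes.cls.tolist(),
        result.boxes.conf.tolist(),
        result.names,
    )
```

The `min_score` default of 0.25 mirrors the default confidence threshold of Ultralytics `predict`, so it only has an effect when raised; a higher value trades recall for precision.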

Performance Benchmarks

| Variant | Parameters | mAP50-95 | mAP50 |
|---------|------------|----------|-------|
| Nano    | 2.6M       | 0.732    | 0.841 |
| Small   | 9.4M       | 0.771    | 0.871 |
| Medium  | 20.1M      | 0.796    | 0.887 |
The following charts compare per-class performance across all three variants.

[Figure: per-class mAP@50-95, mAP@50, precision, and recall for the nano, small, and medium variants]

The nano variant (training run train4) is recommended for production use because it offers the best speed-accuracy trade-off. Compared with the other nano training runs, it improves precision on the title class by 9.0% and mAP50 on the harder footnote class by 2.4%. Its higher bounding-box precision makes it suitable for applications that need accurate element boundaries.
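To make the trade-off concrete, the sketch below uses only the numbers from the benchmark table to compute how many mAP50-95 points each larger variant adds, and how many points that is per extra million parameters. Parameter count is used here as a rough proxy for cost; actual latency depends on hardware and is not reported on this page.

```python
# Benchmark figures from the table (params in millions).
VARIANTS = {
    "nano": {"params_m": 2.6, "map50_95": 0.732, "map50": 0.841},
    "small": {"params_m": 9.4, "map50_95": 0.771, "map50": 0.871},
    "medium": {"params_m": 20.1, "map50_95": 0.796, "map50": 0.887},
}


def marginal_gain(base: str, other: str) -> tuple[float, float]:
    """Return (mAP50-95 gain in points, gain per extra million parameters)."""
    b, o = VARIANTS[base], VARIANTS[other]
    gain_points = (o["map50_95"] - b["map50_95"]) * 100
    extra_params = o["params_m"] - b["params_m"]
    return gain_points, gain_points / extra_params


for a, b in [("nano", "small"), ("small", "medium"), ("nano", "medium")]:
    points, per_m = marginal_gain(a, b)
    print(f"{a} -> {b}: +{points:.1f} mAP50-95 points, {per_m:.2f} points per extra M params")
```

This prints gains of +3.9 (nano to small), +2.5 (small to medium), and +6.4 (nano to medium) points. The gain per added parameter shrinks as the model grows (about 0.57 points per extra million parameters from nano to small, about 0.23 from small to medium); whether the extra points justify the larger model depends on the application.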


FAQ

What are the available model sizes and their trade-offs?

Three sizes are available: nano (2.6M parameters, fastest), small (9.4M), and medium (20.1M, highest accuracy). The nano variant is recommended for production because it offers the best balance of speed and accuracy.
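One way to apply this in practice is to pick the smallest variant that meets an accuracy target. The sketch below assumes the reported DocLayNet mAP50-95 is a reasonable stand-in for accuracy on your own documents, which may not hold for pages that differ from the training data; `smallest_variant` is a hypothetical helper.

```python
from typing import Optional

# (name, parameters in millions, reported mAP50-95), smallest model first.
VARIANTS = [
    ("nano", 2.6, 0.732),
    ("small", 9.4, 0.771),
    ("medium", 20.1, 0.796),
]


def smallest_variant(min_map50_95: float) -> Optional[str]:
    """Return the smallest variant whose reported mAP50-95 meets the target."""
    for name, _params_m, score in VARIANTS:
        if score >= min_map50_95:
            return name
    return None  # no variant reaches the target


print(smallest_variant(0.75))  # small
```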

What is the license of this model?

MIT license.

How do I call this model via the gigarouter API?

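The model is not yet available on gigarouter (status: coming soon). Once it is live, it will be served through the gigarouter OpenAI-compatible endpoint and authenticated with your API key; refer to the gigarouter documentation for the exact endpoint and request format.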

What document layout classes does it detect?

11 classes: Text, Title, Section-header, Table, Picture, Caption, List-item, Formula, Page-header, Page-footer, Footnote.
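Downstream pipelines often discard page furniture before extracting content. The sketch below drops Page-header and Page-footer detections and orders the remaining elements top to bottom, then left to right. It assumes plain records with `label` and `box` keys (not the model's native output format) and that the model's class names match the spelling listed above. The sort is a naive heuristic and will not give correct reading order for multi-column pages.

```python
from typing import Any

# Classes treated as page furniture here; this choice is an assumption.
FURNITURE = {"Page-header", "Page-footer"}


def body_elements(detections: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop header/footer boxes and sort the rest by top edge, then left edge."""
    kept = [d for d in detections if d["label"] not in FURNITURE]
    # box is (x1, y1, x2, y2) in pixels; y grows downward in image coordinates.
    return sorted(kept, key=lambda d: (d["box"][1], d["box"][0]))


page = [
    {"label": "Text", "box": [50, 400, 500, 600]},
    {"label": "Page-header", "box": [50, 10, 500, 40]},
    {"label": "Title", "box": [50, 100, 500, 150]},
    {"label": "Page-footer", "box": [50, 1200, 500, 1250]},
]
print([d["label"] for d in body_elements(page)])  # ['Title', 'Text']
```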

What input format does the model expect?

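Document page images, for example PNG or JPEG files. All variants were trained at 1280x1280, so running inference at that size should give results closest to the reported benchmarks. The model outputs bounding boxes with class labels and confidence scores.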
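Page images rarely arrive at exactly 1280x1280. A common approach in YOLO pipelines is letterboxing: scale the page so its longer side fits the target while keeping the aspect ratio, then pad the rest. The sketch below shows that arithmetic and how to map a box from the padded frame back to page pixels. It assumes square, centered padding; the exact padding Ultralytics applies can differ, and its `predict` results are already mapped back to original-image coordinates, so this is only needed when preprocessing images yourself.

```python
def letterbox_params(
    width: int, height: int, target: int = 1280
) -> tuple[float, int, int, float, float]:
    """Return (scale, new_w, new_h, pad_x, pad_y) for a centered square letterbox."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2  # padding on the left (and right)
    pad_y = (target - new_h) / 2  # padding on the top (and bottom)
    return scale, new_w, new_h, pad_x, pad_y


def to_original(
    box: tuple[float, float, float, float], width: int, height: int, target: int = 1280
) -> tuple[float, float, float, float]:
    """Map an (x1, y1, x2, y2) box from letterboxed coordinates to page pixels."""
    scale, _, _, pad_x, pad_y = letterbox_params(width, height, target)

    # Undo the padding, undo the scaling, then clamp to the page bounds.
    def fx(x: float) -> float:
        return min(max((x - pad_x) / scale, 0.0), float(width))

    def fy(y: float) -> float:
        return min(max((y - pad_y) / scale, 0.0), float(height))

    x1, y1, x2, y2 = box
    return fx(x1), fy(y1), fx(x2), fy(y2)


# An A4 page scanned at 300 dpi is 2480x3508 pixels.
print(letterbox_params(2480, 3508)[1:])  # (905, 1280, 187.5, 0.0)
```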

not yet live

We're benchmarking and onboarding YOLO11 Document Layout as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
