Table Transformer Structure Recognition
microsoft/table-transformer-structure-recognition
published Oct 2022 · updated Sep 2023
Table Transformer Structure Recognition is a detection model for identifying table structures (rows, columns, cells) in images, based on the DETR architecture.
specs
| Property | Value |
|---|---|
| Task | Table Structure Recognition |
| Architecture | DETR (Transformer-based object detection with pre-norm) |
| Dataset | PubTables-1M |
about this model
Microsoft Table Transformer Structure Recognition is an object detection model based on DETR (Detection Transformer) that identifies and localizes structural elements within tables, such as rows, columns, and spanning cells, from image input. It was introduced in the CVPR 2022 paper "PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents" and is being onboarded to Gigarouter as a managed, OpenAI-compatible API.
Key Capabilities
The model predicts bounding boxes and class labels for table components with a standard object detection architecture, without any special customization for the task. It uses the "normalize before" variant of DETR, which applies layer normalization before the self-attention and cross-attention sublayers rather than after them. The raw output contains only geometry and labels; combining it with separate OCR or PDF text extraction produces structured formats such as HTML or CSV.
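The "normalize before" ordering can be sketched in a few lines of plain Python. This is a minimal illustration, not DETR's code: the sublayers are placeholder functions acting on a single token vector rather than real multi-head attention and feed-forward networks, the LayerNorm has no learned scale or shift, and all function names are hypothetical. As in DETR's pre-norm decoder layer, the feed-forward sublayer is also normalized before.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """LayerNorm without learned scale/shift: zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    return [p + q for p, q in zip(a, b)]


def pre_norm(x: Vector, sublayer: Sublayer) -> Vector:
    # "Normalize before": normalize the sublayer's input, keep the residual path raw.
    return add(x, sublayer(layer_norm(x)))


def post_norm(x: Vector, sublayer: Sublayer) -> Vector:
    # Post-norm ordering: add the residual first, then normalize the sum.
    return layer_norm(add(x, sublayer(x)))


def decoder_layer(x: Vector, self_attn: Sublayer, cross_attn: Sublayer,
                  ffn: Sublayer, normalize_before: bool = True) -> Vector:
    # Placeholder sublayers stand in for attention and the feed-forward network;
    # only the ordering of LayerNorm and the residual connection is illustrated.
    step = pre_norm if normalize_before else post_norm
    x = step(x, self_attn)
    x = step(x, cross_attn)
    return step(x, ffn)


if __name__ == "__main__":
    zero = lambda v: [0.0] * len(v)
    print(decoder_layer([1.0, 2.0, 3.0], zero, zero, zero))
    print(decoder_layer([1.0, 2.0, 3.0], zero, zero, zero, normalize_before=False))
```

With zero-valued sublayers the pre-norm layer returns its input unchanged, while the post-norm variant (`normalize_before=False`) returns a normalized vector: in the pre-norm ordering the residual path itself is never normalized.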
Training and Dataset
Trained on the PubTables-1M dataset, which contains 947,642 fully annotated tables for structure recognition and 575,305 document pages for table detection. The annotations were canonicalized to remove oversegmentation, a source of ground-truth inconsistency in earlier corpora, which makes both training and evaluation more reliable.
Performance and Publication
Accepted at CVPR 2022. The authors report that transformer‑based models trained on PubTables-1M deliver excellent results for table structure recognition, detection, and functional analysis. Pre‑trained weights for this structure‑recognition model were released in May 2022; three updated TATR-v1.1 variants (trained on PubTables-1M, FinTabNet.c, and a combined set) became available in August 2023.
best for
- Extracting row and column boundaries from table images in scientific papers
- Automating table-to-HTML or table-to-CSV conversion by combining with OCR
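Turning detected rows and columns into CSV can be sketched as follows, assuming row and column boxes have already been extracted from the model's output and that a separate OCR engine returns words as `(text, box)` pairs. The function names and data formats are hypothetical, and the approach is deliberately simplified: each cell is the intersection of one row box and one column box, each word goes to the cell containing its center, and spanning cells and headers are ignored. The reference post-processing in the Table Transformer repository handles more cases.

```python
import csv
import io
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Word = Tuple[str, Box]                    # hypothetical OCR output: (text, box)


def intersect(a: Box, b: Box) -> Box:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def cell_grid(rows: List[Box], cols: List[Box]) -> List[List[Box]]:
    # Each cell is the overlap of one row box and one column box.
    rows = sorted(rows, key=lambda b: b[1])  # top to bottom
    cols = sorted(cols, key=lambda b: b[0])  # left to right
    return [[intersect(r, c) for c in cols] for r in rows]


def find_cell(grid: List[List[Box]], x: float, y: float) -> Optional[Tuple[int, int]]:
    for i, row in enumerate(grid):
        for j, (x0, y0, x1, y1) in enumerate(row):
            if x0 <= x <= x1 and y0 <= y <= y1:
                return i, j
    return None


def table_to_csv(rows: List[Box], cols: List[Box], words: List[Word]) -> str:
    grid = cell_grid(rows, cols)
    texts = [["" for _ in row] for row in grid]
    for text, (x0, y0, x1, y1) in words:
        # Assign each word to the cell containing its center; words outside every cell are dropped.
        hit = find_cell(grid, (x0 + x1) / 2, (y0 + y1) / 2)
        if hit is not None:
            i, j = hit
            texts[i][j] = f"{texts[i][j]} {text}".strip()
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(texts)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [(0, 20, 200, 40), (0, 0, 200, 20)]
    cols = [(0, 0, 100, 40), (100, 0, 200, 40)]
    words = [("Name", (10, 5, 50, 15)), ("Score", (110, 5, 160, 15)),
             ("Ada", (10, 25, 40, 35)), ("97", (110, 25, 130, 35))]
    print(table_to_csv(rows, cols, words))  # prints two CSV lines: Name,Score and Ada,97
```

Emitting HTML instead only changes the final step: the same `texts` grid can be written out as `<tr>`/`<td>` elements.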
FAQ
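What does the model output?
It outputs bounding boxes and class labels for the structural elements of a table: rows, columns, column headers, projected row headers, and spanning cells, plus a box for the table itself.
What input does it accept?
An image (e.g., PNG or JPEG) of a single table. It is designed for cropped table images, typically produced by a separate table detection model, rather than full document pages.
How do I use it?
Once it is live on Gigarouter, send images to the Gigarouter OpenAI-compatible endpoint with your API key; the response will contain the detected table structures.
Does it extract the text inside the table?
No, it only recognizes table structure (rows, columns, cells). Text extraction requires a separate OCR engine or PDF text extraction.
What was it trained on?
PubTables-1M, which contains nearly one million (947,642) annotated tables from scientific articles.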
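Before the boxes are used, raw detections are usually filtered by confidence and grouped by class. The sketch below assumes a hypothetical `(label_id, score, box)` tuple per detection, since the hosted API's response format is not described here; the id-to-name mapping follows the `id2label` entries in the checkpoint's Hugging Face config, and the 0.5 threshold is illustrative rather than a recommended value. The function name is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[int, float, Box]        # hypothetical raw format: (label_id, score, box)

# Class ids as named in the checkpoint's id2label config; DETR's extra
# "no object" class is deliberately absent so it gets filtered out.
STRUCTURE_LABELS = {
    0: "table",
    1: "table column",
    2: "table row",
    3: "table column header",
    4: "table projected row header",
    5: "table spanning cell",
}


def group_detections(dets: List[Detection], threshold: float = 0.5) -> Dict[str, List[Box]]:
    grouped: Dict[str, List[Box]] = defaultdict(list)
    for label_id, score, box in dets:
        # Keep only confident detections of a known structure class.
        if score >= threshold and label_id in STRUCTURE_LABELS:
            grouped[STRUCTURE_LABELS[label_id]].append(box)

    # Rows are ordered top to bottom; every other class left to right.
    def order(name: str):
        if name == "table row":
            return lambda b: (b[1], b[0])
        return lambda b: (b[0], b[1])

    return {name: sorted(boxes, key=order(name)) for name, boxes in grouped.items()}


if __name__ == "__main__":
    dets = [(2, 0.9, (0, 20, 200, 40)), (2, 0.95, (0, 0, 200, 20)),
            (1, 0.8, (100, 0, 200, 40)), (1, 0.3, (0, 0, 100, 40))]
    print(group_detections(dets))
```

Sorted row and column lists are exactly the inputs a grid-building step needs, so grouping is a natural first stage of any table-to-CSV pipeline.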
We're benchmarking and onboarding Table Transformer Structure Recognition as a hosted, OpenAI-compatible API.