Table Transformer Structure Recognition

microsoft/table-transformer-structure-recognition

published Oct 2022 · updated Sep 2023

Table Transformer Structure Recognition is a detection model for identifying table structures (rows, columns, cells) in images, based on the DETR architecture.

est. price

~$0.047 / 1k images · estimated, set at launch

API providers

downloads / mo

1.8M

license

mit

specs

Task	Table Structure Recognition
Architecture	DETR (Transformer-based object detection with pre-norm)
Dataset	PubTables-1M

about this model

Microsoft Table Transformer Structure Recognition is an object detection model based on DETR (Detection Transformer) that identifies and localizes structural elements within tables, such as rows, columns, and spanning cells, from image input. It was introduced in the CVPR 2022 paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents and is being onboarded to Gigarouter as a managed, OpenAI-compatible API.

Key Capabilities

The model predicts bounding boxes and class labels for table components without requiring task-specific customization. It uses a “normalize before” variant of DETR, applying layer normalization before self‑ and cross‑attention. The raw model output can be combined with separate OCR or PDF text extraction to produce structured formats such as HTML or CSV.
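The difference between the "normalize before" ordering and the default DETR ordering can be sketched in a few lines. This is a toy illustration in plain Python, not the model's code: `layer_norm` has no learned scale or shift, and `toy_sublayer` is a hypothetical stand-in for an attention layer.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(a, b):
    """Element-wise sum, used for the residual connection."""
    return [p + q for p, q in zip(a, b)]

def post_norm_block(x, sublayer):
    """Default DETR ordering: sublayer, residual add, then normalize."""
    return layer_norm(add(x, sublayer(x)))

def pre_norm_block(x, sublayer):
    """'Normalize before' ordering: normalize, sublayer, then residual add."""
    return add(x, sublayer(layer_norm(x)))

def toy_sublayer(x):
    """Hypothetical stand-in for attention: any vector-to-vector function works."""
    return [0.5 * v for v in x]

x = [1.0, 2.0, 4.0]
print("pre-norm :", pre_norm_block(x, toy_sublayer))
print("post-norm:", post_norm_block(x, toy_sublayer))
```

In the pre-norm variant the residual path carries the input through unnormalized, while the post-norm variant renormalizes the sum after every sublayer.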

Training and Dataset

The model was fine-tuned on the PubTables-1M dataset, which contains 947,642 fully annotated tables for structure recognition and 575,305 document pages for table detection. The annotations were canonicalized to correct the oversegmentation inconsistencies found in earlier corpora, which makes training and evaluation more reliable.

Performance and Publication

The paper was accepted at CVPR 2022. Its authors report that transformer-based object detection models trained on PubTables-1M produce excellent results on all three tasks: table detection, structure recognition, and functional analysis. Pre-trained weights for this structure-recognition model were released in May 2022; three updated TATR-v1.1 variants (trained on PubTables-1M, FinTabNet.c, and a combined set) followed in August 2023.

best for

·Extracting row and column boundaries from table images in scientific papers
·Automating table-to-HTML or table-to-CSV conversion by combining with OCR
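Once cell text has been recovered, the last step of a table-to-CSV or table-to-HTML conversion is mostly serialization. The sketch below assumes the table has already been reduced to a simple grid of strings (a list of rows) with no spanning cells. The helper names and the sample grid are illustrative, not part of any released tooling.

```python
import csv
import html
import io

def grid_to_csv(grid):
    """Serialize a list of rows to CSV text, quoting fields where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()

def grid_to_html(grid, header_rows=1):
    """Render a grid as a minimal HTML table; the first rows become <th> cells."""
    lines = ["<table>"]
    for i, row in enumerate(grid):
        tag = "th" if i < header_rows else "td"
        cells = "".join(f"<{tag}>{html.escape(text)}</{tag}>" for text in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Illustrative grid only; real text would come from OCR or PDF extraction.
grid = [["Item", "Qty"], ["Bolts, M4", "12"]]
print(grid_to_csv(grid))
print(grid_to_html(grid))
```

Spanning cells would need `colspan` and `rowspan` attributes in HTML and have no direct CSV equivalent, so a real pipeline has to decide how to flatten them.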

FAQ

What does this model output?

It outputs bounding boxes, class labels, and confidence scores for the table itself and its structural elements: rows, columns, column headers, projected row headers, and spanning cells.
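A typical first step with raw detections is to drop low-confidence boxes and group the rest by class. The sketch below assumes detections arrive as `(score, label_id, box)` tuples. The `ID2LABEL` mapping reflects the class names as understood from the released checkpoint and should be checked against the model configuration. The sample boxes are made up.

```python
from collections import defaultdict

# Assumed class ids; confirm against the checkpoint's id2label before relying on them.
ID2LABEL = {
    0: "table",
    1: "table column",
    2: "table row",
    3: "table column header",
    4: "table projected row header",
    5: "table spanning cell",
}

def group_detections(detections, threshold=0.5):
    """Keep boxes scoring at least `threshold` and group them by class name.

    Each detection is (score, label_id, (x0, y0, x1, y1)) in pixel coordinates.
    """
    grouped = defaultdict(list)
    for score, label_id, box in detections:
        if score >= threshold:
            grouped[ID2LABEL[label_id]].append(box)
    # Put rows in top-to-bottom order and columns in left-to-right order.
    grouped["table row"].sort(key=lambda b: b[1])
    grouped["table column"].sort(key=lambda b: b[0])
    return dict(grouped)

# Made-up detections for illustration.
detections = [
    (0.90, 2, (0, 50, 100, 90)),
    (0.95, 2, (0, 10, 100, 50)),
    (0.20, 1, (0, 0, 40, 90)),    # below threshold, dropped
    (0.80, 1, (40, 0, 100, 90)),
]
print(group_detections(detections))
```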

What input format does it require?

It accepts image inputs (e.g., PNG, JPEG) containing a table.
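For local experiments outside the hosted API, the checkpoint can be run with the Hugging Face `transformers` library. The sketch below is a minimal version of that route and assumes `torch`, `transformers`, and `Pillow` are installed. The 0.6 threshold is an arbitrary starting point, and `detect_structure` and `to_records` are illustrative names.

```python
def to_records(scores, labels, boxes, id2label):
    """Turn parallel lists of scores, label ids, and boxes into plain dicts."""
    return [
        {"label": id2label[l], "score": round(s, 3), "box": [round(v, 1) for v in b]}
        for s, l, b in zip(scores, labels, boxes)
    ]

def detect_structure(image_path, threshold=0.6,
                     checkpoint="microsoft/table-transformer-structure-recognition"):
    """Run structure recognition on one cropped table image (PNG, JPEG, ...)."""
    # Heavy imports stay local so to_records() is usable without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, TableTransformerForObjectDetection

    image = Image.open(image_path).convert("RGB")  # drop alpha or palette modes
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = TableTransformerForObjectDetection.from_pretrained(checkpoint)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]
    return to_records(result["scores"].tolist(), result["labels"].tolist(),
                      result["boxes"].tolist(), model.config.id2label)
```

Calling `detect_structure("table.png")` would download the weights on first use and return one dict per detected element, with boxes in the pixel coordinates of the input image.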

How can I use this model via an API?

Once the model is live, send images to the Gigarouter OpenAI-compatible endpoint with your API key; the response will contain the detected table structures.
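The request schema is not documented on this page, so the following is only a sketch of how an image might be packaged for a JSON API. It assumes bearer-token authentication in the OpenAI style and a base64 data URL for the image. The `model` and `image` field names are hypothetical placeholders, and no endpoint URL is assumed.

```python
import base64
import json

def image_to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def build_request(image_bytes, api_key,
                  model="microsoft/table-transformer-structure-recognition"):
    """Build headers and a JSON body; nothing is sent over the network.

    HYPOTHETICAL layout: the body fields "model" and "image" are placeholders,
    not a documented schema.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "image": image_to_data_url(image_bytes)})
    return headers, body

# Placeholder bytes and key for illustration only.
headers, body = build_request(b"\x89PNG fake bytes", api_key="YOUR_API_KEY")
print(headers["Content-Type"], len(body))
```

The actual field names and endpoint path should be taken from the provider's API reference once the model is available.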

Does the model extract text from tables?

No, it only recognizes table structure (rows, columns, cells). Text extraction requires a separate OCR engine.
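Joining structure with OCR output usually means intersecting row and column boxes into cells and dropping each word into the cell that contains its center. The sketch below assumes rows and columns are already sorted, that OCR words arrive in reading order as `(text, box)` pairs, and that there are no spanning cells. All names and sample coordinates are illustrative.

```python
def intersect(row, col):
    """Cell box: the column's horizontal extent crossed with the row's vertical extent."""
    return (col[0], row[1], col[2], row[3])

def contains(box, x, y):
    """True if point (x, y) lies inside box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1

def fill_grid(rows, cols, words):
    """Assign each OCR word to the cell containing the word's center point."""
    grid = [["" for _ in cols] for _ in rows]
    for text, (x0, y0, x1, y1) in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        for r, row in enumerate(rows):
            for c, col in enumerate(cols):
                if contains(intersect(row, col), cx, cy):
                    grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid

# Made-up boxes: two rows, two columns, five OCR words.
rows = [(0, 0, 100, 20), (0, 20, 100, 40)]
cols = [(0, 0, 50, 40), (50, 0, 100, 40)]
words = [
    ("Item", (5, 5, 30, 15)), ("Qty", (55, 5, 80, 15)),
    ("Bolts", (5, 25, 25, 35)), ("M4", (27, 25, 40, 35)), ("12", (55, 25, 70, 35)),
]
print(fill_grid(rows, cols, words))
```

Words whose center falls outside every cell are silently skipped here; a production pipeline would likely log or reassign them.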

What dataset was it trained on?

It was fine-tuned on PubTables-1M, which contains nearly one million annotated tables from scientific articles.

not yet live

We're benchmarking and onboarding Table Transformer Structure Recognition as a hosted, OpenAI-compatible API.

related object detection models

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo

table-transformer-structure-recognition-v1.1-all

239.5K dl/mo