Table Transformer V1.1 (All)

microsoft/table-transformer-structure-recognition-v1.1-all

published Nov 2023 · updated Nov 2023

Table Transformer V1.1 (All) is a detection model that recognizes table structure from images using a DETR architecture trained on PubTables-1M and FinTabNet.c.

est. price: ~$0.047 / 1k images (estimated, set at launch)


downloads / mo: 239.5K

license: MIT

specs

Task	Table Structure Recognition
Architecture	DETR with pre-norm
Training Data	PubTables-1M & FinTabNet.c (combined)
Model Version	v1.1 (All)

about this model

Microsoft Table-Transformer-Structure-Recognition-v1.1-all is a detection model that performs table structure recognition on document images using a Transformer-based object detection architecture (DETR with pre-normalization). Given an image of a table (for example, a crop from a rendered PDF page), it detects and localizes structural components: the table itself, rows, columns, column headers, projected row headers, and spanning cells. Individual cells are then derived from row and column intersections, producing structured predictions suitable for downstream extraction pipelines.
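Structure checkpoints in this family are commonly described with the six object classes below, plus an internal "no object" class. A minimal post-processing sketch, assuming this label mapping (verify it against the checkpoint's own `id2label` config) and a hypothetical `Detection` record, keeps confident predictions and groups them by class:

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed label mapping; check it against the checkpoint's config before relying on it.
ID2LABEL = {
    0: "table",
    1: "table column",
    2: "table row",
    3: "table column header",
    4: "table projected row header",
    5: "table spanning cell",
}

@dataclass
class Detection:
    """Hypothetical container for one predicted box."""
    label_id: int
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def group_by_label(detections, threshold=0.5):
    """Keep detections scoring at or above `threshold`, grouped by class name.
    Unknown ids (such as a "no object" class) are skipped."""
    grouped = {name: [] for name in ID2LABEL.values()}
    for det in detections:
        if det.score >= threshold and det.label_id in ID2LABEL:
            grouped[ID2LABEL[det.label_id]].append(det)
    return grouped

dets = [
    Detection(2, 0.97, (0, 0, 200, 20)),   # confident row
    Detection(2, 0.31, (0, 20, 200, 40)),  # low-confidence row, filtered out
    Detection(1, 0.88, (0, 0, 100, 40)),   # confident column
]
grouped = group_by_label(dets, threshold=0.5)
print(len(grouped["table row"]), len(grouped["table column"]))  # 1 1
```

The threshold of 0.5 is an arbitrary illustration; in practice it is tuned per class and per document collection.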

Model Description

The model is equivalent to DETR (DEtection TRansformer) with the "normalize before" setting, where layer normalization is applied ahead of self- and cross-attention. It was introduced in the paper "Aligning benchmark datasets for table structure recognition" (Smock et al., 2023); the Table Transformer itself was originally proposed in the CVPR 2022 PubTables-1M paper. Three pre-trained weight variants are available; this checkpoint (v1.1-all) is the combined variant. The variants are trained on:

PubTables-1M alone
FinTabNet.c alone
PubTables-1M and FinTabNet.c combined
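The "normalize before" setting comes down to where layer normalization sits relative to the residual connection. Below is a toy, dependency-free sketch on a single vector. `sublayer` is a hypothetical stand-in for attention or the feed-forward network, so this illustrates the pattern and is not the model's actual code:

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for self-attention, cross-attention, or the FFN."""
    return [2.0 * v + 1.0 for v in x]

def post_norm_block(x):
    """Post-norm (DETR's default): add the residual, then normalize the sum."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm_block(x):
    """Pre-norm ("normalize before"): normalize only the sublayer input;
    the residual path carries x through unchanged."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 3.0, 10.0]
post = post_norm_block(x)  # zero-mean output: the residual stream is re-normalized
pre = pre_norm_block(x)    # keeps x's scale: mean is 4.0 + 1.0 = 5.0 here
```

Keeping the residual path un-normalized is the usual motivation for pre-norm: gradients flow through the identity connections directly, which tends to make deep stacks easier to train.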

PubTables-1M contains 575,305 annotated document pages for table detection and 947,642 fully annotated tables for structure recognition, with bounding boxes in both image and PDF coordinates. The alignment paper and the GriTS metric paper were both accepted at ICDAR 2023.

Benchmark Performance

Exact-match accuracy (the share of test tables whose predicted structure matches the ground truth exactly) on the ICDAR-2013 table structure recognition benchmark is reported below. The "aligned" results follow annotation canonicalization and error removal, which reduce annotation inconsistencies across datasets.

Training Data	Baseline Accuracy	Aligned Accuracy
PubTables-1M	65%	75%
FinTabNet.c	42%	65%
Combined	69%	81%
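To make the metric and the table concrete, here is a small sketch. It computes exact-match accuracy over hashable structure representations (an assumption for illustration; the benchmark's own comparison works on full table structures) and derives the percentage-point gains from alignment using the figures above:

```python
def exact_match_accuracy(predicted, ground_truth):
    """Fraction of tables whose predicted structure equals the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("need one prediction per ground-truth table")
    matches = sum(p == g for p, g in zip(predicted, ground_truth))
    return matches / len(ground_truth)

# Toy example: 3 of 4 tables reproduced exactly.
print(exact_match_accuracy(["a", "b", "c", "d"], ["a", "x", "c", "d"]))  # 0.75

# Reported ICDAR-2013 exact-match accuracy (%): (baseline, aligned).
results = {
    "PubTables-1M": (65, 75),
    "FinTabNet.c": (42, 65),
    "Combined": (69, 81),
}
gains = {name: aligned - baseline for name, (baseline, aligned) in results.items()}
print(gains)  # {'PubTables-1M': 10, 'FinTabNet.c': 23, 'Combined': 12}
```

FinTabNet.c benefits most from alignment (+23 points), while the combined model reaches the highest absolute accuracy.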

Key Strengths

Reported state-of-the-art results on table structure recognition after dataset alignment, reaching 81% exact match on ICDAR-2013
Training on the combined datasets outperforms either dataset alone on ICDAR-2013 (69% baseline and 81% aligned, versus at most 65% and 75% for single-dataset training)
Outputs can be converted to HTML or CSV when combined with separate text extraction (OCR or PDF text)
Trained at scale: PubTables-1M alone provides nearly one million fully annotated tables for structure recognition
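Converting to CSV requires pairing structure predictions with text from OCR or the PDF text layer. A minimal sketch follows, assuming a hypothetical input format: a grid of cell boxes (rows of `(x_min, y_min, x_max, y_max)` tuples) plus `(text, box)` word pairs. It assigns each word to the cell containing its center, then writes the grid with Python's `csv` module:

```python
import csv
import io

def center(box):
    """Center point of an (x_min, y_min, x_max, y_max) box."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2

def find_cell(cell_grid, point):
    """Return (row, col) of the first cell containing `point`, or None."""
    px, py = point
    for r, row in enumerate(cell_grid):
        for c, (x0, y0, x1, y1) in enumerate(row):
            if x0 <= px < x1 and y0 <= py < y1:
                return r, c
    return None

def grid_to_csv(cell_grid, words):
    """cell_grid: rows of cell boxes; words: (text, box) pairs in reading order."""
    texts = [[[] for _ in row] for row in cell_grid]
    for text, box in words:
        hit = find_cell(cell_grid, center(box))
        if hit is not None:  # words outside every cell are dropped
            r, c = hit
            texts[r][c].append(text)
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in texts:
        writer.writerow([" ".join(parts) for parts in row])
    return buf.getvalue()

# Hypothetical 2x2 table and OCR words.
cell_grid = [
    [(0, 0, 100, 20), (100, 0, 200, 20)],
    [(0, 20, 100, 40), (100, 20, 200, 40)],
]
words = [
    ("Item", (5, 2, 40, 18)),
    ("Total", (105, 2, 150, 18)),
    ("Net", (5, 22, 30, 38)),
    ("sales", (32, 22, 70, 38)),
    ("1,200", (105, 22, 150, 38)),
]
csv_text = grid_to_csv(cell_grid, words)
```

Here `csv_text` holds the rows `Item,Total` and `Net sales,"1,200"`. The `csv` module quotes the value containing a comma automatically. Center-point assignment is a simple heuristic; words straddling cell borders may need a more careful overlap rule.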

best for

·Extracting table structure from scientific and financial documents
·Converting table images to HTML or CSV
·Automating data extraction from invoices and forms

FAQ

What is the Table Transformer V1.1 (All) model best suited for?

It is best suited for table structure recognition in documents, extracting rows, columns, and cells from images of tables, particularly in scientific and financial domains.

What input format does the model expect?

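The model accepts images (e.g., PNG, JPEG) as input, typically an image of a single table cropped from a page, for example by the companion table-transformer-detection model.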
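Because structure recognition is usually applied to a table region rather than a whole page, a pipeline first crops each detected table. Here is a small sketch of the crop-box arithmetic. It assumes pixel coordinates and an arbitrary `padding` value, which is not a documented requirement of this model:

```python
def padded_crop_box(box, image_width, image_height, padding=10):
    """Expand an (x_min, y_min, x_max, y_max) table box by `padding` pixels
    on every side, clamped so the crop stays inside the page image."""
    x0, y0, x1, y1 = box
    return (
        max(0, x0 - padding),
        max(0, y0 - padding),
        min(image_width, x1 + padding),
        min(image_height, y1 + padding),
    )

print(padded_crop_box((5, 40, 600, 300), image_width=612, image_height=792, padding=20))
# (0, 20, 612, 320)
```

The result uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts. A little padding helps keep border rules and edge text inside the crop.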

What output does the model produce?

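It outputs bounding boxes, class labels, and confidence scores for table components (the table itself, rows, columns, column headers, projected row headers, and spanning cells), returned as JSON. Cells are derived by intersecting rows and columns, and the results can be post-processed into HTML or CSV once combined with extracted text.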
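Cells are not predicted as a separate class. A common approach is to intersect every detected row with every detected column. The sketch below shows that step on plain `(x_min, y_min, x_max, y_max)` tuples. It is a simplification: fuller pipelines also merge predicted spanning cells and handle headers, which this ignores.

```python
def intersect(a, b):
    """Intersection of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)

def cells_from_rows_and_columns(rows, columns):
    """Build a row-major grid of cell boxes from row and column boxes."""
    # Sort top-to-bottom and left-to-right so grid indices follow reading order.
    rows = sorted(rows, key=lambda b: b[1])
    columns = sorted(columns, key=lambda b: b[0])
    return [[intersect(r, c) for c in columns] for r in rows]

rows = [(0, 20, 200, 40), (0, 0, 200, 20)]        # detected out of order
columns = [(100, 0, 200, 40), (0, 0, 100, 40)]
grid = cells_from_rows_and_columns(rows, columns)
# grid[0] == [(0, 0, 100, 20), (100, 0, 200, 20)]
# grid[1] == [(0, 20, 100, 40), (100, 20, 200, 40)]
```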

What datasets was the model trained on?

It was trained on the PubTables-1M and FinTabNet.c datasets combined.

How can I use this model via the gigarouter API?

Once the model is live, send a request to the gigarouter OpenAI-compatible endpoint with your API key and the image data; the API returns the detected table structure.

not yet live

We're benchmarking and onboarding Table Transformer V1.1 (All) as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models


·table-transformer-structure-recognition: 1.8M dl/mo
·table-transformer-detection: 1.5M dl/mo
·yolos-small: 713.6K dl/mo
·PP-DocLayoutV3_safetensors: 341.1K dl/mo
·rtdetr_v2_r50vd: 309.8K dl/mo
·rtdetr_r50vd_coco_o365: 254.5K dl/mo