skip to content
gigarouter gigarouter
models / object detection · coming soon

Table Transformer V1.1 (All)

microsoft/table-transformer-structure-recognition-v1.1-all

published Nov 2023 · updated Nov 2023

Table Transformer V1.1 (All) is a detection model that recognizes table structure from images using a DETR architecture trained on PubTables-1M and FinTabNet.c.

est. price
~$0.047
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
239.5K
license
mit

specs

TaskTable Structure Recognition
ArchitectureDETR with pre-norm
Training DataPubTables-1M & FinTabNet.c (combined)
Model Versionv1.1 (All)

about this model

Microsoft Table-Transformer-Structure-Recognition-v1.1-all is a detection model that performs table structure recognition on document images using a Transformer-based object detection architecture (DETR with pre-normalization). It identifies and localizes table components such as cells, rows, and columns within PDFs and images, outputting structured predictions suitable for downstream extraction pipelines.

Model Description

The model is equivalent to DETR (DEtection TRansformer) with the "normalize before" setting, where layer normalization is applied ahead of self- and cross-attention. It was introduced in Aligning benchmark datasets for table structure recognition (Smock et al., 2023) and originally proposed in the CVPR 2022 paper PubTables-1M. Three pre-trained weight variants are available, trained on:

  • PubTables-1M alone
  • FinTabNet.c alone
  • PubTables-1M and FinTabNet.c combined

PubTables-1M contains 575,305 annotated document pages for table detection and 947,642 fully annotated tables for structure recognition, with bounding boxes in both image and PDF coordinates. The alignment paper and the GriTS metric paper were both accepted at ICDAR 2023.

Benchmark Performance

Exact match accuracy on the ICDAR-2013 benchmark for table structure recognition is reported below. The "aligned" results follow annotation canonicalization and error removal that reduce inter-dataset inconsistency.

Training DataBaseline AccuracyAligned Accuracy
PubTables-1M65%75%
FinTabNet.c42%65%
Combined69%81%

Key Strengths

  • State-of-the-art results on table structure recognition after dataset alignment, reaching 81% exact match on ICDAR-2013
  • Robust to diverse document domains when trained on combined datasets
  • Outputs can be converted to HTML or CSV when combined with separate text extraction (OCR or PDF text)
  • Proven effectiveness at scale with million-level training data

best for

FAQ

What is the Table Transformer V1.1 (All) model best suited for?

It is best suited for table structure recognition in documents, extracting rows, columns, and cells from images of tables, particularly in scientific and financial domains.

What input format does the model expect?

The model accepts images (e.g., PNG, JPEG) as input.

What output does the model produce?

It outputs bounding boxes and class labels for table components (rows, columns, cells) as JSON; can be post-processed to HTML or CSV.

What datasets was the model trained on?

It was trained on the PubTables-1M and FinTabNet.c datasets combined.

How can I use this model via the gigarouter API?

Send a request to the gigarouter OpenAI-compatible endpoint with your API key, providing the image data; the API returns the detected table structure.

not yet live

We're benchmarking and onboarding Table Transformer V1.1 (All) as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

compare all →