Table Transformer Detection

microsoft/table-transformer-detection

published Oct 2022 · updated Sep 2023

Table Transformer Detection is a detection model that uses a Transformer-based object detection architecture (DETR) to detect tables in document images.

est. price

~$0.047

/ 1k images · estimated, set at launch

API providers

downloads / mo

1.5M

license

mit

specs

Task	Object Detection (Table Detection)
Architecture	DETR with normalize‑before setting
Training Dataset	PubTables-1M

about this model

The microsoft/table-transformer-detection model is a transformer-based object detection model fine-tuned for detecting tables in document images. It uses the DETR architecture with a "normalize before" setting (layer normalization applied before self- and cross-attention) and was introduced in the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Smock et al.

The model is trained on PubTables-1M, a dataset containing nearly one million tables from scientific articles. For the detection task, the dataset provides 575,305 annotated document pages. The model achieves strong results on table detection, structure recognition, and functional analysis without task-specific customization, as demonstrated in the paper.

Key characteristics

Also referred to as TATR (Table Transformer) in the literature and repository.
Pre-trained weights are available for models trained on PubTables-1M; newer TATR-v1.1 variants (released August 2023) also include training on FinTabNet.c and combined datasets.
Requires separate text extraction (from OCR or PDF) to generate HTML or CSV output; the model detects the location and structure of tables but does not extract text content from images alone.
The official repository additionally provides code for the GriTS table similarity metric and for aligning benchmark datasets, both accepted at ICDAR 2023.

This model is hosted as a managed API on gigarouter. Developers can call it directly without managing infrastructure or dependencies.

best for

·Extracting table bounding boxes from scanned PDFs and document images
·Automated table detection in scientific articles for downstream structure recognition
·Preprocessing document images for financial report data extraction

FAQ

What input format does the Table Transformer Detection model accept?

The model accepts images (e.g., PNG, JPEG) of document pages.

What does the model output?

It outputs bounding boxes for detected tables. For structured output (HTML/CSV), separate text extraction via OCR or PDF parsing is required.

How can I use this model via the gigarouter API?

Call the gigarouter OpenAI‑compatible endpoint with an API key, providing the image as input.

What dataset was the model trained on?

It was fine‑tuned on the PubTables-1M dataset, containing nearly one million tables from scientific articles.

Does this model perform table structure recognition?

No, this model is only for table detection. A separate Table Transformer model is available for structure recognition.

not yet live

We're benchmarking and onboarding Table Transformer Detection as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related object detection models

compare all →

table-transformer-structure-recognition

1.8M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo

table-transformer-structure-recognition-v1.1-all

239.5K dl/mo