
Table Transformer Detection

microsoft/table-transformer-detection

published Oct 2022 · updated Sep 2023

Table Transformer Detection uses DETR, a Transformer-based object detection architecture, to detect tables in document images.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 1.5M
license: MIT

specs

Task: Object Detection (Table Detection)
Architecture: DETR with normalize-before setting
Training Dataset: PubTables-1M

about this model

The microsoft/table-transformer-detection model is a transformer-based object detection model fine-tuned for detecting tables in document images. It uses the DETR architecture with a "normalize before" setting (layer normalization applied before self- and cross-attention) and was introduced in the paper "PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents" by Smock et al.
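
To make the "normalize before" setting concrete, the sketch below contrasts the two orderings on a single residual block. It is an illustration in plain Python, not code from the model: `layer_norm`, `post_norm_block`, and `pre_norm_block` are hypothetical names, the learned scale and bias of layer normalization are left out, and `sublayer` stands in for an attention or feed-forward module.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Original Transformer ordering: sublayer, residual add, then normalize."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def pre_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """'Normalize before' ordering: normalize, sublayer, then residual add."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]
```

One visible consequence of the ordering: with a sublayer that returns zeros, `pre_norm_block` passes its input through unchanged because the residual path is never normalized, while `post_norm_block` still rescales it.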

The model is trained on PubTables-1M, a dataset containing nearly one million tables from scientific articles. For the detection task, the dataset provides 575,305 annotated document pages. The paper reports that transformer-based object detection models trained on PubTables-1M achieve strong results on table detection, structure recognition, and functional analysis without task-specific customization; this checkpoint covers the detection task only.

Key characteristics

  • Also referred to as TATR (Table Transformer) in the literature and repository.
  • Pre-trained weights are available for models trained on PubTables-1M; newer TATR-v1.1 variants (released August 2023) also include training on FinTabNet.c and combined datasets.
  • Producing HTML or CSV output requires separate text extraction (from OCR or the PDF itself); this model only locates tables and does not recognize their structure or read text content from the image.
  • The official repository additionally provides code for the GriTS table similarity metric and for aligning benchmark datasets, both described in papers accepted at ICDAR 2023.
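
The text-extraction point in the list above can be sketched as a small pairing step: given a table bounding box from the detector and word boxes from an OCR engine or PDF parser, keep the words whose centers fall inside the table. This is a minimal sketch under stated assumptions; `Word`, `center`, and `words_in_table` are hypothetical names, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` in the same pixel coordinate system.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Word:
    """One word from a separate OCR or PDF text-extraction step (hypothetical type)."""
    text: str
    box: Box


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def words_in_table(table_box: Box, words: List[Word]) -> List[Word]:
    """Keep the words whose center lies inside the detected table box."""
    x_min, y_min, x_max, y_max = table_box
    selected = []
    for word in words:
        cx, cy = center(word.box)
        if x_min <= cx <= x_max and y_min <= cy <= y_max:
            selected.append(word)
    return selected
```

Turning the selected words into rows and columns still needs a structure recognition step, which this detection model does not provide.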

This model is being onboarded as a managed API on gigarouter and is not yet live. Once it launches, developers will be able to call it directly without managing infrastructure or dependencies.

FAQ

What input format does the Table Transformer Detection model accept?

The model accepts images (e.g., PNG, JPEG) of document pages.

What does the model output?

It outputs bounding boxes for detected tables. For structured output (HTML/CSV), separate text extraction via OCR or PDF parsing is required.
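
For readers who want to see the input and output shape before the hosted API is available, the sketch below runs the public checkpoint locally with the Hugging Face `transformers` library. It assumes `torch`, `transformers`, and `Pillow` are installed (some `transformers` versions also need `timm` for the backbone); `detect_tables` and the 0.9 score threshold are illustrative choices, not part of the gigarouter API.

```python
from typing import Dict, List

MODEL_ID = "microsoft/table-transformer-detection"


def detect_tables(image_path: str, threshold: float = 0.9) -> List[Dict]:
    """Return detected tables as dicts with label, score, and pixel box."""
    # Heavy dependencies are imported lazily so that importing this module
    # does not require them to be installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, TableTransformerForObjectDetection

    image = Image.open(image_path).convert("RGB")
    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = TableTransformerForObjectDetection.from_pretrained(MODEL_ID)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]

    detections = []
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        detections.append({
            "label": model.config.id2label[label.item()],
            "score": round(score.item(), 3),
            # Box is (x_min, y_min, x_max, y_max) in pixels of the input image.
            "box": [round(v, 1) for v in box.tolist()],
        })
    return detections
```

Calling `detect_tables("page.png")` on a rendered document page (the file name is a placeholder) returns one dictionary per table above the threshold.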

How can I use this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is, call the gigarouter OpenAI-compatible endpoint with an API key, providing the image as input.
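
As a rough sketch of what such a call could look like, the helper below prepares (but does not send) an HTTP request using only the Python standard library. Nearly everything here is an assumption: this page does not document the endpoint URL or the request schema, so `url` must be supplied by the caller, and the `model` and `image` field names in the JSON body are hypothetical. Only the Bearer-token header and the base64 data URL encoding follow common conventions.

```python
import base64
import json
import urllib.request

MODEL_ID = "microsoft/table-transformer-detection"


def build_detection_request(
    url: str,
    api_key: str,
    image_bytes: bytes,
    mime_type: str = "image/png",
) -> urllib.request.Request:
    """Build a POST request carrying one image as a base64 data URL.

    The body layout is an assumed, hypothetical schema; check the provider's
    documentation once the model is live.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": MODEL_ID,                               # assumed field name
        "image": f"data:{mime_type};base64,{encoded}",   # assumed field name
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)` once the real endpoint and schema are known.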

What dataset was the model trained on?

It was fine‑tuned on the PubTables-1M dataset, containing nearly one million tables from scientific articles.

Does this model perform table structure recognition?

No, this model is only for table detection. A separate Table Transformer model is available for structure recognition.

not yet live

We're benchmarking and onboarding Table Transformer Detection as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
