Table Transformer Detection
microsoft/table-transformer-detection
published Oct 2022 · updated Sep 2023
Table Transformer Detection is an object detection model that uses the Transformer-based DETR architecture to locate tables in document images.
specs
| Task | Object Detection (Table Detection) |
| Architecture | DETR with normalize‑before setting |
| Training Dataset | PubTables-1M |
about this model
The microsoft/table-transformer-detection model is a transformer-based object detection model fine-tuned for detecting tables in document images. It uses the DETR architecture with a "normalize before" setting (layer normalization applied before self- and cross-attention) and was introduced in the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Smock et al.
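To make the "normalize before" setting concrete, the sketch below contrasts the two residual orderings on a toy vector. It is a minimal illustration under stated assumptions, not the model's code: `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward sublayer, and `layer_norm` omits the learned scale and shift that a real layer would have.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and (roughly) unit variance.

    Simplification: no learned scale or shift parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Post-norm ordering: residual add first, then normalize the sum."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def pre_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """"Normalize before" ordering: normalize the input, then residual add."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical stand-in for attention: halves every element."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
post = post_norm_block(x, toy_sublayer)
pre = pre_norm_block(x, toy_sublayer)
```

With this toy sublayer, the pre-norm output keeps the mean of `x` because the residual path is left unnormalized, while the post-norm output is re-centered around zero.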
The model is trained on PubTables-1M, a dataset containing nearly one million tables from scientific articles. For the detection task, the dataset provides 575,305 annotated document pages. The paper reports that Transformer-based object detection models trained on PubTables-1M achieve strong results on table detection, structure recognition, and functional analysis without task-specific customization; this checkpoint covers the detection task only.
Key characteristics
- Also referred to as TATR (Table Transformer) in the literature and repository.
- Pre-trained weights are available for models trained on PubTables-1M; newer TATR-v1.1 variants (released August 2023) are also available trained on FinTabNet.c and on combined datasets.
- Generating HTML or CSV output requires separate text extraction (from OCR or a PDF parser); this checkpoint locates tables, and a companion Table Transformer model recognizes their structure, but neither extracts text content from images.
- The official repository additionally provides code for the GriTS table similarity metric and for aligning benchmark datasets, both described in papers accepted at ICDAR 2023.
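Since text has to come from a separate OCR or PDF step, one common way to combine the two is to assign each extracted word to the detected table that contains most of its box. The following is a minimal sketch of that idea, not the repository's pipeline: the word records (dicts with `text` and `bbox` keys), the function names, and the 0.5 overlap threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def overlap_fraction(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    width = min(inner[2], outer[2]) - max(inner[0], outer[0])
    height = min(inner[3], outer[3]) - max(inner[1], outer[1])
    intersection = max(0.0, width) * max(0.0, height)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return intersection / area if area > 0 else 0.0


def group_words_by_table(
    table_boxes: List[Box],
    words: List[Dict],
    min_overlap: float = 0.5,  # hypothetical threshold
) -> List[List[str]]:
    """Return, for each table box, the texts of the words that fall inside it."""
    grouped: List[List[str]] = [[] for _ in table_boxes]
    for word in words:
        for index, table in enumerate(table_boxes):
            if overlap_fraction(word["bbox"], table) >= min_overlap:
                grouped[index].append(word["text"])
                break  # assign each word to at most one table
    return grouped


# Made-up example data: one detected table and three OCR words.
tables = [(100.0, 100.0, 400.0, 300.0)]
ocr_words = [
    {"text": "Year", "bbox": (110.0, 110.0, 150.0, 125.0)},      # inside the table
    {"text": "Abstract", "bbox": (100.0, 20.0, 180.0, 40.0)},    # above the table
    {"text": "margin", "bbox": (390.0, 110.0, 420.0, 125.0)},    # mostly outside
]
words_per_table = group_words_by_table(tables, ocr_words)
```

Words assigned this way still need a structure recognition step to be placed into rows and columns before HTML or CSV can be produced.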
This model is hosted as a managed API on gigarouter. Developers can call it directly without managing infrastructure or dependencies.
best for
- Extracting table bounding boxes from scanned PDFs and document images
- Automated table detection in scientific articles for downstream structure recognition
- Preprocessing document images for financial report data extraction
FAQ
The model accepts images (e.g., PNG, JPEG) of document pages.
The model outputs bounding boxes for detected tables. Producing structured output (HTML/CSV) requires separate text extraction via OCR or PDF parsing.
Call the gigarouter OpenAI‑compatible endpoint with an API key, providing the image as input.
The model was trained on the PubTables-1M dataset, which contains nearly one million tables from scientific articles.
This model performs table detection only; it does not recognize table structure. A separate Table Transformer model is available for structure recognition.
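The bounding boxes mentioned above usually need a small post-processing step before they can be used for cropping. DETR-style detectors emit boxes as normalized (center x, center y, width, height) values with a confidence score per prediction; the sketch below converts them to pixel corner coordinates and drops low-confidence predictions. It assumes raw DETR-format outputs are available, and the function names and the 0.9 threshold are hypothetical choices, not values taken from the model card.

```python
from typing import Dict, List, Sequence, Tuple

NormBox = Tuple[float, float, float, float]   # (cx, cy, w, h), each in [0, 1]
PixelBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def to_pixel_box(box: NormBox, image_width: int, image_height: int) -> PixelBox:
    """Convert a normalized center-format box to pixel corner coordinates."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def filter_detections(
    boxes: Sequence[NormBox],
    scores: Sequence[float],
    image_width: int,
    image_height: int,
    threshold: float = 0.9,  # hypothetical cut-off; tune per use case
) -> List[Dict]:
    """Keep predictions scoring at least `threshold`, with boxes in pixels."""
    detections = []
    for box, score in zip(boxes, scores):
        if score >= threshold:
            detections.append(
                {"score": score, "box": to_pixel_box(box, image_width, image_height)}
            )
    return detections


# Made-up predictions for a 1000 x 800 pixel page image.
raw_boxes = [(0.5, 0.5, 0.5, 0.25), (0.2, 0.2, 0.1, 0.1)]
raw_scores = [0.98, 0.30]
detections = filter_detections(raw_boxes, raw_scores, image_width=1000, image_height=800)
```

A lower threshold recovers more tables at the cost of more false positives, so the cut-off is worth validating on representative pages.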
Table Transformer Detection is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.