TrOCR Large Printed
microsoft/trocr-large-printed
published Mar 2022 · updated May 2024
TrOCR Large Printed is an image-to-text model for optical character recognition (OCR) on printed text-line images.
specs
| Task | Image-to-Text (Optical Character Recognition) |
| Architecture | Encoder-decoder Transformer with BEiT image encoder and RoBERTa text decoder |
| Parameters | 558M |
about this model
Microsoft TrOCR-Large-Printed is an image-to-text model for optical character recognition (OCR) on single text-line images, fine-tuned specifically for printed text on the SROIE dataset. It is based on the TrOCR architecture introduced in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models" (Li et al., AAAI 2023).
Architecture
TrOCR is an encoder-decoder Transformer. The encoder, initialized from BEiT, processes images divided into 16x16 patches with absolute position embeddings. The decoder, initialized from RoBERTa, generates wordpiece-level tokens autoregressively. The large variant contains 558 million parameters.
Benchmark Performance
On the SROIE dataset, TrOCR-Large-Printed achieves an F1 score of 96.60%. On the IAM dataset (handwriting, though the model is designed for printed text), it achieves a cased character error rate (CER) of 2.89%. The following table compares this variant with smaller TrOCR models:
| Model | Parameters | SROIE F1 | IAM Cased CER |
|---|---|---|---|
| TrOCR-Small | – | 95.86% | 4.22 |
| TrOCR-Base | – | 96.34% | 3.42 |
| TrOCR-L |
best for
- ·Extracting printed text from scanned documents
- ·Digitizing receipts and invoices (e.g., SROIE dataset)
FAQ
It is best for optical character recognition (OCR) on single text-line images of printed text, particularly fine-tuned on the SROIE receipt dataset.
TrOCR Large has 558M parameters and achieves higher accuracy than TrOCR Small and TrOCR Base, with 96.60% F1 on SROIE.
It expects a single text-line image, preprocessed into 16x16 patches. The Hugging Face processor handles resizing and normalization.
Use the gigarouter OpenAI-compatible endpoint with an API key to send image URLs or base64-encoded images and receive the recognized text.
The model card and additional sources do not specify a license; please check the official repository at github.com/microsoft/unilm/tree/master/trocr for terms.
We're benchmarking and onboarding TrOCR Large Printed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.