TrOCR Small Printed
microsoft/trocr-small-printed
published Mar 2022 · updated May 2024
TrOCR Small Printed is an image-to-text model that performs optical character recognition on single text-line images using a Transformer encoder-decoder architecture.
specs
| Task | Image-to-Text (Optical Character Recognition) |
| Architecture | Encoder-decoder Transformer (image encoder from DeiT, text decoder from UniLM) |
| Parameters | 62 million |
about this model
microsoft/trocr-small-printed is an image-to-text model that performs optical character recognition (OCR) on printed text lines, fine-tuned on the SROIE dataset.
Architecture and Approach
The model uses an encoder-decoder Transformer architecture. The image encoder is initialized from DeiT weights and processes input images as a sequence of 16x16 patches with absolute position embeddings. The text decoder, initialized from UniLM, autoregressively generates wordpiece-level tokens. This end-to-end design eliminates the need for separate CNN, RNN, and language model components, as demonstrated in the TrOCR paper (Li et al., AAAI 2023).
Benchmark Performance on SROIE
The SROIE (ICDAR 2019 Scanned Receipts OCR and Information Extraction) benchmark evaluates OCR on challenging receipt images with poor print quality, low resolution, folded documents, and privacy-blurred fields. The following table compares TrOCR variants on SROIE F1 score, as reported in the official GitHub repository:
| Model | Parameters | SROIE F1 |
|---|---|---|
| TrOCR-Small (printed) | 62M | 95.86 |
| TrOCR-Base (printed) | 334M | 96.34 |
| TrOCR-Large (printed) | 558M | 96.60 |
The small variant thus achieves competitive accuracy with significantly fewer parameters, making it a lightweight option for printed text recognition.
Capabilities and Context
The model is designed for single text-line images. According to the paper abstract, the TrOCR family outperforms prior state-of-the-art models on printed, handwritten, and scene text recognition tasks. This particular checkpoint is specialized for printed text, leveraging the SROIE training set to handle real-world receipt variability.
best for
- ·Optical character recognition on printed receipts
- ·Transcribing single-line text from scanned documents
- ·Automated data extraction from invoices
FAQ
It is best for printed text OCR on single text-line images, particularly receipts and scanned documents.
It has 62 million parameters (small version).
The model accepts images as pixel values (e.g., via TrOCRProcessor) and outputs generated text.
Call the gigarouter OpenAI-compatible endpoint with your API key, sending an image URL or base64-encoded image in the request.
It was fine-tuned on the SROIE dataset (ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction).
We're benchmarking and onboarding TrOCR Small Printed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.