skip to content
gigarouter gigarouter
models / image-to-text · coming soon

TrOCR Small Printed

microsoft/trocr-small-printed

published Mar 2022 · updated May 2024

TrOCR Small Printed is an image-to-text model that performs optical character recognition on single text-line images using a Transformer encoder-decoder architecture.

est. price
~$0.047
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
36.3K

specs

TaskImage-to-Text (Optical Character Recognition)
ArchitectureEncoder-decoder Transformer (image encoder from DeiT, text decoder from UniLM)
Parameters62 million

about this model

microsoft/trocr-small-printed is an image-to-text model that performs optical character recognition (OCR) on printed text lines, fine-tuned on the SROIE dataset.

Architecture and Approach

The model uses an encoder-decoder Transformer architecture. The image encoder is initialized from DeiT weights and processes input images as a sequence of 16x16 patches with absolute position embeddings. The text decoder, initialized from UniLM, autoregressively generates wordpiece-level tokens. This end-to-end design eliminates the need for separate CNN, RNN, and language model components, as demonstrated in the TrOCR paper (Li et al., AAAI 2023).

Benchmark Performance on SROIE

The SROIE (ICDAR 2019 Scanned Receipts OCR and Information Extraction) benchmark evaluates OCR on challenging receipt images with poor print quality, low resolution, folded documents, and privacy-blurred fields. The following table compares TrOCR variants on SROIE F1 score, as reported in the official GitHub repository:

ModelParametersSROIE F1
TrOCR-Small (printed)62M95.86
TrOCR-Base (printed)334M96.34
TrOCR-Large (printed)558M96.60

The small variant thus achieves competitive accuracy with significantly fewer parameters, making it a lightweight option for printed text recognition.

Capabilities and Context

The model is designed for single text-line images. According to the paper abstract, the TrOCR family outperforms prior state-of-the-art models on printed, handwritten, and scene text recognition tasks. This particular checkpoint is specialized for printed text, leveraging the SROIE training set to handle real-world receipt variability.

best for

FAQ

What is TrOCR Small Printed best for?

It is best for printed text OCR on single text-line images, particularly receipts and scanned documents.

How many parameters does this model have?

It has 62 million parameters (small version).

What input format does the model accept?

The model accepts images as pixel values (e.g., via TrOCRProcessor) and outputs generated text.

How can I use this model via the gigarouter API?

Call the gigarouter OpenAI-compatible endpoint with your API key, sending an image URL or base64-encoded image in the request.

What dataset was it fine-tuned on?

It was fine-tuned on the SROIE dataset (ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction).

not yet live

We're benchmarking and onboarding TrOCR Small Printed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →