skip to content
gigarouter gigarouter
models / image-to-text · coming soon

TrOCR Large Printed

microsoft/trocr-large-printed

published Mar 2022 · updated May 2024

TrOCR Large Printed is an image-to-text model for optical character recognition (OCR) on printed text-line images.

est. price
~$0.235
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
133K

specs

TaskImage-to-Text (Optical Character Recognition)
ArchitectureEncoder-decoder Transformer with BEiT image encoder and RoBERTa text decoder
Parameters558M

about this model

Microsoft TrOCR-Large-Printed is an image-to-text model for optical character recognition (OCR) on single text-line images, fine-tuned specifically for printed text on the SROIE dataset. It is based on the TrOCR architecture introduced in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models" (Li et al., AAAI 2023).

Architecture

TrOCR is an encoder-decoder Transformer. The encoder, initialized from BEiT, processes images divided into 16x16 patches with absolute position embeddings. The decoder, initialized from RoBERTa, generates wordpiece-level tokens autoregressively. The large variant contains 558 million parameters.

Benchmark Performance

On the SROIE dataset, TrOCR-Large-Printed achieves an F1 score of 96.60%. On the IAM dataset (handwriting, though the model is designed for printed text), it achieves a cased character error rate (CER) of 2.89%. The following table compares this variant with smaller TrOCR models:

ModelParametersSROIE F1IAM Cased CER
TrOCR-Small95.86%4.22
TrOCR-Base96.34%3.42
TrOCR-L

best for

FAQ

What is TrOCR Large Printed best used for?

It is best for optical character recognition (OCR) on single text-line images of printed text, particularly fine-tuned on the SROIE receipt dataset.

How does TrOCR Large compare to other TrOCR variants?

TrOCR Large has 558M parameters and achieves higher accuracy than TrOCR Small and TrOCR Base, with 96.60% F1 on SROIE.

What input format does the model expect?

It expects a single text-line image, preprocessed into 16x16 patches. The Hugging Face processor handles resizing and normalization.

How can I call this model via API?

Use the gigarouter OpenAI-compatible endpoint with an API key to send image URLs or base64-encoded images and receive the recognized text.

What is the license for this model?

The model card and additional sources do not specify a license; please check the official repository at github.com/microsoft/unilm/tree/master/trocr for terms.

not yet live

We're benchmarking and onboarding TrOCR Large Printed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →