TrOCR Large Handwritten
microsoft/trocr-large-handwritten
published Mar 2022 · updated May 2024
TrOCR Large Handwritten is an image-to-text model that performs optical character recognition (OCR) on handwritten text-line images using a transformer-based encoder-decoder architecture.
specs
| Task | Image-to-Text (Optical Character Recognition) |
| Architecture | Encoder-decoder Transformer (image encoder initialized from BEiT, text decoder from RoBERTa) |
| Parameters | 558 million |
| License | MIT |
| Fine-tuned Dataset | IAM Handwriting Database |
about this model
microsoft/trocr-large-handwritten is a transformer-based optical character recognition (OCR) model that converts single text-line images into text, using an encoder-decoder architecture with a BEiT image encoder and a RoBERTa text decoder.
Architecture and Capabilities
The model processes images as sequences of 16x16 patches and generates wordpiece-level text autoregressively. With 558 million parameters, it is designed for end-to-end text recognition without a separate language model. It was introduced in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models" (AAAI 2023) and is released under the MIT license.
Benchmark Performance
The model achieves a Cased Character Error Rate (CER) of 2.89% on the IAM handwritten test set, outperforming TrOCR-Small (4.22 CER) and TrOCR-Base (3.42 CER). On the SROIE printed text benchmark, it attains an F1 score of 96.60%. For scene text recognition, word accuracies on standard benchmarks are:
| Benchmark | Accuracy |
|---|---|
| IIIT5K-3000 | 94.1% |
| SVT-647 | 96.1% |
| ICDAR2013-857 | 98.4% |
| ICDAR2013-1015 | 97.3% |
| ICDAR2015-1811 | 88.1% |
| ICDAR2015-2077 | 84.1% |
| SVTP-645 | 93.0% |
| CT80-288 | 95.1% |
These results demonstrate the model’s effectiveness across handwritten, printed, and scene text recognition tasks.
best for
- ·Handwritten text recognition from single-line images
- ·Digitization of historical manuscripts
- ·Automated form processing with handwritten fields
FAQ
It is best for optical character recognition of handwritten text from single-line images, with state-of-the-art accuracy on the IAM benchmark.
It achieves a Cased Character Error Rate (CER) of 2.89% on the IAM dataset.
Input: a single text-line image (e.g. JPEG/PNG). Output: a plain text string of recognized characters.
It is released under the MIT License.
Use the gigarouter OpenAI-compatible endpoint with your API key, specifying the model name "TrOCR Large Handwritten" and providing the image in base64 or as a URL.
We're benchmarking and onboarding TrOCR Large Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.