skip to content
gigarouter gigarouter
models / image-to-text · coming soon

TrOCR Large Handwritten

microsoft/trocr-large-handwritten

published Mar 2022 · updated May 2024

TrOCR Large Handwritten is an image-to-text model that performs optical character recognition (OCR) on handwritten text-line images using a transformer-based encoder-decoder architecture.

status
coming soon
API providers
0
downloads / mo
182.4K

specs

TaskImage-to-Text (Optical Character Recognition)
ArchitectureEncoder-decoder Transformer (image encoder initialized from BEiT, text decoder from RoBERTa)
Parameters558 million
LicenseMIT
Fine-tuned DatasetIAM Handwriting Database

about this model

microsoft/trocr-large-handwritten is a transformer-based optical character recognition (OCR) model that converts single text-line images into text, using an encoder-decoder architecture with a BEiT image encoder and a RoBERTa text decoder.

Architecture and Capabilities

The model processes images as sequences of 16x16 patches and generates wordpiece-level text autoregressively. With 558 million parameters, it is designed for end-to-end text recognition without a separate language model. It was introduced in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models" (AAAI 2023) and is released under the MIT license.

Benchmark Performance

The model achieves a Cased Character Error Rate (CER) of 2.89% on the IAM handwritten test set, outperforming TrOCR-Small (4.22 CER) and TrOCR-Base (3.42 CER). On the SROIE printed text benchmark, it attains an F1 score of 96.60%. For scene text recognition, word accuracies on standard benchmarks are:

BenchmarkAccuracy
IIIT5K-300094.1%
SVT-64796.1%
ICDAR2013-85798.4%
ICDAR2013-101597.3%
ICDAR2015-181188.1%
ICDAR2015-207784.1%
SVTP-64593.0%
CT80-28895.1%

These results demonstrate the model’s effectiveness across handwritten, printed, and scene text recognition tasks.

best for

FAQ

What is this model best for?

It is best for optical character recognition of handwritten text from single-line images, with state-of-the-art accuracy on the IAM benchmark.

How accurate is it on the IAM handwritten test set?

It achieves a Cased Character Error Rate (CER) of 2.89% on the IAM dataset.

What are the input and output formats?

Input: a single text-line image (e.g. JPEG/PNG). Output: a plain text string of recognized characters.

What license does this model use?

It is released under the MIT License.

How do I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, specifying the model name "TrOCR Large Handwritten" and providing the image in base64 or as a URL.

not yet live

We're benchmarking and onboarding TrOCR Large Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →