skip to content
gigarouter gigarouter
models / image-to-text · coming soon

TrOCR Small Handwritten

microsoft/trocr-small-handwritten

published Mar 2022 · updated May 2024

TrOCR Small Handwritten is an image-to-text model for optical character recognition (OCR) of handwritten text, using a Transformer encoder-decoder architecture.

status
coming soon
API providers
0
downloads / mo
448.6K

specs

TaskImage-to-Text (Optical Character Recognition)
ArchitectureTransformer encoder-decoder (DeiT encoder, UniLM decoder)
Parameters62M
Finetuned OnIAM Handwriting Database

about this model

TrOCR-small-handwritten is an image-to-text model that performs optical character recognition (OCR) on handwritten text-line images using a Transformer-based encoder-decoder architecture. The encoder, initialized from DeiT, processes images as sequences of 16x16 patches with absolute position embeddings, while the decoder, initialized from UniLM, autoregressively generates wordpiece-level text tokens. This end-to-end approach eliminates the need for separate CNN-based image understanding, RNN-based text generation, or post-processing language models. The model is fine-tuned on the IAM Handwriting Database and achieves a Cased Character Error Rate (CER) of 4.22 on the IAM test set. It has 62 million parameters and was trained on 384-pixel input images. The underlying TrOCR architecture, introduced by Li et al. and published at AAAI 2023, was shown to outperform prior state-of-the-art models on printed, handwritten, and scene text recognition tasks.

Benchmark performance

BenchmarkMetricTrOCR-SmallTrOCR-Base (334M)TrOCR-Large (558M)
IAM (handwritten)Cased CER4.223.422.89
SROIE (printed)F195.8696.3496.60

Scene text recognition word accuracy (TrOCR-Base)

DatasetAccuracy
IIIT5K-300093.4
SVT-64795.2
ICDAR2013-85798.4
ICDAR2015-181186.9
SVTP-64592.1
CT80-28890.6
Gigarouter hosts this model as a managed, OpenAI-compatible API. You send image data and receive transcribed text without managing infrastructure or model weights.

best for

FAQ

What input format does TrOCR Small Handwritten expect?

The model expects a single text-line image, typically preprocessed to 384x384 resolution, sent as a base64-encoded image or image URL via the gigarouter API.

What output does it produce?

It returns a plain text string of the recognized handwritten text.

How many parameters does this model have?

It has 62 million parameters, making it smaller and faster than TrOCR-Base (334M) or TrOCR-Large (558M).

What accuracy does it achieve?

On the IAM test set, it achieves a cased character error rate (CER) of 4.22%.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending a request with an image URL or base64 image data, and receive the transcription in the response.

not yet live

We're benchmarking and onboarding TrOCR Small Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →