TrOCR Small Handwritten

microsoft/trocr-small-handwritten

published Mar 2022 · updated May 2024

TrOCR Small Handwritten is an image-to-text model for optical character recognition (OCR) of handwritten text, using a Transformer encoder-decoder architecture.

status

coming soon

API providers

downloads / mo

448.6K

specs

Task	Image-to-Text (Optical Character Recognition)
Architecture	Transformer encoder-decoder (DeiT encoder, UniLM decoder)
Parameters	62M
Finetuned On	IAM Handwriting Database

about this model

TrOCR-small-handwritten is an image-to-text model that performs optical character recognition (OCR) on handwritten text-line images using a Transformer-based encoder-decoder architecture. The encoder, initialized from DeiT, processes images as sequences of 16x16 patches with absolute position embeddings, while the decoder, initialized from UniLM, autoregressively generates wordpiece-level text tokens. This end-to-end approach eliminates the need for separate CNN-based image understanding, RNN-based text generation, or post-processing language models. The model is fine-tuned on the IAM Handwriting Database and achieves a Cased Character Error Rate (CER) of 4.22 on the IAM test set. It has 62 million parameters and was trained on 384-pixel input images. The underlying TrOCR architecture, introduced by Li et al. and published at AAAI 2023, was shown to outperform prior state-of-the-art models on printed, handwritten, and scene text recognition tasks.

Benchmark performance

Benchmark	Metric	TrOCR-Small	TrOCR-Base (334M)	TrOCR-Large (558M)
IAM (handwritten)	Cased CER	4.22	3.42	2.89
SROIE (printed)	F1	95.86	96.34	96.60

Scene text recognition word accuracy (TrOCR-Base)

Dataset	Accuracy
IIIT5K-3000	93.4
SVT-647	95.2
ICDAR2013-857	98.4
ICDAR2015-1811	86.9
SVTP-645	92.1
CT80-288	90.6

Gigarouter hosts this model as a managed, OpenAI-compatible API. You send image data and receive transcribed text without managing infrastructure or model weights.

best for

·Transcribing handwritten notes and letters from scanned images
·Automating form processing with handwritten fields
·Digitizing historical manuscripts

FAQ

What input format does TrOCR Small Handwritten expect?

The model expects a single text-line image, typically preprocessed to 384x384 resolution, sent as a base64-encoded image or image URL via the gigarouter API.

What output does it produce?

It returns a plain text string of the recognized handwritten text.

How many parameters does this model have?

It has 62 million parameters, making it smaller and faster than TrOCR-Base (334M) or TrOCR-Large (558M).

What accuracy does it achieve?

On the IAM test set, it achieves a cased character error rate (CER) of 4.22%.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending a request with an image URL or base64 image data, and receive the transcription in the response.

not yet live

We're benchmarking and onboarding TrOCR Small Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

PP-LCNet_x1_0_doc_ori

445.3K dl/mo