TrOCR Base Handwritten

microsoft/trocr-base-handwritten

published Mar 2022 · updated Feb 2025

TrOCR Base Handwritten is an image-to-text model that performs optical character recognition on handwritten text from single text-line images.

est. price

~$0.094

/ 1k images · estimated, set at launch

API providers

downloads / mo

124K

license

mit

specs

Task	Image-to-Text (Optical Character Recognition)
Architecture	Encoder-decoder Transformer (ViT-based encoder, RoBERTa-based decoder)
Parameters	334M
Fine-tuned On	IAM Handwriting Database

about this model

TrOCR-base-handwritten is an optical character recognition (OCR) model that performs image-to-text transcription using a Transformer encoder-decoder architecture. The encoder is initialized from BEiT and processes images as a sequence of 16x16 patches; the decoder, initialized from RoBERTa, generates wordpiece tokens autoregressively. Fine-tuned on the IAM handwritten line dataset, the model contains 334 million parameters and expects 384x384 input images.

The model delivers strong results on handwritten, printed, and scene text benchmarks. On the IAM test set it achieves a cased character error rate (CER) of 3.42, outperforming the smaller TrOCR variant (4.22 CER) and approaching the larger model (2.89 CER). On the printed SROIE dataset it attains an F1 score of 96.34. Selected scene text recognition word accuracies are:

Dataset	TrOCR-Base	TrOCR-Large
IIIT5K (3000)	93.4%	94.1%
SVT (647)	95.2%	96.1%
IC13 (857)	98.4%	98.4%
IC15 (1811)	86.9%	88.1%
SVTP (645)	92.1%	93.0%
CT80 (288)	90.6%	95.1%

Originating from Microsoft Research and published at AAAI 2023, this model is hosted as a managed API on gigarouter with no additional setup required.

best for

·Recognizing handwritten text in scanned documents
·Digitizing handwritten notes and forms
·Processing single-line handwritten text images

FAQ

What is this model best for?

It is optimized for optical character recognition of handwritten text on single text-line images, fine-tuned on the IAM dataset.

What is the input format?

The model expects a single text-line image, resized to 384x384 pixels, as input.

What is the output format?

The model outputs a string of recognized text tokens.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with an API key, sending the image as a base64-encoded string in the request.

What is the model size?

The model has approximately 334 million parameters.

not yet live

We're benchmarking and onboarding TrOCR Base Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo