skip to content
gigarouter gigarouter
models / image-to-text · coming soon

TrOCR Base Handwritten

microsoft/trocr-base-handwritten

published Mar 2022 · updated Feb 2025

TrOCR Base Handwritten is an image-to-text model that performs optical character recognition on handwritten text from single text-line images.

est. price
~$0.094
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
124K
license
mit

specs

TaskImage-to-Text (Optical Character Recognition)
ArchitectureEncoder-decoder Transformer (ViT-based encoder, RoBERTa-based decoder)
Parameters334M
Fine-tuned OnIAM Handwriting Database

about this model

TrOCR-base-handwritten is an optical character recognition (OCR) model that performs image-to-text transcription using a Transformer encoder-decoder architecture. The encoder is initialized from BEiT and processes images as a sequence of 16x16 patches; the decoder, initialized from RoBERTa, generates wordpiece tokens autoregressively. Fine-tuned on the IAM handwritten line dataset, the model contains 334 million parameters and expects 384x384 input images.

The model delivers strong results on handwritten, printed, and scene text benchmarks. On the IAM test set it achieves a cased character error rate (CER) of 3.42, outperforming the smaller TrOCR variant (4.22 CER) and approaching the larger model (2.89 CER). On the printed SROIE dataset it attains an F1 score of 96.34. Selected scene text recognition word accuracies are:

DatasetTrOCR-BaseTrOCR-Large
IIIT5K (3000)93.4%94.1%
SVT (647)95.2%96.1%
IC13 (857)98.4%98.4%
IC15 (1811)86.9%88.1%
SVTP (645)92.1%93.0%
CT80 (288)90.6%95.1%

Originating from Microsoft Research and published at AAAI 2023, this model is hosted as a managed API on gigarouter with no additional setup required.

best for

FAQ

What is this model best for?

It is optimized for optical character recognition of handwritten text on single text-line images, fine-tuned on the IAM dataset.

What is the input format?

The model expects a single text-line image, resized to 384x384 pixels, as input.

What is the output format?

The model outputs a string of recognized text tokens.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with an API key, sending the image as a base64-encoded string in the request.

What is the model size?

The model has approximately 334 million parameters.

not yet live

We're benchmarking and onboarding TrOCR Base Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →