TrOCR Base Handwritten
microsoft/trocr-base-handwritten
published Mar 2022 · updated Feb 2025
TrOCR Base Handwritten is an image-to-text model that performs optical character recognition on handwritten text from single text-line images.
specs
| Task | Image-to-Text (Optical Character Recognition) |
| Architecture | Encoder-decoder Transformer (ViT-based encoder, RoBERTa-based decoder) |
| Parameters | 334M |
| Fine-tuned On | IAM Handwriting Database |
about this model
TrOCR-base-handwritten is an optical character recognition (OCR) model that performs image-to-text transcription using a Transformer encoder-decoder architecture. The encoder is initialized from BEiT and processes images as a sequence of 16x16 patches; the decoder, initialized from RoBERTa, generates wordpiece tokens autoregressively. Fine-tuned on the IAM handwritten line dataset, the model contains 334 million parameters and expects 384x384 input images.
The model delivers strong results on handwritten, printed, and scene text benchmarks. On the IAM test set it achieves a cased character error rate (CER) of 3.42, outperforming the smaller TrOCR variant (4.22 CER) and approaching the larger model (2.89 CER). On the printed SROIE dataset it attains an F1 score of 96.34. Selected scene text recognition word accuracies are:
| Dataset | TrOCR-Base | TrOCR-Large |
|---|---|---|
| IIIT5K (3000) | 93.4% | 94.1% |
| SVT (647) | 95.2% | 96.1% |
| IC13 (857) | 98.4% | 98.4% |
| IC15 (1811) | 86.9% | 88.1% |
| SVTP (645) | 92.1% | 93.0% |
| CT80 (288) | 90.6% | 95.1% |
Originating from Microsoft Research and published at AAAI 2023, this model is hosted as a managed API on gigarouter with no additional setup required.
best for
- ·Recognizing handwritten text in scanned documents
- ·Digitizing handwritten notes and forms
- ·Processing single-line handwritten text images
FAQ
It is optimized for optical character recognition of handwritten text on single text-line images, fine-tuned on the IAM dataset.
The model expects a single text-line image, resized to 384x384 pixels, as input.
The model outputs a string of recognized text tokens.
Use the gigarouter OpenAI-compatible endpoint with an API key, sending the image as a base64-encoded string in the request.
The model has approximately 334 million parameters.
We're benchmarking and onboarding TrOCR Base Handwritten as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.