Hosted image-to-text models

37 models · 0 live as APIs · benchmarked & compared

Image-to-text models convert images into plain or structured text. They cover several distinct problems: optical character recognition (OCR) extracts printed or handwritten text from scanned documents and photos; image captioning generates descriptions for accessibility, content moderation, or metadata; and specialist models handle document layout analysis, orientation detection, or domain-specific inputs such as manga panels. For example, microsoft/trocr-small-handwritten transcribes handwritten text lines, PaddlePaddle/PP-OCRv5_server_det locates text regions in an image, and PaddlePaddle/PP-OCRv5_server_rec reads the text inside those regions. Others, like Salesforce/blip-image-captioning-base, produce natural-language captions, and numind/NuExtract3 extracts structured data from document images.

In production, these models are typically chained into pipelines. A common pattern is document processing: first detect text regions, then recognize the characters in each region, and finally parse the recognized text into structured fields. Some systems add preprocessing before extraction: orientation classification (PaddlePaddle/PP-LCNet_x1_0_doc_ori) to turn pages upright, and image unwarping (PaddlePaddle/UVDoc) to flatten curved or distorted pages. Choosing a model is a trade-off between size, quality, and speed. Smaller models offer lower latency and lower compute cost but may lose accuracy on noisy or complex inputs. Larger models tend to give higher-quality results at the expense of throughput. Domain-specific models, such as kha-white/manga-ocr-base, can outperform general-purpose OCR on their target data.
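The detect, recognize, parse pattern can be sketched as three pluggable stages. This is a minimal illustration, not the API of any model listed here: `run_pipeline`, `TextLine`, and the `fake_*` stand-ins are hypothetical names, and in a real system the detect and recognize callables would wrap actual models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class TextLine:
    box: Box
    text: str


def run_pipeline(
    image: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
    parse: Callable[[List[TextLine]], Dict[str, str]],
) -> Dict[str, str]:
    """Detect text regions, recognize each one, then parse into fields."""
    boxes = detect(image)
    # Sort top-to-bottom, then left-to-right, so lines arrive in reading order.
    boxes = sorted(boxes, key=lambda b: (b[1], b[0]))
    lines = [TextLine(box=b, text=recognize(image, b)) for b in boxes]
    return parse(lines)


# Hypothetical stand-ins for real models, for illustration only.
FAKE_PAGE = {
    (10, 40, 200, 60): "Total: 42.00",
    (10, 10, 200, 30): "Invoice: A-17",
}


def fake_detect(image: object) -> List[Box]:
    return list(FAKE_PAGE.keys())


def fake_recognize(image: object, box: Box) -> str:
    return FAKE_PAGE[box]


def parse_key_value(lines: List[TextLine]) -> Dict[str, str]:
    """Split 'Key: value' lines into a dict; lines without ':' are skipped."""
    fields: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.text.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


result = run_pipeline(None, fake_detect, fake_recognize, parse_key_value)
print(result)
```

Keeping each stage behind a plain callable makes it straightforward to swap a smaller detector or a domain-specific recognizer in or out and compare the size, quality, and speed trade-off on the same inputs.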

For most call volumes, a hosted API removes the operational work of provisioning GPUs, managing infrastructure, and handling scaling, while still offering pay-as-you-go pricing and consistent performance.

compare

model	params	downloads/mo	price	status
Salesforce/blip-image-captioning-base	-	1.9M	at launch	coming soon
Salesforce/blip-image-captioning-large	469.7M	752.9K	~$0.094 / 1k images	coming soon
PaddlePaddle/PP-OCRv5_server_det	-	587.3K	at launch	coming soon
numind/NuExtract3	4539.3M	520.7K	~$1.341 / 1k images	coming soon
PaddlePaddle/UVDoc	-	512.8K	at launch	coming soon
microsoft/trocr-small-handwritten	-	448.6K	at launch	coming soon
PaddlePaddle/PP-LCNet_x1_0_doc_ori	-	445.3K	at launch	coming soon
kha-white/manga-ocr-base	-	389.4K	at launch	coming soon
ibm-granite/granite-vision-3.3-2b	2975.4M	343.3K	~$0.626 / 1k images	coming soon
PaddlePaddle/PP-LCNet_x1_0_textline_ori	-	274.6K	at launch	coming soon
microsoft/trocr-base-printed	333.3M	251.5K	~$0.094 / 1k images	coming soon
lightonai/LightOnOCR-1B-1025	1161.2M	199.9K	~$0.235 / 1k images	coming soon
PaddlePaddle/PP-OCRv5_server_rec	-	189.4K	at launch	coming soon
microsoft/trocr-large-handwritten	-	182.4K	at launch	coming soon
microsoft/kosmos-2-patch14-224	1664.5M	166.7K	~$0.626 / 1k images	coming soon
naver-clova-ix/donut-base	-	166K	at launch	coming soon
microsoft/trocr-base-stage1	384.3M	149K	~$0.094 / 1k images	coming soon
facebook/nougat-base	348.7M	145.4K	~$0.094 / 1k images	coming soon
microsoft/trocr-large-printed	608.1M	133K	~$0.235 / 1k images	coming soon
PaddlePaddle/PP-OCRv5_mobile_det	-	129.4K	at launch	coming soon
microsoft/trocr-base-handwritten	333.3M	124K	~$0.094 / 1k images	coming soon
alibaba-damo/mgp-str-base	148M	110.8K	~$0.047 / 1k images	coming soon
PaddlePaddle/PP-OCRv6_medium_det	-	89K	at launch	coming soon
PaddlePaddle/PP-OCRv6_medium_rec	-	79.9K	at launch	coming soon
PaddlePaddle/PP-OCRv5_mobile_rec	-	74.5K	at launch	coming soon
rtr46/meiki.txt.recognition.v0	-	65.6K	at launch	coming soon
nlpconnect/vit-gpt2-image-captioning	-	64.4K	at launch	coming soon
PaddlePaddle/latin_PP-OCRv5_mobile_rec	-	37.5K	at launch	coming soon
microsoft/trocr-small-printed	61.4M	36.3K	~$0.047 / 1k images	coming soon
facebook/nougat-small	247.4M	28.5K	~$0.094 / 1k images	coming soon
unsloth/GLM-OCR	-	28K	at launch	coming soon
numind/NuMarkdown-8B-Thinking	8292.2M	26.1K	~$1.341 / 1k images	coming soon
PaddlePaddle/en_PP-OCRv4_mobile_rec	-	24.6K	at launch	coming soon
PaddlePaddle/PP-DocLayout_plus-L	-	21.3K	at launch	coming soon
PaddlePaddle/PP-OCRv4_mobile_det	-	20.1K	at launch	coming soon
PaddlePaddle/PP-DocBlockLayout	-	18.6K	at launch	coming soon
tiiuae/Falcon-OCR	269.9M	5.1K	~$0.094 / 1k images	coming soon
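
To turn the per-1k-image prices in the table above into a rough budget, the arithmetic can be sketched as follows. The prices are the approximate ("~") figures copied from the table for a few models and may change at launch; `estimate_cost` and the 250,000-image workload are hypothetical, chosen only for illustration.

```python
# Approximate list prices copied from the table (USD per 1,000 images).
PRICE_PER_1K = {
    "alibaba-damo/mgp-str-base": 0.047,
    "microsoft/trocr-base-printed": 0.094,
    "lightonai/LightOnOCR-1B-1025": 0.235,
    "ibm-granite/granite-vision-3.3-2b": 0.626,
    "numind/NuExtract3": 1.341,
}


def estimate_cost(model: str, images: int) -> float:
    """Return the estimated USD cost of processing `images` images."""
    if images < 0:
        raise ValueError("images must be non-negative")
    return round(PRICE_PER_1K[model] * images / 1000, 2)


monthly_images = 250_000  # hypothetical workload
for model in PRICE_PER_1K:
    print(f"{model}: ${estimate_cost(model, monthly_images):,.2f}")
```

At this hypothetical volume the listed prices span roughly $12 to $335 per month, which gives a sense of how much the choice of price tier matters relative to the accuracy a workload actually needs.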
