skip to content
gigarouter gigarouter
models / image-to-text · coming soon

Manga OCR Base

kha-white/manga-ocr-base

published Mar 2022 · updated Jun 2022

Manga OCR Base is an image-to-text model that performs optical character recognition for Japanese text, with the main focus being Japanese manga.

status
coming soon
API providers
0
downloads / mo
389.4K
license
apache-2.0

specs

TaskImage-to-Text (OCR)
ArchitectureVision Encoder-Decoder (ViT + text decoder)
InputImage (JPEG, PNG, etc.)
OutputJapanese text (supports multi-line)

about this model

Manga OCR is an optical character recognition model for Japanese text, specialized for printed text in manga and other image-heavy contexts. It uses a Vision Encoder Decoder architecture (Transformers framework) and is designed to handle the unique challenges of manga: vertical and horizontal text orientation, furigana, text overlaid on images, a wide variety of fonts and font styles, and low‑quality images. Unlike many OCR models, it supports recognizing multi‑line text in a single forward pass, allowing entire text bubbles to be processed without line splitting.

Key Capabilities

  • Robust recognition of both vertical and horizontal Japanese text.
  • Accurate handling of furigana (ruby annotations) and mixed‑script text.
  • Works directly on text overlaid on complex backgrounds, common in manga panels.
  • Performs well across diverse font families and degraded image quality.
  • End‑to‑end pipeline: accepts full images or cropped regions and outputs recognized text.

This model is hosted by Gigarouter as a managed, OpenAI‑compatible API. The underlying code and training details are available in the official repository.

best for

FAQ

What is Manga OCR Base best for?

It is optimized for Japanese text recognition in manga, handling vertical/horizontal text, furigana, and poor image quality.

Does it support multi-line text in a single pass?

Yes, it can recognize multi-line text from a single forward pass, ideal for processing entire text bubbles at once.

What input formats are accepted?

It accepts image files (e.g., JPEG, PNG) or PIL Image objects via the Python API.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending an image URL or base64-encoded image in the request.

What is the model architecture?

It uses the Vision Encoder-Decoder framework from Hugging Face Transformers, combining a vision encoder with a text decoder.

not yet live

We're benchmarking and onboarding Manga OCR Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →