Manga OCR Base
kha-white/manga-ocr-base
published Mar 2022 · updated Jun 2022
Manga OCR Base is an image-to-text model that performs optical character recognition for Japanese text, with the main focus being Japanese manga.
specs
| Task | Image-to-Text (OCR) |
| Architecture | Vision Encoder-Decoder (ViT + text decoder) |
| Input | Image (JPEG, PNG, etc.) |
| Output | Japanese text (supports multi-line) |
about this model
Manga OCR is an optical character recognition model for Japanese text, specialized for printed text in manga and other image-heavy contexts. It uses a Vision Encoder Decoder architecture (Transformers framework) and is designed to handle the unique challenges of manga: vertical and horizontal text orientation, furigana, text overlaid on images, a wide variety of fonts and font styles, and low‑quality images. Unlike many OCR models, it supports recognizing multi‑line text in a single forward pass, allowing entire text bubbles to be processed without line splitting.
Key Capabilities
- Robust recognition of both vertical and horizontal Japanese text.
- Accurate handling of furigana (ruby annotations) and mixed‑script text.
- Works directly on text overlaid on complex backgrounds, common in manga panels.
- Performs well across diverse font families and degraded image quality.
- End‑to‑end pipeline: accepts full images or cropped regions and outputs recognized text.
This model is hosted by Gigarouter as a managed, OpenAI‑compatible API. The underlying code and training details are available in the official repository.
best for
- ·OCR of Japanese manga text bubbles
- ·Reading vertical and horizontal Japanese text
- ·Extracting text from low-quality or overlaid images
FAQ
It is optimized for Japanese text recognition in manga, handling vertical/horizontal text, furigana, and poor image quality.
Yes, it can recognize multi-line text from a single forward pass, ideal for processing entire text bubbles at once.
It accepts image files (e.g., JPEG, PNG) or PIL Image objects via the Python API.
Use the OpenAI-compatible endpoint with your API key, sending an image URL or base64-encoded image in the request.
It uses the Vision Encoder-Decoder framework from Hugging Face Transformers, combining a vision encoder with a text decoder.
We're benchmarking and onboarding Manga OCR Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.