skip to content
gigarouter gigarouter
models / vision-language · coming soon

Unlimited OCR

baidu/Unlimited-OCR

published Jun 2026 · updated Jul 2026

Unlimited OCR is a vlm model that performs one-shot long-horizon document parsing and text recognition using a constant KV cache attention mechanism.

est. price
~$0.626
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
885K
license
mit

specs

TaskOptical Character Recognition (OCR) / Document Parsing
ArchitectureVLM with Reference Sliding Window Attention (R-SWA)
Context Length32,768 tokens
InputSingle image or multi-page PDF (up to 32K tokens)

about this model

Unlimited-OCR is a vision-language model (VLM) that performs end-to-end document parsing and optical character recognition (OCR), designed to transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K tokens. Built upon DeepSeek-OCR, the model replaces all decoder attention layers with Reference Sliding Window Attention (R-SWA), a mechanism that reduces attention computation costs and maintains a constant KV cache throughout the entire decoding process. This design eliminates the progressive slowdown and memory growth typical of long-sequence generation, enabling efficient long-horizon parsing.

Key Capabilities

  • Single-image parsing with two configuration modes: gundam (base_size=1024, image_size=640, crop_mode=True) for detailed document regions, and base (base_size=1024, image_size=1024) for full-page processing.
  • Multi-page and PDF parsing using the base configuration, converting PDF pages to images at configurable DPI.
  • Structured Markdown output for parsed document content.

Benchmark Performance

On the ParseBench benchmark, Unlimited-OCR achieves the following scores:

Metric Score Rank
Mean46.17#13
Text Content86.81#9
Text Formatting0.97#18
Layout71.52#6
Chart1.34#13
Table70.21#12

The model has received 885,040 total downloads and 1,690 likes on Hugging Face. R-SWA is a general-purpose parsing attention mechanism applicable beyond OCR to tasks such as ASR and translation.

Unlimited OCR logo Unlimited OCR overview diagram

best for

FAQ

What is Unlimited OCR best used for?

Unlimited OCR is designed for one-shot long-horizon document parsing, capable of transcribing dozens of pages in a single forward pass with constant KV cache memory.

How does it handle long documents compared to traditional OCR models?

It uses Reference Sliding Window Attention (R-SWA) to maintain a constant KV cache throughout decoding, avoiding the memory growth and slowdown that plagues standard transformers on long sequences.

What input formats are supported?

The model accepts single images (with two modes: gundam and base) and multi-page PDFs (converted to images, base mode only). All inputs are processed under a 32K token context length.

How can I call Unlimited OCR via the gigarouter API?

Use the OpenAI-compatible endpoint at gigarouter with your API key. Send a chat completion request containing a user message with an image_url and text prompt, then set the model parameter to 'Unlimited-OCR'.

What is the difference between gundam and base image modes?

Gundam mode uses base_size=1024, image_size=640 with cropping for single images. Base mode uses image_size=1024 without cropping and is required for multi-page or PDF inputs.

not yet live

We're benchmarking and onboarding Unlimited OCR as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →