Question 1

What is Unlimited OCR best used for?

Accepted Answer

Unlimited OCR is designed for one-shot long-horizon document parsing, capable of transcribing dozens of pages in a single forward pass with constant KV cache memory.

Question 2

How does it handle long documents compared to traditional OCR models?

Accepted Answer

It uses Reference Sliding Window Attention (R-SWA) to maintain a constant KV cache throughout decoding, avoiding the memory growth and slowdown that plagues standard transformers on long sequences.

Question 3

What input formats are supported?

Accepted Answer

The model accepts single images (with two modes: gundam and base) and multi-page PDFs (converted to images, base mode only). All inputs are processed under a 32K token context length.

Question 4

How can I call Unlimited OCR via the gigarouter API?

Accepted Answer

Use the OpenAI-compatible endpoint at gigarouter with your API key. Send a chat completion request containing a user message with an image_url and text prompt, then set the model parameter to &#x27;Unlimited-OCR&#x27;.

Question 5

What is the difference between gundam and base image modes?

Accepted Answer

Gundam mode uses base_size=1024, image_size=640 with cropping for single images. Base mode uses image_size=1024 without cropping and is required for multi-page or PDF inputs.

Task	Optical Character Recognition (OCR) / Document Parsing
Architecture	VLM with Reference Sliding Window Attention (R-SWA)
Context Length	32,768 tokens
Input	Single image or multi-page PDF (up to 32K tokens)

Metric	Score	Rank
Mean	46.17	#13
Text Content	86.81	#9
Text Formatting	0.97	#18
Layout	71.52	#6
Chart	1.34	#13
Table	70.21	#12

Unlimited OCR

specs

about this model

Key Capabilities

Benchmark Performance

best for

FAQ

related vision-language models