skip to content
gigarouter gigarouter
models / image-to-text · coming soon

Falcon OCR

tiiuae/Falcon-OCR

published Feb 2026 · updated Jul 2026

Falcon OCR is a 300M parameter early-fusion vision-language model that performs document OCR, extracting plain text, LaTeX formulas, or HTML tables from images.

est. price
~$0.094
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
5.1K
license
apache-2.0

about this model

Falcon-OCR is a 300M parameter early-fusion vision-language model for document OCR that extracts text, LaTeX formulas, or HTML tables from images. Unlike modular encoder-decoder pipelines, it uses a single Transformer with a hybrid attention mask: image tokens attend bidirectionally while text tokens decode causally conditioned on the image. Task switching is done via prompts (e.g., category="table"). An optional two‑stage pipeline adds layout detection (PP‑DocLayoutV3) for dense multi‑column documents.

Benchmark results

BenchmarkScore
olmOCR (average accuracy)80.3%
OmniDocBench (Overall↑)88.64

On olmOCR, Falcon-OCR is especially strong on multi‑column documents (87.1%) and tables (90.3%). On OmniDocBench it achieves an Overall score of 88.64 (edit distance 0.055, CDM 86.8%, TEDS 84.6%).

At 0.3B parameters, the model is roughly 3× smaller than comparable OCR VLMs, translating into higher throughput. On a single A100‑80GB with vLLM, the full layout+OCR pipeline processes 5,825 tok/s and 2.9 img/s.

Output formats

  • Plain text: general document text
  • LaTeX: formulas and mathematical expressions
  • HTML: table
not yet live

We're benchmarking and onboarding Falcon OCR as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →