skip to content
gigarouter gigarouter
models / image-to-text · coming soon

Nougat Base

facebook/nougat-base

published Sep 2023 · updated Nov 2023

Nougat Base is an image-to-text model that converts PDF page images into Markdown, with a focus on LaTeX math and tables in scientific documents.

est. price
~$0.094
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
145.4K
license
cc-by-nc-4.0

specs

Taskimage-to-text
ArchitectureSwin Transformer encoder + mBART decoder
InputPDF page images (pixels)
OutputMathpix Markdown (.mmd)

about this model

Nougat-base is an image-to-text model that transcribes scientific PDF pages into Markdown using a Visual Transformer architecture. It combines a Swin Transformer vision encoder with an mBART text decoder to autoregressively generate markdown directly from pixel inputs. The model is designed specifically for academic documents, preserving mathematical expressions and tables that standard OCR often loses.

Introduced in the paper Nougat: Neural Optical Understanding for Academic Documents (Blecher et al., 2023), this base version (0.1.0-base) is trained to perform an Optical Character Recognition (OCR) task for processing scientific documents into a markup language. It processes PDF images without requiring prior OCR preprocessing, outputting .mmd (Mathpix Markdown) format.

High-level overview of the Nougat model architecture, showing pixel input passing through a Swin Transformer encoder and mBART decoder to produce markdown output.

The model is well-suited for converting research papers, books, and scientific journals into machine-readable text while retaining semantic information in equations and tables. It can handle full PDFs or selected page ranges, with a built-in failure detection heuristic to skip problematic pages.

best for

FAQ

How does Nougat Base differ from the small version?

The small version is the default tag (0.1.0-small); base is the larger variant (0.1.0-base), offering higher accuracy at the cost of more compute.

What output format does Nougat produce?

It outputs Mathpix Markdown (.mmd), a plain-text format that preserves LaTeX math and table structures.

How can I call Nougat Base via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, sending a PDF image as input to receive Markdown text.

What is the license of the Nougat Base model?

The model card does not specify a license; the associated code repository uses MIT, and a separate MODEL license file exists but its terms are not disclosed.

Can Nougat handle multi-page PDFs?

Yes, it processes each page as an image; the CLI supports batch processing and page-range selection (e.g., --pages 1-4,7).

not yet live

We're benchmarking and onboarding Nougat Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →