Nougat Base
facebook/nougat-base
published Sep 2023 · updated Nov 2023
Nougat Base is an image-to-text model that converts PDF page images into Markdown, with a focus on LaTeX math and tables in scientific documents.
specs
| Task | image-to-text |
| Architecture | Swin Transformer encoder + mBART decoder |
| Input | PDF page images (pixels) |
| Output | Mathpix Markdown (.mmd) |
about this model
Nougat-base is an image-to-text model that transcribes scientific PDF pages into Markdown using a Visual Transformer architecture. It combines a Swin Transformer vision encoder with an mBART text decoder to autoregressively generate markdown directly from pixel inputs. The model is designed specifically for academic documents, preserving mathematical expressions and tables that standard OCR often loses.
Introduced in the paper Nougat: Neural Optical Understanding for Academic Documents (Blecher et al., 2023), this base version (0.1.0-base) is trained to perform an Optical Character Recognition (OCR) task for processing scientific documents into a markup language. It processes PDF images without requiring prior OCR preprocessing, outputting .mmd (Mathpix Markdown) format.
The model is well-suited for converting research papers, books, and scientific journals into machine-readable text while retaining semantic information in equations and tables. It can handle full PDFs or selected page ranges, with a built-in failure detection heuristic to skip problematic pages.
best for
- ·Transcribing academic PDFs into machine-readable Markdown with math formulas
- ·Extracting tables and LaTeX from scanned or digital scientific documents
FAQ
The small version is the default tag (0.1.0-small); base is the larger variant (0.1.0-base), offering higher accuracy at the cost of more compute.
It outputs Mathpix Markdown (.mmd), a plain-text format that preserves LaTeX math and table structures.
Use the gigarouter OpenAI-compatible endpoint with your API key, sending a PDF image as input to receive Markdown text.
The model card does not specify a license; the associated code repository uses MIT, and a separate MODEL license file exists but its terms are not disclosed.
Yes, it processes each page as an image; the CLI supports batch processing and page-range selection (e.g., --pages 1-4,7).
We're benchmarking and onboarding Nougat Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.