NuMarkdown 8B Thinking
numind/NuMarkdown-8B-Thinking
published Jul 2025 · updated Jun 2026
NuMarkdown 8B Thinking is an image-to-text model that converts documents into clean Markdown files using reasoning tokens to understand document layout.
specs
| Task | image-to-text |
| Architecture | Qwen 2.5-VL-7B (fine-tuned) |
| Parameters | 7B |
| License | MIT |
about this model
NuMarkdown-8B-Thinking is an image-to-text model that converts documents into clean Markdown files, optimized for retrieval-augmented generation (RAG) applications. It is a reasoning vision-language model (VLM) that generates "thinking tokens" to analyze document layout before producing the final Markdown output, with thinking token volume ranging from 20% to 500% of the answer depending on task difficulty.
The model is a fine-tune of Qwen 2.5-VL-7B, trained in two phases: supervised fine-tuning on synthetic reasoning traces from public PDFs, followed by reinforcement learning (GRPO) with a layout-centric reward on challenging image examples. It is released under the MIT license and is compatible with Text Generation Inference, Inference Endpoints, and Azure deployment (US region). As of its release, the model has received over 1,800,000 downloads and 477 likes on Hugging Face, indicating strong community adoption.
Benchmark Results
In an arena ranking with approximately 500 model-anonymized votes using a trueskill-2 system, NuMarkdown-8B-Thinking (labeled NuMarkdown-reasoning) achieved a μ score of 26.10 (σ 0.79), outperforming OCRFlux-3B (24.63), GPT-4o (24.48), and non-reasoning variants, while remaining competitive with Gemini Flash reasoning (26.75). A win/draw/loss matrix against other models based on image-only evaluation is shown below.
| Rank | Model | μ | σ | μ − 3σ |
|---|---|---|---|---|
| 1 | gemini-flash-reasoning | 26.75 | 0.80 | 24.35 |
| 2 | NuMarkdown-reasoning | 26.10 | 0.79 | 23.72 |
| 3 | NuMarkdown-reasoning-w/o_grpo | 25.32 | 0.80 | 22.93 |
| 4 | OCRFlux-3B | 24.63 | 0.80 | 22.22 |
| 5 | gpt-4o | 24.48 | 0.80 | 22.08 |
| 6 | gemini-flash-w/o_reasoning | 24.11 | 0.79 | 21.74 |
| 7 | RolmoOCR | 23.53 | 0.82 | 21.07 |
Example Output
The model demonstrates capability on documents with complex layouts, including multi-section headers, bullet points, tables with merged cells, and footnotes, as illustrated in the generated output below.

best for
- ·Converting complex documents with weird layouts to clean Markdown
- ·Extracting tables and structured content for RAG pipelines
- ·OCR tasks requiring reasoning about document structure
FAQ
It excels at converting complex documents (with unusual layouts and tables) into clean Markdown for RAG applications.
According to an arena ranking with around 500 votes, NuMarkdown 8B Thinking outperforms GPT-4o on document-to-Markdown tasks.
Input is a document image. Output contains thinking tokens followed by a Markdown answer wrapped in <answer> tags.
Use the gigarouter OpenAI-compatible endpoint with your API key.
The model is released under the MIT license.
We're benchmarking and onboarding NuMarkdown 8B Thinking as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.