NuMarkdown 8B Thinking

numind/NuMarkdown-8B-Thinking

published Jul 2025 · updated Jun 2026

NuMarkdown 8B Thinking is an image-to-text model that converts documents into clean Markdown files using reasoning tokens to understand document layout.

est. price

~$1.341

/ 1k images · estimated, set at launch

API providers

downloads / mo

26.1K

license

mit

specs

Task	image-to-text
Architecture	Qwen 2.5-VL-7B (fine-tuned)
Parameters	7B
License	MIT

about this model

NuMarkdown-8B-Thinking is an image-to-text model that converts documents into clean Markdown files, optimized for retrieval-augmented generation (RAG) applications. It is a reasoning vision-language model (VLM) that generates "thinking tokens" to analyze document layout before producing the final Markdown output, with thinking token volume ranging from 20% to 500% of the answer depending on task difficulty.

The model is a fine-tune of Qwen 2.5-VL-7B, trained in two phases: supervised fine-tuning on synthetic reasoning traces from public PDFs, followed by reinforcement learning (GRPO) with a layout-centric reward on challenging image examples. It is released under the MIT license and is compatible with Text Generation Inference, Inference Endpoints, and Azure deployment (US region). As of its release, the model has received over 1,800,000 downloads and 477 likes on Hugging Face, indicating strong community adoption.

Benchmark Results

In an arena ranking with approximately 500 model-anonymized votes using a trueskill-2 system, NuMarkdown-8B-Thinking (labeled NuMarkdown-reasoning) achieved a μ score of 26.10 (σ 0.79), outperforming OCRFlux-3B (24.63), GPT-4o (24.48), and non-reasoning variants, while remaining competitive with Gemini Flash reasoning (26.75). A win/draw/loss matrix against other models based on image-only evaluation is shown below.

Rank	Model	μ	σ	μ − 3σ
1	gemini-flash-reasoning	26.75	0.80	24.35
2	NuMarkdown-reasoning	26.10	0.79	23.72
3	NuMarkdown-reasoning-w/o_grpo	25.32	0.80	22.93
4	OCRFlux-3B	24.63	0.80	22.22
5	gpt-4o	24.48	0.80	22.08
6	gemini-flash-w/o_reasoning	24.11	0.79	21.74
7	RolmoOCR	23.53	0.82	21.07

Win/draw/loss rate comparison against other models

Example Output

The model demonstrates capability on documents with complex layouts, including multi-section headers, bullet points, tables with merged cells, and footnotes, as illustrated in the generated output below.

Example document conversion from source image to Markdown output

best for

·Converting complex documents with weird layouts to clean Markdown
·Extracting tables and structured content for RAG pipelines
·OCR tasks requiring reasoning about document structure

FAQ

What is this model best for?

It excels at converting complex documents (with unusual layouts and tables) into clean Markdown for RAG applications.

How does NuMarkdown 8B Thinking compare to GPT-4o?

According to an arena ranking with around 500 votes, NuMarkdown 8B Thinking outperforms GPT-4o on document-to-Markdown tasks.

What is the input/output format?

Input is a document image. Output contains thinking tokens followed by a Markdown answer wrapped in <answer> tags.

How can I call this model via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key.

What license is the model released under?

The model is released under the MIT license.

not yet live

We're benchmarking and onboarding NuMarkdown 8B Thinking as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo