NuExtract3

numind/NuExtract3

published Apr 2026 · updated Jun 2026

NuExtract3 is a 4B vision-language reasoning model for document understanding that performs structured information extraction and image-to-Markdown conversion.

est. price

~$1.341

/ 1k images · estimated, set at launch

API providers

downloads / mo

520.7K

license

apache-2.0

specs

Task	Image-to-text (document understanding, structured extraction, image-to-Markdown)
Architecture	Fine-tuned from Qwen3.5-4B
Parameters	4B
License	Apache-2.0

about this model

NuExtract3 is a 4-billion-parameter vision-language reasoning model for document understanding, fine-tuned from Qwen3.5-4B and licensed under Apache-2.0. It performs structured information extraction and image-to-Markdown conversion from documents such as scans, receipts, forms, invoices, contracts, and tables. The model accepts text, images, or both as input, supports multilingual documents, and offers both reasoning and non-reasoning inference modes.

Structured extraction

NuExtract3 takes an input document, a JSON template that specifies the types of fields to extract, and optional instructions. Supported field types include verbatim-string, string, integer, number, date-time, and others. The model outputs a JSON object matching the template structure; fields not found in the document return null or [].

Document-to-Markdown

NuExtract3 converts document images into clean Markdown, using HTML for tables, LaTeX for math, and <figure> tags for images.

Benchmark results

On NuMind's internal structured extraction benchmark (~600 documents including invoices, movie posters, and floor plans), NuExtract3 achieved an average score of 0.651 ± 0.019, outperforming larger models such as Qwen3.5-9B (0.479) and Gemma-4-E4B-it (0.538). The evaluation uses a tree-aligned metric: string fields are scored with indel distance, and other types with exact match. Only 27 of 600 outputs failed to parse as valid JSON.

Bar chart comparing structured extraction scores of NuExtract3 against other models

Chart showing document-to-Markdown evaluation results

Chart showing two-step Markdown-to-structured extraction evaluation results

Inference modes

NuExtract3 supports both non-reasoning (fast, deterministic) and reasoning (for complex layouts and ambiguous fields) modes. For production extraction, non-reasoning mode is recommended as the default, with reasoning enabled only for difficult examples.

best for

·Extracting structured JSON data from invoices, receipts, forms, and contracts
·Converting scanned documents or images into clean Markdown for RAG preprocessing
·Multilingual document understanding and OCR pipelines

FAQ

What input formats does NuExtract3 support?

It accepts text, images, or a combination of both, along with an optional JSON template and instructions.

Does NuExtract3 support reasoning modes?

Yes, it supports both reasoning (thinking) and non-reasoning inference modes for different accuracy and speed requirements.

What is the output format for structured extraction?

The output is a JSON object that follows the structure of the provided JSON template, with typed leaf values.

How can I call NuExtract3 via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key, sending a chat completion request with the model name and your input.

What license is NuExtract3 released under?

It is released under the Apache-2.0 license.

not yet live

We're benchmarking and onboarding NuExtract3 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related image-to-text models

compare all →

blip-image-captioning-base

1.9M dl/mo

blip-image-captioning-large

trocr-small-handwritten

448.6K dl/mo

PP-LCNet_x1_0_doc_ori

445.3K dl/mo