NuExtract3
numind/NuExtract3
published Apr 2026 · updated Jun 2026
NuExtract3 is a 4B vision-language reasoning model for document understanding that performs structured information extraction and image-to-Markdown conversion.
specs
| Task | Image-to-text (document understanding, structured extraction, image-to-Markdown) |
| Architecture | Fine-tuned from Qwen3.5-4B |
| Parameters | 4B |
| License | Apache-2.0 |
about this model
NuExtract3 is a 4-billion-parameter vision-language reasoning model for document understanding, fine-tuned from Qwen3.5-4B and licensed under Apache-2.0. It performs structured information extraction and image-to-Markdown conversion from documents such as scans, receipts, forms, invoices, contracts, and tables. The model accepts text, images, or both as input, supports multilingual documents, and offers both reasoning and non-reasoning inference modes.
Structured extraction
NuExtract3 takes an input document, a JSON template that specifies the types of fields to extract, and optional instructions. Supported field types include verbatim-string, string, integer, number, date-time, and others. The model outputs a JSON object matching the template structure; fields not found in the document return null or [].
Document-to-Markdown
NuExtract3 converts document images into clean Markdown, using HTML for tables, LaTeX for math, and <figure> tags for images.
Benchmark results
On NuMind's internal structured extraction benchmark (~600 documents including invoices, movie posters, and floor plans), NuExtract3 achieved an average score of 0.651 ± 0.019, outperforming larger models such as Qwen3.5-9B (0.479) and Gemma-4-E4B-it (0.538). The evaluation uses a tree-aligned metric: string fields are scored with indel distance, and other types with exact match. Only 27 of 600 outputs failed to parse as valid JSON.
Inference modes
NuExtract3 supports both non-reasoning (fast, deterministic) and reasoning (for complex layouts and ambiguous fields) modes. For production extraction, non-reasoning mode is recommended as the default, with reasoning enabled only for difficult examples.
best for
- ·Extracting structured JSON data from invoices, receipts, forms, and contracts
- ·Converting scanned documents or images into clean Markdown for RAG preprocessing
- ·Multilingual document understanding and OCR pipelines
FAQ
It accepts text, images, or a combination of both, along with an optional JSON template and instructions.
Yes, it supports both reasoning (thinking) and non-reasoning inference modes for different accuracy and speed requirements.
The output is a JSON object that follows the structure of the provided JSON template, with typed leaf values.
Use the gigarouter OpenAI-compatible endpoint with your API key, sending a chat completion request with the model name and your input.
It is released under the Apache-2.0 license.
We're benchmarking and onboarding NuExtract3 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.