Dots MOCR

rednote-hilab/dots.mocr

published Mar 2026 · updated Jul 2026

Dots MOCR is a vlm model that jointly parses text and graphics from documents into structured outputs, including SVG code for charts, diagrams, and UI layouts.

est. price

~$0.626

/ 1k images · estimated, set at launch

API providers

downloads / mo

518.9K

license

mit

specs

Task	Multimodal OCR / Document Parsing
Architecture	DotsOCRForCausalLM (custom VLM based on Qwen3-VL-4B)
Parameters	3B
License	MIT

about this model

dots.mocr is a 3B-parameter multimodal OCR model that jointly parses text and structured graphics from documents into unified textual representations, including SVG code for visual elements. It is hosted on Gigarouter as an OpenAI-compatible API.

Document Parsing Performance

On the OCR Arena Elo leaderboard, dots.mocr achieves an average Elo score of 1124.7 across olmOCR-Bench, OmniDocBench (v1.5), and XDocParse, ranking second only to Gemini 3 Pro (1210.7). It sets a new state of the art on olmOCR Bench with an overall score of 83.9±0.9, outperforming all open-source document parsing systems. On OmniDocBench (v1.5), it achieves the lowest TextEdit error rate (0.031) and Read Order Edit error rate (0.029) among specialized VLMs.

Structured Graphics Parsing

dots.mocr converts structured graphics such as charts, UI layouts, scientific figures, and chemical diagrams directly into SVG code. On the Unisvg benchmark, it achieves a score of 0.894 (0.850 low-level, 0.923 high-level), surpassing Gemini 3 Pro (0.735). It also scores 0.801 on Design2Code and 0.772 on Chartmimic.

Benchmark Summary

Benchmark	dots.mocr Score	Context
OCR Arena Elo (Average)	1124.7	Second only to Gemini 3 Pro (1210.7)
olmOCR Bench (Overall)	83.9±0.9	State-of-the-art among open-source systems
OmniDocBench v1.5 (TextEdit)	0.031	Lowest error rate among specialized VLMs
Unisvg (Score)	0.894	Outperforms Gemini 3 Pro (0.735)

dots.mocr is released under the MIT license. It is a 3B-parameter model trained via staged pretraining and supervised fine-tuning on a data engine built from PDFs, rendered webpages, and native SVG assets. A variant optimized for image-to-SVG tasks, dots.mocr-svg, is also available.

best for

·Multilingual document parsing (text, tables, headers)
·Converting charts, UI layouts, and scientific figures to SVG
·Interactive dialogue and semantic understanding of document content

FAQ

What is Dots MOCR best used for?

It excels at multimodal document parsing, converting structured graphics (charts, diagrams) to SVG, and interactive dialogue about documents.

How large is the model and how fast is it?

It has 3B parameters, making it compact and efficient for inference.

What is the license for Dots MOCR?

It is released under the MIT license.

What is the input and output format?

Input is an image plus a text prompt; output is text that can include structured SVG code.

How can I call Dots MOCR via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key.

not yet live

We're benchmarking and onboarding Dots MOCR as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit