tasks / vision-language

Hosted vision-language models

52 models · 0 live as APIs · benchmarked & compared

Vision-language models process both images and text, enabling tasks such as extracting structured data from scanned documents, answering questions about photographs, and generating captions for accessibility. For example, deepseek-ai/DeepSeek-OCR-2 is specialised for optical character recognition, while series like Qwen/Qwen2.5-VL-7B-Instruct and Qwen/Qwen2-VL-2B-Instruct support visual question answering and image-to-text generation.

Document digitisation and invoice parsing
Automated content moderation on visual platforms
Visual search and retrieval-augmented generation (RAG) pipelines

In production, these models are often integrated into RAG workflows or multimodal chatbots. Choosing among the 32 models listed here involves balancing latency, accuracy, and cost: larger architectures such as Qwen/Qwen3.6-35B-A3B-FP8 yield higher quality on complex reasoning but require more compute, while quantised or smaller models like cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit or Qwen/Qwen3-VL-4B-Instruct serve well at lower throughputs. For most call volumes, calling a hosted API eliminates infrastructure overhead and enables elastic scaling — benefits gigarouter provides through its benchmarked, OpenAI-compatible endpoints. (Currently 0 models are live; the remainder are being onboarded.)

compare

model	params	downloads/mo	price	status
Qwen/Qwen2.5-VL-7B-Instruct	8292.2M	9.8M	~$1.341 / 1k images	coming soon
Qwen/Qwen3.6-35B-A3B-FP8	35953.9M	6.2M	~$1.341 / 1k images	coming soon
Qwen/Qwen2.5-VL-3B-Instruct	3754.6M	5.3M	~$0.626 / 1k images	coming soon
cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit	26554.3M	5.1M	~$1.341 / 1k images	coming soon
Qwen/Qwen3.6-27B-FP8	27782.9M	4.9M	~$1.341 / 1k images	coming soon
Qwen/Qwen3-VL-4B-Instruct	4437.8M	3.7M	~$1.341 / 1k images	coming soon
Qwen/Qwen2-VL-2B-Instruct	2209M	3.6M	~$0.626 / 1k images	coming soon
deepseek-ai/DeepSeek-OCR-2	3389.1M	3.3M	~$0.626 / 1k images	coming soon
llava-hf/llava-1.5-7b-hf	7063.4M	3.2M	~$1.341 / 1k images	coming soon
RedHatAI/gemma-4-31B-it-FP8-block	31274.9M	3.2M	~$1.341 / 1k images	coming soon
HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive	-	3M	at launch	coming soon
microsoft/Florence-2-base	231.6M	2.6M	~$0.094 / 1k images	coming soon
Qwen/Qwen3.5-0.8B	873.4M	2.5M	~$0.235 / 1k images	coming soon
Qwen/Qwen3-VL-2B-Instruct	2127.5M	2.1M	~$0.626 / 1k images	coming soon
RedHatAI/gemma-4-26B-A4B-it-FP8-Dynamic	26560.9M	2M	~$1.341 / 1k images	coming soon
cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit	35951.8M	1.8M	~$1.341 / 1k images	coming soon
Qwen/Qwen2-VL-7B-Instruct	8291.4M	1.8M	~$1.341 / 1k images	coming soon
Qwen/Qwen2-VL-7B-Instruct-AWQ	8291.4M	1.8M	~$1.341 / 1k images	coming soon
unsloth/Qwen3.6-27B-MTP-GGUF	-	1.8M	at launch	coming soon
Qwen/Qwen2.5-VL-7B-Instruct-AWQ	8292.2M	1.7M	~$1.341 / 1k images	coming soon
vikhyatk/moondream2	1927.2M	1.6M	~$0.626 / 1k images	coming soon
unsloth/gemma-4-26B-A4B-it-GGUF	-	1.5M	at launch	coming soon
OpenGVLab/InternVL2-2B	2205.8M	1.5M	~$0.626 / 1k images	coming soon
empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF	-	1.4M	at launch	coming soon
datalab-to/chandra-ocr-2	5295.6M	1.3M	~$1.341 / 1k images	coming soon
baidu/Unlimited-OCR	3336.1M	885K	~$0.626 / 1k images	coming soon
unsloth/Qwen3.6-35B-A3B-GGUF	-	874.6K	at launch	coming soon
unsloth/Qwen3.6-35B-A3B-MTP-GGUF	-	734.7K	at launch	coming soon
Salesforce/blip2-opt-2.7b	3744.8M	669.8K	~$0.626 / 1k images	coming soon
DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF	-	519.4K	at launch	coming soon
rednote-hilab/dots.mocr	3039.2M	518.9K	~$0.626 / 1k images	coming soon
datalab-to/surya-ocr-2	686.2M	407K	~$0.235 / 1k images	coming soon
rednote-hilab/dots.ocr	3039.2M	278.6K	~$0.626 / 1k images	coming soon
baidu/Qianfan-OCR	4741.4M	258.6K	~$1.341 / 1k images	coming soon
lightonai/LightOnOCR-2-1B	1005.6M	170.5K	~$0.235 / 1k images	coming soon
datalab-to/chandra	8767.1M	138.4K	~$1.341 / 1k images	coming soon
ibm-granite/granite-vision-4.1-4b	3997.2M	111K	~$0.626 / 1k images	coming soon
HauhauCS/Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced	-	71.7K	at launch	coming soon
opendatalab/MinerU2.5-Pro-2605-1.2B	1156M	48.8K	~$0.235 / 1k images	coming soon
Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF	-	44.8K	at launch	coming soon
HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP	-	44.5K	at launch	coming soon
sahilchachra/Unlimited-OCR-GGUF	-	43.7K	at launch	coming soon
opendatalab/MinerU2.5-2509-1.2B	1156M	21.2K	~$0.235 / 1k images	coming soon
infly/Infinity-Parser2-Flash	2213.2M	16.6K	~$0.626 / 1k images	coming soon
inclusionAI/UI-Venus-1.5-8B	8767.1M	5K	~$1.341 / 1k images	coming soon
ByteDance-Seed/UI-TARS-2B-SFT	2442.4M	2.5K	~$0.626 / 1k images	coming soon
KDLAI/KDL-Frontier-Parser-nano	1156M	2.3K	~$0.235 / 1k images	coming soon
Salesforce/GTA1-7B	8292.2M	1.3K	~$1.341 / 1k images	coming soon
inclusionAI/UI-Venus-1.5-2B	2438.7M	946	~$0.626 / 1k images	coming soon
ByteDance-Seed/UI-TARS-7B-SFT	8291.4M	737	~$1.341 / 1k images	coming soon
inclusionAI/UI-Venus-Ground-7B	8292.2M	231	~$1.341 / 1k images	coming soon
KDEGroup/UI-AGILE-3B	3754.6M	5	~$0.626 / 1k images	coming soon

get a key + $25 free →docs