Hosted vision-language models
52 models · 0 live as APIs · benchmarked & compared
Vision-language models process both images and text, enabling tasks such as extracting structured data from scanned documents, answering questions about photographs, and generating captions for accessibility. For example, deepseek-ai/DeepSeek-OCR-2 is specialised for optical character recognition, while series like Qwen/Qwen2.5-VL-7B-Instruct and Qwen/Qwen2-VL-2B-Instruct support visual question answering and image-to-text generation.
- Document digitisation and invoice parsing
- Automated content moderation on visual platforms
- Visual search and retrieval-augmented generation (RAG) pipelines
In production, these models are often integrated into RAG workflows or multimodal chatbots. Choosing among the 32 models listed here involves balancing latency, accuracy, and cost: larger architectures such as Qwen/Qwen3.6-35B-A3B-FP8 yield higher quality on complex reasoning but require more compute, while quantised or smaller models like cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit or Qwen/Qwen3-VL-4B-Instruct serve well at lower throughputs. For most call volumes, calling a hosted API eliminates infrastructure overhead and enables elastic scaling — benefits gigarouter provides through its benchmarked, OpenAI-compatible endpoints. (Currently 0 models are live; the remainder are being onboarded.)
compare
| model | params | downloads/mo | price | status |
|---|---|---|---|---|
| Qwen/Qwen2.5-VL-7B-Instruct | 8292.2M | 9.8M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen3.6-35B-A3B-FP8 | 35953.9M | 6.2M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen2.5-VL-3B-Instruct | 3754.6M | 5.3M | ~$0.626 / 1k images | coming soon |
| cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit | 26554.3M | 5.1M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen3.6-27B-FP8 | 27782.9M | 4.9M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen3-VL-4B-Instruct | 4437.8M | 3.7M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen2-VL-2B-Instruct | 2209M | 3.6M | ~$0.626 / 1k images | coming soon |
| deepseek-ai/DeepSeek-OCR-2 | 3389.1M | 3.3M | ~$0.626 / 1k images | coming soon |
| llava-hf/llava-1.5-7b-hf | 7063.4M | 3.2M | ~$1.341 / 1k images | coming soon |
| RedHatAI/gemma-4-31B-it-FP8-block | 31274.9M | 3.2M | ~$1.341 / 1k images | coming soon |
| HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | - | 3M | at launch | coming soon |
| microsoft/Florence-2-base | 231.6M | 2.6M | ~$0.094 / 1k images | coming soon |
| Qwen/Qwen3.5-0.8B | 873.4M | 2.5M | ~$0.235 / 1k images | coming soon |
| Qwen/Qwen3-VL-2B-Instruct | 2127.5M | 2.1M | ~$0.626 / 1k images | coming soon |
| RedHatAI/gemma-4-26B-A4B-it-FP8-Dynamic | 26560.9M | 2M | ~$1.341 / 1k images | coming soon |
| cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit | 35951.8M | 1.8M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen2-VL-7B-Instruct | 8291.4M | 1.8M | ~$1.341 / 1k images | coming soon |
| Qwen/Qwen2-VL-7B-Instruct-AWQ | 8291.4M | 1.8M | ~$1.341 / 1k images | coming soon |
| unsloth/Qwen3.6-27B-MTP-GGUF | - | 1.8M | at launch | coming soon |
| Qwen/Qwen2.5-VL-7B-Instruct-AWQ | 8292.2M | 1.7M | ~$1.341 / 1k images | coming soon |
| vikhyatk/moondream2 | 1927.2M | 1.6M | ~$0.626 / 1k images | coming soon |
| unsloth/gemma-4-26B-A4B-it-GGUF | - | 1.5M | at launch | coming soon |
| OpenGVLab/InternVL2-2B | 2205.8M | 1.5M | ~$0.626 / 1k images | coming soon |
| empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF | - | 1.4M | at launch | coming soon |
| datalab-to/chandra-ocr-2 | 5295.6M | 1.3M | ~$1.341 / 1k images | coming soon |
| baidu/Unlimited-OCR | 3336.1M | 885K | ~$0.626 / 1k images | coming soon |
| unsloth/Qwen3.6-35B-A3B-GGUF | - | 874.6K | at launch | coming soon |
| unsloth/Qwen3.6-35B-A3B-MTP-GGUF | - | 734.7K | at launch | coming soon |
| Salesforce/blip2-opt-2.7b | 3744.8M | 669.8K | ~$0.626 / 1k images | coming soon |
| DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF | - | 519.4K | at launch | coming soon |
| rednote-hilab/dots.mocr | 3039.2M | 518.9K | ~$0.626 / 1k images | coming soon |
| datalab-to/surya-ocr-2 | 686.2M | 407K | ~$0.235 / 1k images | coming soon |
| rednote-hilab/dots.ocr | 3039.2M | 278.6K | ~$0.626 / 1k images | coming soon |
| baidu/Qianfan-OCR | 4741.4M | 258.6K | ~$1.341 / 1k images | coming soon |
| lightonai/LightOnOCR-2-1B | 1005.6M | 170.5K | ~$0.235 / 1k images | coming soon |
| datalab-to/chandra | 8767.1M | 138.4K | ~$1.341 / 1k images | coming soon |
| ibm-granite/granite-vision-4.1-4b | 3997.2M | 111K | ~$0.626 / 1k images | coming soon |
| HauhauCS/Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced | - | 71.7K | at launch | coming soon |
| opendatalab/MinerU2.5-Pro-2605-1.2B | 1156M | 48.8K | ~$0.235 / 1k images | coming soon |
| Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF | - | 44.8K | at launch | coming soon |
| HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP | - | 44.5K | at launch | coming soon |
| sahilchachra/Unlimited-OCR-GGUF | - | 43.7K | at launch | coming soon |
| opendatalab/MinerU2.5-2509-1.2B | 1156M | 21.2K | ~$0.235 / 1k images | coming soon |
| infly/Infinity-Parser2-Flash | 2213.2M | 16.6K | ~$0.626 / 1k images | coming soon |
| inclusionAI/UI-Venus-1.5-8B | 8767.1M | 5K | ~$1.341 / 1k images | coming soon |
| ByteDance-Seed/UI-TARS-2B-SFT | 2442.4M | 2.5K | ~$0.626 / 1k images | coming soon |
| KDLAI/KDL-Frontier-Parser-nano | 1156M | 2.3K | ~$0.235 / 1k images | coming soon |
| Salesforce/GTA1-7B | 8292.2M | 1.3K | ~$1.341 / 1k images | coming soon |
| inclusionAI/UI-Venus-1.5-2B | 2438.7M | 946 | ~$0.626 / 1k images | coming soon |
| ByteDance-Seed/UI-TARS-7B-SFT | 8291.4M | 737 | ~$1.341 / 1k images | coming soon |
| inclusionAI/UI-Venus-Ground-7B | 8292.2M | 231 | ~$1.341 / 1k images | coming soon |
| KDEGroup/UI-AGILE-3B | 3754.6M | 5 | ~$0.626 / 1k images | coming soon |