UI-Venus 1.5 8B

inclusionAI/UI-Venus-1.5-8B

published Feb 2026 · updated Feb 2026

UI-Venus 1.5 8B is a vision-language model designed as a unified, end-to-end GUI agent for robust real-world UI grounding and navigation tasks.

est. price

~$1.341

/ 1k images · estimated, set at launch

API providers

downloads / mo

license

apache-2.0

specs

Task	GUI grounding and navigation
Architecture	Dense vision-language model (Qwen3-VL based)
Parameters	8B
License	Apache 2.0

about this model

UI-Venus-1.5-8B is a vision-language model (VLM) purpose-built for GUI agent tasks, performing visual grounding, mobile navigation, and web navigation from raw screenshot inputs. It is the 8B dense variant of the UI-Venus-1.5 family, which also includes a 2B dense model and a 30B-A3B mixture-of-experts model. The model is trained via a four-stage pipeline: mid-training on 10 billion tokens from 30+ GUI datasets, offline reinforcement learning (RL), online RL with full-trajectory rollouts, and model merging to unify grounding, web, and mobile specialists into a single checkpoint.

Key Strengths

State-of-the-art grounding performance: UI-Venus-1.5-8B achieves 68.4% on ScreenSpot-Pro, 75.0% on VenusBench-GD, 70.6% on OSWorld-G, and 54.7% on UI-Vision. The 30B-A3B variant leads these benchmarks at 69.6%, 75.0%, 70.6%, and 54.7% respectively, while remaining competitive on ScreenSpot-V2 (96.2%).
Strong mobile and web navigation: The 8B model surpasses the previous generation's 72B variant on Android Lab (up to 5.8% improvement) and VenusBench-Mobile (16.1% vs. 15.4%). The 30B-A3B variant achieves 77.6% on AndroidWorld, 55.1%/68.1% on Android Lab, and 76.0% on WebVoyager.
Cross-platform generalization: The model performs well across programmatic Android environments, dynamic web navigation, and visual-only reasoning without XML augmentation.
Consistent scaling: Performance improves steadily from the 2B to 8B to 30B-A3B variants across all grounding and navigation benchmarks.

Training Pipeline

The model is built on the Qwen3-VL series and undergoes a four-stage training curriculum: (1) Mid-Training with 10 billion tokens across 30+ datasets for GUI domain knowledge injection; (2) Offline-RL for task-specific optimization across grounding, mobile, and web objectives; (3) Online-RL with full-trajectory rollouts for long-horizon dynamic navigation; and (4) Model Merging to unify specialized models into a single checkpoint.

Benchmark Results

Benchmark	UI-Venus-1.5-8B	UI-Venus-1.5-30B-A3B
ScreenSpot-Pro	68.4%	69.6%
VenusBench-GD	75.0%	75.0%
OSWorld-G	70.6%	70.6%
UI-Vision	54.7%	54.7%
ScreenSpot-V2	96.2%	96.2%
AndroidWorld	—	77.6%
Android Lab (Easy/Hard)	—	55.1% / 68.1%
VenusBench-Mobile	16.1%	21.5%
WebVoyager	—	76.0%

UI-Venus-1.5 benchmark performance bar chart across grounding and agent benchmarks

UI-Venus-1.5-8B is hosted on gigarouter as a managed, OpenAI-compatible API. No local installation or model loading is required. For further technical details, see the UI-Venus-1.5 Technical Report and the GitHub repository.

best for

·Automating UI interactions on mobile and web apps via screenshot-based grounding
·Building autonomous GUI agents for task completion in Android environments
·Performing visual UI element detection and action prediction from raw screenshots

FAQ

What is UI-Venus 1.5 8B best used for?

It is best for building autonomous GUI agents that understand and interact with mobile, web, and desktop interfaces using only screenshot inputs.

How does the 8B variant compare to the 30B-A3B MoE variant?

The 8B dense model is smaller and faster, while the 30B-A3B MoE variant achieves higher SOTA scores on benchmarks like ScreenSpot-Pro (69.6% vs 68.4% for 8B).

What is the license for UI-Venus 1.5 8B?

It is released under the Apache 2.0 license.

What input format does the model expect?

The model takes screenshots as visual input and can accept text instructions; it outputs UI grounding coordinates or action predictions.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, sending a chat completion request with the model name UI-Venus-1.5-8B.

not yet live

We're benchmarking and onboarding UI-Venus 1.5 8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit