skip to content
gigarouter gigarouter
models / vision-language · coming soon

UI-Venus 1.5 8B

inclusionAI/UI-Venus-1.5-8B

published Feb 2026 · updated Feb 2026

UI-Venus 1.5 8B is a vision-language model designed as a unified, end-to-end GUI agent for robust real-world UI grounding and navigation tasks.

est. price
~$1.341
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
5K
license
apache-2.0

specs

TaskGUI grounding and navigation
ArchitectureDense vision-language model (Qwen3-VL based)
Parameters8B
LicenseApache 2.0

about this model

UI-Venus-1.5-8B is a vision-language model (VLM) purpose-built for GUI agent tasks, performing visual grounding, mobile navigation, and web navigation from raw screenshot inputs. It is the 8B dense variant of the UI-Venus-1.5 family, which also includes a 2B dense model and a 30B-A3B mixture-of-experts model. The model is trained via a four-stage pipeline: mid-training on 10 billion tokens from 30+ GUI datasets, offline reinforcement learning (RL), online RL with full-trajectory rollouts, and model merging to unify grounding, web, and mobile specialists into a single checkpoint.

Key Strengths

  • State-of-the-art grounding performance: UI-Venus-1.5-8B achieves 68.4% on ScreenSpot-Pro, 75.0% on VenusBench-GD, 70.6% on OSWorld-G, and 54.7% on UI-Vision. The 30B-A3B variant leads these benchmarks at 69.6%, 75.0%, 70.6%, and 54.7% respectively, while remaining competitive on ScreenSpot-V2 (96.2%).
  • Strong mobile and web navigation: The 8B model surpasses the previous generation's 72B variant on Android Lab (up to 5.8% improvement) and VenusBench-Mobile (16.1% vs. 15.4%). The 30B-A3B variant achieves 77.6% on AndroidWorld, 55.1%/68.1% on Android Lab, and 76.0% on WebVoyager.
  • Cross-platform generalization: The model performs well across programmatic Android environments, dynamic web navigation, and visual-only reasoning without XML augmentation.
  • Consistent scaling: Performance improves steadily from the 2B to 8B to 30B-A3B variants across all grounding and navigation benchmarks.

Training Pipeline

The model is built on the Qwen3-VL series and undergoes a four-stage training curriculum: (1) Mid-Training with 10 billion tokens across 30+ datasets for GUI domain knowledge injection; (2) Offline-RL for task-specific optimization across grounding, mobile, and web objectives; (3) Online-RL with full-trajectory rollouts for long-horizon dynamic navigation; and (4) Model Merging to unify specialized models into a single checkpoint.

UI-Venus-1.5 four-stage training pipeline diagram

Benchmark Results

BenchmarkUI-Venus-1.5-8BUI-Venus-1.5-30B-A3B
ScreenSpot-Pro68.4%69.6%
VenusBench-GD75.0%75.0%
OSWorld-G70.6%70.6%
UI-Vision54.7%54.7%
ScreenSpot-V296.2%96.2%
AndroidWorld77.6%
Android Lab (Easy/Hard)55.1% / 68.1%
VenusBench-Mobile16.1%21.5%
WebVoyager76.0%
UI-Venus-1.5 benchmark performance bar chart across grounding and agent benchmarks

UI-Venus-1.5-8B is hosted on gigarouter as a managed, OpenAI-compatible API. No local installation or model loading is required. For further technical details, see the UI-Venus-1.5 Technical Report and the GitHub repository.

best for

FAQ

What is UI-Venus 1.5 8B best used for?

It is best for building autonomous GUI agents that understand and interact with mobile, web, and desktop interfaces using only screenshot inputs.

How does the 8B variant compare to the 30B-A3B MoE variant?

The 8B dense model is smaller and faster, while the 30B-A3B MoE variant achieves higher SOTA scores on benchmarks like ScreenSpot-Pro (69.6% vs 68.4% for 8B).

What is the license for UI-Venus 1.5 8B?

It is released under the Apache 2.0 license.

What input format does the model expect?

The model takes screenshots as visual input and can accept text instructions; it outputs UI grounding coordinates or action predictions.

How can I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, sending a chat completion request with the model name UI-Venus-1.5-8B.

not yet live

We're benchmarking and onboarding UI-Venus 1.5 8B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →