UI-Venus 1.5 2B
inclusionAI/UI-Venus-1.5-2B
published Feb 2026 · updated Feb 2026
UI-Venus 1.5 2B is a vision-language model (VLM) that serves as a unified GUI agent for UI grounding and navigation tasks.
specs
| Task | GUI Agent / UI Grounding & Navigation |
| Architecture | Qwen3-VL-based dense transformer |
| Parameters | 2 billion |
| License | Apache 2.0 |
about this model
UI-Venus-1.5-2B is a vision-language model (VLM) for GUI agent tasks that performs visual grounding and autonomous navigation across mobile, desktop, and web interfaces. It is the 2B-parameter dense variant of the UI-Venus-1.5 family, which also includes 8B and 30B-A3B mixture-of-experts variants.
Benchmark Performance
UI-Venus-1.5 establishes state-of-the-art results on key grounding and agent benchmarks. The 2B variant achieves 57.7% on ScreenSpot-Pro, with the 8B and 30B-A3B models reaching 68.4% and 69.6% respectively. On VenusBench-GD the family leads with 75.0% (30B-A3B). In mobile navigation, the 30B-A3B model scores 77.6% on AndroidWorld and 55.1%/68.1% on AndroidLab, outperforming prior specialist and general-purpose VLMs.


Training Methodology
UI-Venus-1.5 employs a four-stage training pipeline: Mid-Training on 10B tokens from 30+ GUI datasets, offline RL, online RL with full-trajectory rollouts, and model merging that unifies grounding, web, and mobile specialists into a single checkpoint. This curriculum enables robust cross-platform generalization and long-horizon navigation in real-world environments.
best for
- ·Automated mobile app testing and navigation
- ·Web browser automation and element grounding
- ·GUI element detection and coordinate prediction
FAQ
It accepts a screenshot image and a textual instruction (e.g., "tap the login button").
It outputs action predictions (e.g., tap coordinates, scroll) or grounded element locations depending on the task.
Use the OpenAI-compatible endpoint with your API key, sending a chat completion request with a system prompt, user message containing image_url and text.
It is licensed under Apache 2.0, allowing commercial use.
The 2B variant is smaller and faster, suitable for resource-constrained deployments, while larger variants offer higher accuracy on benchmarks like ScreenSpot-Pro and AndroidWorld.
We're benchmarking and onboarding UI-Venus 1.5 2B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.