Qwen AgentWorld 35B A3B
unsloth/Qwen-AgentWorld-35B-A3B-GGUF
published Jun 2026 · updated Jun 2026
Qwen AgentWorld 35B A3B is a text-generation model that simulates agentic environments across seven domains by predicting the next environment state given an agent's action and interaction history.
specs
| Task | Text Generation (Language World Model) |
| Architecture | Causal Language Model with Mixture of Experts (MoE), Gated DeltaNet and Gated Attention |
| Parameters | 35B total, 3B activated |
| Context Length | 262,144 tokens |
| Training Pipeline | Continual Pre-Training (CPT) -> Supervised Fine-Tuning (SFT) -> Reinforcement Learning (RL, GSPO) |
about this model
Supported Domains
The model covers MCP (tool calling), Search, Terminal, SWE (software engineering), Android, Web, and OS environments, spanning both text and GUI interaction domains.
Benchmark Performance
On AgentWorldBench — a comprehensive benchmark constructed from real-world interactions of 5 frontier models across 9 established benchmarks — Qwen-AgentWorld-35B-A3B achieves an overall score of 56.39 (normalized 0-100), outperforming several frontier models including Gemini 3.1 Pro (54.57), DeepSeek-V4-Pro (52.97), and Qwen3.6-Plus (50.81).
| Domain | Score |
|---|---|
| MCP | 64.79 |
| Search | 36.69 |
| Terminal | 53.96 |
| SWE | 65.63 |
| Android | 58.17 |
| Web | 49.55 |
| OS | 65.92 |
Key Strengths
- Generalizable simulator: Zero-shot generalization to out-of-distribution environments (e.g., OpenClaw) and supports controllable perturbations and fictional-world construction.
- Agent foundation model: World-model training acts as a warm-up that improves downstream performance across 7 agentic benchmarks, including 3 entirely out-of-domain.
- Scalable simulation: As a decoupled environment simulator, supports simulation of thousands of real-world environments for agentic RL, yielding gains that surpass real-environment training alone.
Architecture: 35B total parameters (3B activated), 40 layers, mixture of 256 experts (8 routed + 1 shared), with a context length of 262,144 tokens. Based on Qwen3.5-35B-A3B-Base.

Recommended sampling parameters: temperature 0.6, top_p 0.95, top_k 20. Domain-specific system prompts are available in the GitHub repository.
best for
- ·Simulating agent environments for training and evaluating AI agents
- ·Zero-shot environment simulation for out-of-distribution domains (e.g., OpenClaw)
- ·Controllable environment simulation with perturbations and fictional worlds
- ·Warm-up model for improving downstream agentic task performance across 7 benchmarks
FAQ
It simulates agentic environments (e.g., terminal, web, OS) given an action and history, useful for training and evaluating AI agents without needing real environments.
It is a native world model trained specifically for next-state prediction across 7 domains, outperforming frontier models like GPT-5.4 and Claude Opus 4.8 on AgentWorldBench overall.
Input is a chat completion message with a system prompt describing the domain and a user message containing the action. Output is the predicted environment state (text observation).
Use the OpenAI-compatible endpoint with your API key, setting model name to "unsloth/Qwen-AgentWorld-35B-A3B-GGUF" and sending messages in the standard chat format.
Context length is 262,144 tokens. Recommended output length is 32,768 tokens for most queries.
We're benchmarking and onboarding Qwen AgentWorld 35B A3B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.