Qwen AgentWorld 35B A3B

unsloth/Qwen-AgentWorld-35B-A3B-GGUF

published Jun 2026 · updated Jun 2026

Qwen AgentWorld 35B A3B is a text-generation model that simulates agentic environments across seven domains by predicting the next environment state given an agent's action and interaction history.

status

coming soon

API providers

downloads / mo

259.8K

license

apache-2.0

specs

Task	Text Generation (Language World Model)
Architecture	Causal Language Model with Mixture of Experts (MoE), Gated DeltaNet and Gated Attention
Parameters	35B total, 3B activated
Context Length	262,144 tokens
Training Pipeline	Continual Pre-Training (CPT) -> Supervised Fine-Tuning (SFT) -> Reinforcement Learning (RL, GSPO)

about this model

Qwen-AgentWorld-35B-A3B is a language world model for text-generation that simulates agentic environments across seven interaction domains by predicting the next environment state given an agent's action and interaction history through long chain-of-thought reasoning. Trained on more than 10 million real-world environment interaction trajectories, the model is built through a three-stage pipeline: continual pre-training (CPT) injects environment knowledge, supervised fine-tuning (SFT) activates next-state-prediction reasoning, and reinforcement learning (RL) sharpens simulation fidelity. It is a native world model — environment modeling is the training objective from the CPT stage onward, not a post-hoc adaptation.

Supported Domains

The model covers MCP (tool calling), Search, Terminal, SWE (software engineering), Android, Web, and OS environments, spanning both text and GUI interaction domains.

Benchmark Performance

On AgentWorldBench — a comprehensive benchmark constructed from real-world interactions of 5 frontier models across 9 established benchmarks — Qwen-AgentWorld-35B-A3B achieves an overall score of 56.39 (normalized 0-100), outperforming several frontier models including Gemini 3.1 Pro (54.57), DeepSeek-V4-Pro (52.97), and Qwen3.6-Plus (50.81).

Domain	Score
MCP	64.79
Search	36.69
Terminal	53.96
SWE	65.63
Android	58.17
Web	49.55
OS	65.92

Key Strengths

Generalizable simulator: Zero-shot generalization to out-of-distribution environments (e.g., OpenClaw) and supports controllable perturbations and fictional-world construction.
Agent foundation model: World-model training acts as a warm-up that improves downstream performance across 7 agentic benchmarks, including 3 entirely out-of-domain.
Scalable simulation: As a decoupled environment simulator, supports simulation of thousands of real-world environments for agentic RL, yielding gains that surpass real-environment training alone.

Architecture: 35B total parameters (3B activated), 40 layers, mixture of 256 experts (8 routed + 1 shared), with a context length of 262,144 tokens. Based on Qwen3.5-35B-A3B-Base.

Qwen-AgentWorld model architecture diagram

Recommended sampling parameters: temperature 0.6, top_p 0.95, top_k 20. Domain-specific system prompts are available in the GitHub repository.

best for

·Simulating agent environments for training and evaluating AI agents
·Zero-shot environment simulation for out-of-distribution domains (e.g., OpenClaw)
·Controllable environment simulation with perturbations and fictional worlds
·Warm-up model for improving downstream agentic task performance across 7 benchmarks

FAQ

What is this model best used for?

It simulates agentic environments (e.g., terminal, web, OS) given an action and history, useful for training and evaluating AI agents without needing real environments.

How does it compare to general-purpose LLMs for environment simulation?

It is a native world model trained specifically for next-state prediction across 7 domains, outperforming frontier models like GPT-5.4 and Claude Opus 4.8 on AgentWorldBench overall.

What are the input/output formats?

Input is a chat completion message with a system prompt describing the domain and a user message containing the action. Output is the predicted environment state (text observation).

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, setting model name to "unsloth/Qwen-AgentWorld-35B-A3B-GGUF" and sending messages in the standard chat format.

What is the context length and recommended output length?

Context length is 262,144 tokens. Recommended output length is 32,768 tokens for most queries.

not yet live

We're benchmarking and onboarding Qwen AgentWorld 35B A3B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

tiny-Qwen2ForCausalLM-2.5

9.2M dl/mo

deepseek-v4-gguf

6.4M dl/mo

Qwen3.6-35B-A3B-NVFP4

6.2M dl/mo

gemma-3-270m

5.1M dl/mo