ClawGUI Unifies the Entire GUI Agent Lifecycle
Training a GUI agent, evaluating it, and deploying it to real devices have always been three separate headaches. Zhejiang University's REAL Lab just collapsed all three into one framework.
ClawGUI has three modules that actually talk to each other. ClawGUI-RL runs dozens of parallel Android emulators for online reinforcement learning, replacing standard GRPO with a step-level reward system called GiGPO+PRM. ClawGUI-Eval standardizes evaluation across 6 benchmarks and 11+ vision-language models with a 95.8% reproduction rate against official numbers — meaning you can finally trust cross-paper comparisons. ClawGUI-Agent deploys to Android, HarmonyOS, and iOS via natural language commands from 12+ chat platforms.
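To see why step-level rewards matter for GUI tasks, consider how credit assignment differs from GRPO's episode-level scheme. The sketch below is a hypothetical illustration, not ClawGUI's actual GiGPO+PRM formulation (which the article does not detail): the function names, the per-step normalization, and the PRM scores are all assumptions.

```python
# Hypothetical sketch: episode-level (GRPO-style) vs. step-level
# credit assignment. The exact GiGPO+PRM math is not given in the
# article; this only illustrates the general idea.
from statistics import mean, pstdev

def episode_advantages(returns):
    """GRPO-style: one advantage per whole trajectory, normalized
    across a group of rollouts for the same task."""
    mu, sigma = mean(returns), pstdev(returns) or 1.0
    return [(r - mu) / sigma for r in returns]

def step_advantages(step_rewards_per_rollout):
    """Step-level variant: a process reward model (PRM) scores every
    step, and advantages are normalized per step index across the
    group, so credit lands on the step that earned it."""
    n_steps = len(step_rewards_per_rollout[0])
    advs = []
    for t in range(n_steps):
        col = [rollout[t] for rollout in step_rewards_per_rollout]
        mu, sigma = mean(col), pstdev(col) or 1.0
        advs.append([(r - mu) / sigma for r in col])
    # transpose back to one advantage list per rollout
    return [list(x) for x in zip(*advs)]

# Two rollouts of the same 3-step task: both fail at the end, but
# rollout 0 made real progress on step 1. An episode-level reward
# (0 for both) can't tell them apart; per-step PRM scores can.
prm = [[0.2, 0.9, 0.0],
       [0.2, 0.1, 0.0]]
print(episode_advantages([0.0, 0.0]))  # no signal: [0.0, 0.0]
print(step_advantages(prm))            # step 1 gets +/- credit
```

The payoff for long-horizon GUI tasks is that sparse success/failure signals stop washing out intermediate progress, which is plausibly why a step-level scheme was swapped in for vanilla GRPO.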
The proof is in the numbers. ClawGUI-2B, a 2-billion parameter agent trained end-to-end in this framework, hit a 17.1 success rate on MobileWorld, up 54% from the 11.1 baseline. For a 2B model. That's the kind of result that makes you wonder what happens when someone trains a 7B or 14B model with this pipeline.
The real insight here isn't any single module. It's that training, eval, and deployment have been artificially separated for too long. When you close the loop — train, measure, deploy, measure again — the whole system gets better faster. ClawGUI is the first framework that actually makes this loop practical.
307 upvotes on HuggingFace Daily Papers. Apache 2.0 license. The code is at https://github.com/ZJU-REAL/ClawGUI