OpenMobile: open mobile agents that almost catch up to proprietary systems
New paper on arXiv today: OpenMobile, an open framework for training mobile agents that automate smartphone tasks.
Two technical moves. First, a task synthesis pipeline that explores apps to build a global environment memory, then generates diverse grounded instructions from that memory. Second, a policy-switching trajectory strategy that alternates between a learner and an expert model during rollout, so the training data captures error recovery, not just clean success paths.
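To make the second move concrete, here is a minimal toy sketch of a policy-switching rollout. It is not the paper's implementation: the integer "screen index" environment, the fixed-length switching schedule, and every name in it (`GOAL`, `expert_policy`, `learner_policy`, `rollout`, `switch_every`) are assumptions made up for illustration. It only shows how handing control back and forth puts expert actions taken from learner-induced states into the trajectory.

```python
import random
from typing import List, Tuple

# Hypothetical toy stand-in for a phone UI: the state is an integer
# "screen index" and the task is done when the agent reaches GOAL.
GOAL = 5

def expert_policy(state: int) -> int:
    """Assumed strong policy: always steps toward the goal."""
    return 1 if state < GOAL else -1

def learner_policy(state: int, rng: random.Random) -> int:
    """Assumed weak policy: steps the wrong way about 40% of the time."""
    correct = expert_policy(state)
    return correct if rng.random() > 0.4 else -correct

def rollout(switch_every: int = 3, max_steps: int = 50,
            seed: int = 0) -> Tuple[int, List[Tuple[int, str, int]]]:
    """Alternate control between learner and expert in fixed-length blocks.

    Returns the final state and a list of (state, actor, action) steps.
    The fixed block schedule is an illustrative choice, not the paper's rule.
    """
    rng = random.Random(seed)
    state = 0
    trajectory: List[Tuple[int, str, int]] = []
    for t in range(max_steps):
        # Even-numbered blocks go to the learner, odd-numbered to the expert.
        actor = "learner" if (t // switch_every) % 2 == 0 else "expert"
        if actor == "learner":
            action = learner_policy(state, rng)
        else:
            action = expert_policy(state)
        trajectory.append((state, actor, action))
        state += action
        if state == GOAL:
            break
    return state, trajectory

if __name__ == "__main__":
    final_state, steps = rollout()
    for state, actor, action in steps:
        print(f"state={state:>2} actor={actor:<7} action={action:+d}")
    print("final state:", final_state)
```

Whenever the learner drifts away from the goal, the expert steps that follow start from a state an expert-only rollout would never visit, and those are the error-recovery examples. Which steps end up as training targets is a design choice this post does not specify.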
Results: fine-tuned Qwen2.5-VL hits 51.7% success on AndroidWorld. Qwen3-VL fine-tune hits 64.7%. That's within striking distance of the 70% bar set by closed proprietary systems, and a meaningful jump over existing open-data baselines.
Why it matters: mobile is the last big UI frontier for agents. Whoever cracks it owns a huge distribution surface. Open-weight systems getting within about 5 points of closed ones (64.7% against a 70% bar) means the moat is thinner than it looked a year ago.
https://arxiv.org/abs/2604.15093