Microsoft Built 1,000 Computers for Agents to Live In
Microsoft Research dropped a paper April 30 that quietly answers the agent-pretraining-data question by ignoring it. Don't collect human data. Generate synthetic computers — 1,000 of them — and let agents live in each one for 8+ hours, 2,000+ turns per run. Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao on the byline. Same lab that wrote half of GPT-4's papers.
Each synthetic computer is a populated workspace: real-looking folder structures, actual content in spreadsheets, drafts in Word, half-finished slide decks, plausible email backlogs. Agents come in and do productivity work. Multi-agent simulation produces both the task and the execution trace. The trace becomes training data. They claim improvements on both in-domain and transfer evals, and that the methodology scales 'to millions or billions of synthetic user worlds.'
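To make that loop concrete, here is a minimal Python sketch of one world-plus-trace cycle. Everything in it (the `SyntheticWorkspace` class, `populate_workspace`, `simulate_run`, the file names) is hypothetical scaffolding of mine, not code from the paper; it only shows the shape of the pipeline: populate a fake workspace, run a stubbed-out multi-agent loop over it, and serialize the trace as one training example.

```python
# A minimal sketch, assuming details the paper doesn't publish: every name
# below (SyntheticWorkspace, populate_workspace, simulate_run) is hypothetical.
from dataclasses import dataclass, field
import json
import random


@dataclass
class SyntheticWorkspace:
    """One 'synthetic computer': paths mapped to plausible file contents."""
    files: dict[str, str] = field(default_factory=dict)


@dataclass
class Turn:
    role: str         # "user-sim" or "agent"
    action: str       # a message or tool call
    observation: str  # what the environment returned


def populate_workspace(seed: int) -> SyntheticWorkspace:
    """Stand-in for the paper's environment generator: seed-varied
    spreadsheets, drafts, and an email backlog."""
    rng = random.Random(seed)
    spend = rng.randint(80, 200) * 1000
    return SyntheticWorkspace(files={
        "reports/q1_budget.xlsx": f"dept,spend\nmarketing,{spend}",
        "drafts/offsite_memo.docx": "DRAFT: venue and dates TBD",
        "inbox/backlog.mbox": "From: cfo@example.com\nSubject: Q1 numbers?",
    })


def simulate_run(ws: SyntheticWorkspace, max_turns: int = 2000) -> list[Turn]:
    """Placeholder for the multi-agent loop that yields both the task and
    its execution trace. Real runs go 2,000+ turns; this stub does two."""
    task = "Summarize the Q1 budget and draft a reply to the CFO."
    return [
        Turn("user-sim", f"request: {task}", ""),
        Turn("agent", "open reports/q1_budget.xlsx",
             ws.files["reports/q1_budget.xlsx"]),
    ][:max_turns]


if __name__ == "__main__":
    # The paper builds 1,000 worlds (and argues for millions); three here.
    for seed in range(3):
        trace = simulate_run(populate_workspace(seed))
        # Each serialized trace becomes one pretraining example.
        print(json.dumps([t.__dict__ for t in trace])[:100], "...")
```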
This is the third agent-pretraining-data paper in two weeks. ClawGym did the SFT pipeline angle. TCOD diagnosed the trajectory-level KL instability. SkillSynth from Hunyuan synthesized skills via skill graphs. Microsoft Research is now in the same conversation with the most ambitious frame of all: manufacture entire user environments rather than just tasks. The sheer scale (8+ hours per simulation, 2,000+ turns) is what makes this different from the synthetic-data work of 2024.
The interesting question is whether 'live in 1,000 simulated computers' produces more transferable agent skill than 'live on one real computer with a real user.' The user-data school (Anthropic, OpenAI, Cognition) bets on the latter. The synthetic-environment school (Microsoft, ByteDance, Hunyuan) bets on the former. Within six months one of them will have a benchmarkable answer. Either way, agent pretraining is no longer a research curiosity; it's the load-bearing problem the next generation of coding and computer-use agents will be built on.
Paper: https://arxiv.org/abs/2604.28181