Map-then-Act Drags Frontier Models From Zero to 88% on ARC-AGI-3
Out of USTC and Meituan today comes a paper called Map-then-Act that finally says the obvious thing out loud: current agents try to plan and explore the world at the same time, and that is why they fail.
MAP runs three stages instead. First, Global Exploration to build general priors about the environment. Then Task-Specific Mapping to draw a structured cognitive map. Then Knowledge-Augmented Execution, which finally goes after the actual task with the map in hand. The hot take is that environment understanding has to happen before action, not during it. They call the failure mode of every current agent Delayed Environmental Perception.
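The three-stage loop is easy to caricature in code. Here is a minimal sketch against a toy environment; `ToyEnv`, the greedy planner, and every name below are illustrative assumptions, not the paper's actual implementation:

```python
import random

# Toy environment: each action shifts the agent's position by a hidden offset.
class ToyEnv:
    ACTIONS = ["a", "b", "c"]
    EFFECTS = {"a": +1, "b": -1, "c": +3}  # hidden from the agent

    def __init__(self):
        self.pos = 0

    def step(self, action):
        self.pos += self.EFFECTS[action]
        return self.pos

def map_then_act(env, goal, explore_budget=30, seed=0):
    rng = random.Random(seed)

    # Stage 1: Global Exploration - probe the environment before
    # committing to the task, recording what each action did.
    transitions = []
    for _ in range(explore_budget):
        before = env.pos
        action = rng.choice(env.ACTIONS)
        after = env.step(action)
        transitions.append((action, after - before))

    # Stage 2: Task-Specific Mapping - distill the raw observations
    # into a structured map (here just action -> learned effect).
    cmap = {action: delta for action, delta in transitions}

    # Stage 3: Knowledge-Augmented Execution - plan with the map in
    # hand instead of discovering dynamics mid-task.
    steps = []
    while env.pos != goal:
        best = min(cmap, key=lambda a: abs(env.pos + cmap[a] - goal))
        if abs(env.pos + cmap[best] - goal) >= abs(env.pos - goal):
            break  # no known action improves; stop rather than loop
        env.step(best)
        steps.append(best)
    return steps, env.pos
```

The point of the sketch is the ordering: exploration spends its whole budget before the task is touched, and execution consults only the distilled map, never the environment's hidden dynamics.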
The numbers are loud. On ARC-AGI-3, where frontier models score near zero at baseline, MAP lets them surpass baseline performance in 22 out of 25 game environments. They also released MAP-2K, a dataset of map-then-act trajectories, and showed that training on it outperforms training on expert execution traces. Translation: understanding the world is more fundamental than imitating an expert who already does the task.
If this generalizes outside ARC, every long-horizon agent product on the market should be rebuilding its inner loop. Cursor agent mode, Claude Code on multi-day tasks, OpenAI Operator browsing: none of them currently map before acting. The arXiv ID is 2605.13037.
https://arxiv.org/abs/2605.13037