April 29, 2026 · Agents · Research · Framework

Stanford Trains a Multi-Agent System Like One Model

Stanford and collaborators dropped RecursiveMAS today. The bet is simple to state and hard to do. Take a multi-agent system, treat it as one giant recursive computation in latent space, and train it end-to-end with gradient credit assignment. They glue the agents together with a small RecursiveLink module and run inner-outer loops to co-optimize the whole stack.
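To make the idea concrete, here is a minimal sketch of what "training a multi-agent system as one differentiable object" could look like. This is an illustrative assumption, not the paper's implementation: two "agents" and a linking module are modeled as plain linear-plus-tanh maps so gradients flow through the whole chain, and the names (`W_a`, `W_l`, `W_b`, `RecursiveLink` in the comments) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a two-agent chain joined by a link module,
# trained end-to-end so gradient credit assignment reaches every agent.
# All names and shapes are illustrative assumptions, not RecursiveMAS itself.

rng = np.random.default_rng(0)
d = 8  # latent dimension shared across the chain

W_a = rng.normal(scale=0.1, size=(d, d))  # agent 1
W_l = rng.normal(scale=0.1, size=(d, d))  # link module (handoff in latent space)
W_b = rng.normal(scale=0.1, size=(d, d))  # agent 2

def forward(x):
    h1 = np.tanh(x @ W_a)   # agent 1 produces a latent state
    h2 = np.tanh(h1 @ W_l)  # link translates it for the next agent
    y = h2 @ W_b            # agent 2 produces the output
    return h1, h2, y

def train_step(x, target, lr=0.05):
    """One end-to-end step: loss gradients flow back through both agents."""
    global W_a, W_l, W_b
    h1, h2, y = forward(x)
    err = y - target                      # dL/dy for L = 0.5 * ||y - target||^2
    g_Wb = h2.T @ err                     # credit for agent 2
    d_h2 = (err @ W_b.T) * (1 - h2**2)
    g_Wl = h1.T @ d_h2                    # credit for the link module
    d_h1 = (d_h2 @ W_l.T) * (1 - h1**2)
    g_Wa = x.T @ d_h1                     # credit for agent 1
    W_a -= lr * g_Wa
    W_l -= lr * g_Wl
    W_b -= lr * g_Wb
    return 0.5 * float(np.sum(err**2))

x = rng.normal(size=(4, d))
target = rng.normal(size=(4, d))
losses = [train_step(x, target) for _ in range(200)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A full version would presumably alternate inner-loop updates (per-agent) with outer-loop updates (the coordination stack), but the core point survives in the sketch: once the handoff lives in latent space, the loss can assign blame to the first agent in the chain, not just the last.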

The numbers carry the bet. Average accuracy up 8.3 points across nine benchmarks spanning math, science, medicine, search and code. End-to-end inference 1.2 to 2.4 times faster. Token usage cut by 34.6 to 75.6 percent. They tested four collaboration patterns and the framework held across all of them. This is not a prompt engineering trick. It is treating the multi-agent system as one differentiable object.

The punchline lands in the last column of the results table. The most expensive thing about multi-agent systems has always been token bloat: every agent re-explains context, every handoff drags state, every debate round multiplies cost. RecursiveMAS gets higher accuracy at a fraction of the tokens. The lesson for builders is that hand-crafted multi-agent setups, the ones with long debate chains and verbose tool use, are leaving compute on the table that gradient-trained recursive coordination can recover.

The agent training thesis just got a real data point. If skill graphs and pretraining-style tool data scaling are the supply side of the agent revolution, RecursiveMAS is the demand side: trained collaboration beats prompted collaboration. Expect this architecture to show up in the next round of agent platform launches.

Link: https://recursivemas.github.io
