May 12, 2026 · Research · Infrastructure · Framework

Stanford Just Made Agents 5x Faster Than Docker by Treating the Execution Trace as the Source of Truth.

Shepherd dropped on arXiv yesterday. Seven authors from Stanford NLP, including Christopher Manning and Weiyan Shi, alongside Derek Chong, Ananjan Nandi, Dilara Soylu, Jiuding Sun, and lead author Simon Yu. 56 pages, 21 figures, 14 tables. The pitch — meta-agents (agents that orchestrate other agents) keep being bottlenecked by the runtime substrate. Docker is too slow, prompt caches cannot be reused across forks, replay is hand-rolled, and intervention is brittle. Shepherd is the rewrite from first principles.

The core idea is a functional programming model where every meta-agent operation is a typed event in a Git-like execution trace. State is immutable and forkable. Caches are reused across branches by content addressing. Replay is just walking the trace. The numbers — 5x faster than Docker on the same workloads, over 95% prompt-cache reuse across forks. That is the substrate result that makes everything downstream possible.
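To make the model concrete, here is a minimal sketch in Python. It is hypothetical throughout: the paper's code is not yet released, so every name here is mine, not Shepherd's API. The point it shows is that events are immutable, addressed by a hash of their content, and a fork is just a second head pointing at an existing node, which is why caches keyed by those hashes survive forking.

```python
# Hypothetical sketch of a content-addressed execution trace.
# Not Shepherd's API; all names are illustrative.
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One typed, immutable operation in the trace."""
    kind: str           # e.g. "llm_call", "tool_call", "env_step"
    payload_json: str   # operation inputs, canonically serialized
    parent: str | None  # content hash of the preceding event, Git-style


def content_hash(event: Event) -> str:
    """Identical events on any branch hash to the same address,
    which is what lets a prompt cache survive a fork."""
    blob = f"{event.kind}|{event.payload_json}|{event.parent}"
    return hashlib.sha256(blob.encode()).hexdigest()


class Trace:
    """Append-only tree of events. Replay is a walk from the root;
    forking copies nothing, so branches share ancestors and caches."""

    def __init__(self) -> None:
        self.nodes: dict[str, Event] = {}

    def append(self, head: str | None, kind: str, payload: dict) -> str:
        event = Event(kind, json.dumps(payload, sort_keys=True), head)
        h = content_hash(event)
        self.nodes[h] = event
        return h  # the new head of this branch

    def fork(self, head: str) -> str:
        return head  # a fork is just another pointer at the same node
```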

Three downstream applications drop out of the substrate. First, runtime intervention via supervisor agents — taking pair-coding pass rates from 28.8% to 54.7% on CooperBench, a +25.9-point lift. Second, counterfactual meta-optimization via branching exploration — up to 11-point benchmark gains while cutting total execution time by 58%, because you do not re-execute the shared prefix every time you branch. Third, Tree-RL training using selective forking — TerminalBench-2 climbs from 34.2% to 39.4% on the same compute budget as the linear baseline.
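The prefix-reuse point is easy to see in miniature. The toy below is my illustration, not the paper's code: each step is memoized by a content hash of its inputs, so a second rollout that shares a two-step prefix with the first only pays for its final step.

```python
# Toy illustration (mine, not the paper's) of prefix reuse across branches.
import hashlib

CACHE: dict[str, str] = {}
CALLS = 0  # counts only real executions, not cache hits


def step(state: str, action: str) -> str:
    """Stand-in for an expensive LLM or tool call, memoized by a
    content hash of its inputs."""
    global CALLS
    key = hashlib.sha256(f"{state}|{action}".encode()).hexdigest()
    if key not in CACHE:
        CALLS += 1
        CACHE[key] = f"{state}->{action}"
    return CACHE[key]


def rollout(actions: list[str]) -> str:
    state = "init"
    for action in actions:
        state = step(state, action)
    return state


rollout(["plan", "code", "test"])      # 3 real executions
rollout(["plan", "code", "refactor"])  # 1 more: the prefix is cached
assert CALLS == 4
```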

The structural argument is bigger than the speedup. Most current agent frameworks treat the runtime as the orchestration layer's problem — schedule tasks, retry on failure, log somewhere. Shepherd treats the execution trace as the source of truth. Every state is a node in a Git-like tree. That changes what is even possible — you can audit any decision by walking back, you can branch to test alternative strategies without spawning fresh containers, you can train on the tree as a structured object instead of flattening it to linear rollouts.
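In that model, an audit is literally a git-log walk up the parent chain. A hypothetical sketch, again not Shepherd's API:

```python
# Hypothetical audit walk over a Git-like trace; not Shepherd's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hash: str
    kind: str
    summary: str
    parent: "Node | None"


def audit(leaf: Node) -> list[Node]:
    """Return the full lineage of a decision, newest first,
    by following parent pointers the way `git log` does."""
    lineage: list[Node] = []
    node: Node | None = leaf
    while node is not None:
        lineage.append(node)
        node = node.parent
    return lineage


root = Node("a1", "env_step", "container started", None)
choice = Node("b2", "llm_call", "chose strategy A", root)
leaf = Node("c3", "tool_call", "ran tests, 3 failed", choice)

for node in audit(leaf):
    print(node.hash, node.kind, node.summary)
```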

The paper says they are open-sourcing the system. The Manning byline plus the 56-page treatment plus the +25.9pp CooperBench number plus the Docker-5x claim plus the Tree-RL training piece is the kind of bundle that gets cited as a primitive within months. If the open-source release lands clean, Shepherd becomes the substrate other agent frameworks compile down to. arxiv.org/abs/2605.10913.