Researchers Dissected Claude Code Down to the queryLoop, Found 7 Safety Layers
A team from MBZUAI VILA Lab and University College London dropped a paper called "Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems". 177 upvotes on HuggingFace, top of the agent papers today. They reverse-engineered the publicly available TypeScript source and built a full architectural map.
Headline finding: only 1.6% of the codebase is the actual reasoning loop. The other 98.4% is operational infrastructure. Permission system, context compaction, tool routing, recovery, hooks. The lesson is uncomfortable for framework builders. The model already knows how to think. What it needs is not more scaffolding, it's more plumbing.
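To make the ratio concrete, here is a minimal sketch of the kind of reasoning loop the paper measures at roughly 1.6% of the codebase. Everything here (the `queryLoop` shape, the model and tool callbacks) is hypothetical, not Claude Code's actual code; the point is how little logic the loop itself needs compared with the machinery around it.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; input: string } | null;

// The whole "reasoning loop": call the model, run whatever tool it asks
// for, feed the result back, repeat until it answers without a tool call.
async function queryLoop(
  messages: Message[],
  callModel: (m: Message[]) => Promise<{ text: string; toolCall: ToolCall }>,
  runTool: (c: { name: string; input: string }) => Promise<string>,
): Promise<string> {
  while (true) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.text });
    if (!reply.toolCall) return reply.text; // no tool requested: final answer
    // In the real system, this single line is wrapped by most of the other
    // 98.4%: permission checks, hooks, sandboxing, compaction, recovery.
    const result = await runTool(reply.toolCall);
    messages.push({ role: "tool", content: result });
  }
}
```

The loop is trivially small; everything that makes it safe to run unattended lives outside it.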
Seven safety layers stacked. Tool prefiltering, deny-first rule evaluation, seven permission modes, an ML classifier for auto-mode, shell sandboxing, no-restoration-on-resume, and 27 hook events. The paper calls it defense in depth. They also catalog four extension mechanisms (hooks, skills, plugins, MCP servers), sorted by how much context budget each costs.
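"Deny-first" is the interesting layer to sketch: deny rules are checked before allow rules, so a matching deny wins even when a broader allow also matches. The rule shape and tool-call strings below are illustrative assumptions, not Claude Code's actual config format.

```typescript
type Rule = { pattern: RegExp; effect: "allow" | "deny" };

function evaluate(rules: Rule[], toolCall: string): "allow" | "deny" | "ask" {
  // Pass 1: deny rules take precedence regardless of config ordering.
  for (const rule of rules) {
    if (rule.effect === "deny" && rule.pattern.test(toolCall)) return "deny";
  }
  // Pass 2: only then consider allows.
  for (const rule of rules) {
    if (rule.effect === "allow" && rule.pattern.test(toolCall)) return "allow";
  }
  // Nothing matched: fall through to asking the user.
  return "ask";
}

const rules: Rule[] = [
  { pattern: /^Bash\(rm .*\)$/, effect: "deny" },
  { pattern: /^Bash\(.*\)$/, effect: "allow" },
];
console.log(evaluate(rules, "Bash(rm -rf /)")); // "deny" beats the broad allow
console.log(evaluate(rules, "Bash(ls)"));       // "allow"
console.log(evaluate(rules, "WebFetch(url)"));  // "ask"
```

The two-pass structure is what makes the layer safe to compose: adding a generous allow rule can never accidentally override an existing deny.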
The really juicy part is the comparison with OpenClaw, an open source agent system. Same questions, completely different architecture. Claude Code does per-action permission checks. OpenClaw does perimeter access control. Same problem, two valid answers, different deployment assumptions.
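The architectural difference is easy to show side by side. In this hedged sketch (all class and method names are made up for illustration, not either project's API), the per-action style checks every tool call at use time, while the perimeter style decides trust once when the session opens and never re-checks inside it.

```typescript
type Call = { tool: string; args: string };

// Claude Code-style: every call passes through a guard at execution time.
class PerActionGuard {
  constructor(private allowedTools: Set<string>) {}
  run(call: Call, exec: (c: Call) => string): string {
    if (!this.allowedTools.has(call.tool)) throw new Error(`denied: ${call.tool}`);
    return exec(call);
  }
}

// Perimeter-style: one trust decision at session creation; calls inside
// the perimeter are not individually re-checked.
class PerimeterGuard {
  private constructor(private exec: (c: Call) => string) {}
  static openSession(trusted: boolean, exec: (c: Call) => string): PerimeterGuard {
    if (!trusted) throw new Error("session rejected at the perimeter");
    return new PerimeterGuard(exec);
  }
  run(call: Call): string {
    return this.exec(call); // no per-call check by design
  }
}
```

Per-action checks suit an agent running on a developer's own machine, where each command is individually risky; perimeter control suits an agent already confined to a sandbox or container, where the boundary itself is the guarantee.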
Paper: https://huggingface.co/papers/2604.14228
Best part: they end with the long-term capability paradox. Anthropic's own research shows developers using AI score 17% lower on comprehension. The architecture optimizes for short-term capability amplification but doesn't have explicit mechanisms for long-term human improvement. That's the open problem nobody is solving and probably the next big design vector for agent systems.