AgentTrust — Sixth Architecture for the Cursor-Deletes-Prod Problem
Three weeks, six different architectural answers to the same question — how do you stop an agent from doing something dumb at runtime? AgentTrust just dropped on arXiv with the cleanest "verdict at the syscall layer" version yet.
Single-author paper from Chenglin Yang. Releases AGPL-3.0 code plus an MCP server. The pitch: every tool call goes through a runtime safety layer that returns one of four verdicts — allow, warn, block, or review. Shell deobfuscation normalizer catches base64-encoded payloads. SafeFix suggests less destructive alternatives. RiskChain detects multi-step attack sequences. Ambiguous cases hit a cache-aware LLM-as-Judge.
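To make the verdict flow concrete, here is a minimal sketch of the idea, not AgentTrust's actual code or API. It decodes one common base64 shell wrapper and then maps the normalized command to one of the four verdicts. The names `Verdict`, `normalize`, `check`, and the pattern lists are all hypothetical, and the rules are toy examples. SafeFix, RiskChain, and the LLM judge are out of scope here; the `review` branch only marks where an ambiguous case would be escalated.

```python
import base64
import binascii
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    REVIEW = "review"


# Hypothetical toy rules for illustration; a real rule set would be far larger.
BLOCK_PATTERNS = [r"\brm\s+-rf\s+/", r"\bdrop\s+database\b"]
WARN_PATTERNS = [r"\bgit\s+push\s+.*--force\b", r"\bchmod\s+777\b"]
PIPE_TO_SHELL = r"\|\s*(sh|bash)\b"

# Matches `echo <b64> | base64 -d` with an optional trailing `| sh` or `| bash`.
B64_WRAPPER = re.compile(
    r"echo\s+['\"]?([A-Za-z0-9+/=]+)['\"]?\s*\|\s*base64\s+(?:-d|--decode)"
    r"(?:\s*\|\s*(?:sh|bash))?"
)


def normalize(command: str) -> str:
    """Replace a base64-wrapped payload with its decoded text, if it decodes."""
    def decode(match: re.Match) -> str:
        try:
            return base64.b64decode(match.group(1), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return match.group(0)  # leave undecodable input untouched
    return B64_WRAPPER.sub(decode, command)


def check(command: str) -> Verdict:
    """Return a verdict for a shell command after deobfuscation."""
    text = normalize(command)
    if any(re.search(p, text, re.IGNORECASE) for p in BLOCK_PATTERNS):
        return Verdict.BLOCK
    if any(re.search(p, text, re.IGNORECASE) for p in WARN_PATTERNS):
        return Verdict.WARN
    if re.search(PIPE_TO_SHELL, text):
        # Ambiguous: this is where a judge model would be consulted.
        return Verdict.REVIEW
    return Verdict.ALLOW


if __name__ == "__main__":
    payload = base64.b64encode(b"rm -rf /").decode()
    for cmd in ["ls -la", f"echo {payload} | base64 -d | sh"]:
        print(cmd, "->", check(cmd).value)
```

The point of normalizing before matching is that the rules only ever see the decoded command, so an encoded destructive payload gets the same verdict as its plain form.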
Reported numbers: 95.0% verdict accuracy on 300 internal scenarios, 96.7% on 630 external adversarial cases, ~93% on shell-obfuscated payloads. Latency is in the low milliseconds. Not a sidecar, not a sandbox: a verdict service.
Sixth answer in twenty-one days to one structural problem. Mendral did the architecture layer (April 28). Rosentic did CI for agents (May 4). Mindra did consumer UX (May 5). Intuned did production browser (May 5). Tilde.run did transactional storage (May 7 morning). AgentTrust does syscall-level interception. Different layer, same problem. The Cursor-deletes-prod-DB story turned out to be the most generative single incident of the year — six startups and papers built around defending against it.
Source: https://arxiv.org/abs/2605.04785