MCPHunt Catches MCP Leaking Credentials Without a Bad Actor
Most MCP threat models assume an attacker. MCPHunt drops that assumption and shows that credentials leak across MCP servers anyway, purely from how the agent composes calls. The paper comes from four authors out of Tsinghua and was posted to arXiv on April 30. It is the first benchmark that isolates 'non-adversarial, verbatim credential propagation' across multi-server MCP trust boundaries.
Numbers: 5 frontier models, 147 tasks across 9 mechanism families, 3,615 traces. Policy-violating credential propagation in 11.5 to 41.3 percent of cases, across all five models. Variance across pathways is 25x, and the leak concentrates in browser-mediated data flows. Hard-negative controls confirm production-format credentials aren't even required. The model just has to read a token from one server and the agent's own prompt-directed flow carries it across to another.
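To make 'verbatim credential propagation' concrete, here is a minimal sketch of how a trace could be checked for it. This is not MCPHunt's actual harness. The `Event` record, the `find_verbatim_leaks` function, the server names, and the canary token are all hypothetical, and the check assumes each trace is a flat list of tool results and tool calls with text payloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    server: str   # hypothetical name of the MCP server involved
    kind: str     # "result" (data returned to the agent) or "call" (arguments the agent sends)
    payload: str  # serialized text of the result or of the call arguments


def find_verbatim_leaks(trace, canaries):
    """Return (canary, source_server, sink_server) for every cross-server verbatim copy."""
    first_seen = {}  # canary -> server whose result first exposed it
    leaks = []
    for event in trace:
        for canary in canaries:
            if canary not in event.payload:
                continue
            if event.kind == "result":
                # Remember which server the agent first read the credential from.
                first_seen.setdefault(canary, event.server)
            elif event.kind == "call" and canary in first_seen:
                source = first_seen[canary]
                # Sending the value back to its own server is not a boundary crossing.
                if source != event.server:
                    leaks.append((canary, source, event.server))
    return leaks


# Illustrative trace: the token is read from one server and later typed into another.
trace = [
    Event("files", "result", "API_TOKEN=tok_canary_123"),
    Event("files", "call", "write notes.txt: tok_canary_123"),
    Event("browser", "call", "fill form field with tok_canary_123"),
]
leaks = find_verbatim_leaks(trace, ["tok_canary_123"])
```

Only the last event counts as a leak here: the write back to `files` stays inside the boundary where the token originated, while the `browser` call carries it across. A substring match like this only catches verbatim copies, which is the case the benchmark isolates.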
This is the Cursor-deletes-production-DB shape but for credentials instead of code. Every 'safe' read permission and every 'safe' write permission is individually safe. The composition isn't. Permit.io's MCP Gateway, Charm Security, ZeroPath, and Astrix all sell on the assumption that MCP server boundaries hold. MCPHunt's data says they hold against malicious actors but not against a sufficiently capable agent that thinks credential A from server X belongs in a tool call to server Y.
The missing capability in 2026 agent stacks is taint tracking: knowing where every value in context came from and which downstream calls it's allowed to flow into. MCPHunt is the first benchmark concrete enough to drive that work. Expect the agent security cluster to start citing the 11.5-41.3 percent number the way SWE-bench Verified got cited.
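A rough sketch of what taint tracking could look like at the tool-call boundary follows. It is an illustration under stated assumptions, not a design from the paper or from any vendor. `Tainted`, `FlowPolicy`, `FlowViolation`, and `guarded_call` are hypothetical names, tool arguments are assumed to be plain strings, and matching is verbatim only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    value: str   # the sensitive string as it appeared in context
    origin: str  # hypothetical name of the server that produced it


class FlowViolation(Exception):
    """Raised when a value would cross to a server the policy does not allow."""


class FlowPolicy:
    def __init__(self, allowed):
        # allowed: iterable of (origin_server, destination_server) pairs
        self.allowed = set(allowed)

    def permits(self, origin, destination):
        # A value may always return to the server it came from.
        return origin == destination or (origin, destination) in self.allowed


def guarded_call(policy, destination, args, context):
    """Check every string argument against tracked values before dispatching."""
    for item in context:
        for name, arg in args.items():
            if item.value in arg and not policy.permits(item.origin, destination):
                raise FlowViolation(
                    f"argument {name!r} carries a value from {item.origin!r} "
                    f"into {destination!r}"
                )
    # Stand-in for the real dispatch to the MCP server.
    return {"server": destination, "args": args}


context = [Tainted("tok_canary_123", "vault")]
policy = FlowPolicy({("vault", "deploy")})

allowed_result = guarded_call(policy, "deploy", {"token": "tok_canary_123"}, context)

try:
    guarded_call(policy, "browser", {"text": "paste tok_canary_123 here"}, context)
    blocked = False
except FlowViolation:
    blocked = True
```

The point of the sketch is where the check sits: the policy is evaluated per flow (origin, destination), not per permission, which is the composition that individually 'safe' read and write grants miss. A real implementation would also have to handle transformed or partially copied values, which verbatim matching does not.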
Paper: https://arxiv.org/abs/2604.27819