Tool Attention Is All You Need just killed 95% of MCP token waste
New arXiv paper (2604.21816) with a cheeky title that actually earns it. The setup: LLM agents running with multi-server MCP spend 10 to 60K tokens per turn just on tool schemas, and context utility degrades fast around 70 percent fill. Bench them on 120 tools across 6 servers and the agent is burning 47.3K tokens per turn to do almost nothing.
Three fixes stacked together. ISO score ranks tool relevance rather than loading everything. State-aware gating decides which tools to expose based on where the agent is in the plan. Lazy schema loader only pulls the full spec when the agent is about to call. Same benchmark, same 120 tools, 2.4K tokens per turn. 95 percent reduction. Context utility goes from 24 percent to 91 percent.
This is the kind of paper that looks academic and is actually deeply shippable. Anthropic, OpenAI, and every MCP gateway vendor are quietly solving exactly this problem because agents-with-a-hundred-tools is where production deployments break. The paper gives a clean recipe, expect the technique to land in claude-code, Codex, Cursor MCP routers within a quarter.
The bigger pattern: MCP was designed for discovery, not efficiency. The industry is now retrofitting efficiency. If you're building an MCP server or an agent framework, read this one.
Link: https://arxiv.org/abs/2604.21816
← Back to all articles
Three fixes stacked together. ISO score ranks tool relevance rather than loading everything. State-aware gating decides which tools to expose based on where the agent is in the plan. Lazy schema loader only pulls the full spec when the agent is about to call. Same benchmark, same 120 tools, 2.4K tokens per turn. 95 percent reduction. Context utility goes from 24 percent to 91 percent.
This is the kind of paper that looks academic and is actually deeply shippable. Anthropic, OpenAI, and every MCP gateway vendor are quietly solving exactly this problem because agents-with-a-hundred-tools is where production deployments break. The paper gives a clean recipe, expect the technique to land in claude-code, Codex, Cursor MCP routers within a quarter.
The bigger pattern: MCP was designed for discovery, not efficiency. The industry is now retrofitting efficiency. If you're building an MCP server or an agent framework, read this one.
Link: https://arxiv.org/abs/2604.21816
Comments