April 24, 2026 · Research · MCP · Infrastructure

Tool Attention Is All You Need: There Is a Fix for the 95% of Tokens MCP Wastes

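A new arXiv paper, 2604.21816. The title is cheeky, but the paper earns it. Background: when an LLM agent runs against multiple MCP servers, tool schemas alone burn 10K to 60K tokens per turn, and quality starts to degrade noticeably once context utilization reaches roughly 70%. On a benchmark with 6 servers and 120 tools, the agent burns 47.3K tokens per turn before doing any real work.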

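Three fixes are stacked. An ISO score ranks tools by relevance instead of loading all of them. State-aware gating decides which tools to expose based on where the agent is in its plan. A lazy schema loader pulls the full schema only when the agent is about to make the call. Same benchmark, same 120 tools: 2.4K tokens per turn. That is a 95% reduction. Context utilization goes from 24% to 91%.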
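To make the three pieces concrete, here is a minimal sketch of how they could fit together. It is not the paper's implementation: the post does not define the ISO score, so `relevance` below is a stand-in word-overlap score, and `ToolStub`, `select_tools`, `LazySchemaLoader`, the phase names, and the example tools are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolStub:
    """Lightweight tool entry kept in context instead of the full schema."""
    name: str
    summary: str            # one-line description
    phases: frozenset       # plan phases in which the tool may be exposed


def relevance(query: str, tool: ToolStub) -> float:
    """Stand-in relevance score (NOT the paper's ISO score):
    fraction of query words that appear in the tool summary."""
    q = set(query.lower().split())
    s = set(tool.summary.lower().split())
    return len(q & s) / len(q) if q else 0.0


def select_tools(query: str, phase: str, tools: list, top_k: int = 3) -> list:
    """State-aware gating first, then relevance ranking, then truncation."""
    gated = [t for t in tools if phase in t.phases]
    ranked = sorted(gated, key=lambda t: relevance(query, t), reverse=True)
    return ranked[:top_k]


class LazySchemaLoader:
    """Fetches a full schema only on first use and caches it afterwards."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch
        self._cache: dict = {}
        self.fetch_count = 0

    def get(self, name: str) -> dict:
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
            self.fetch_count += 1
        return self._cache[name]


# Hypothetical tool catalog spread across a few servers.
TOOLS = [
    ToolStub("search_issues", "search issues in the tracker", frozenset({"research"})),
    ToolStub("create_issue", "create a new issue in the tracker", frozenset({"act"})),
    ToolStub("read_file", "read a file from the repo", frozenset({"research", "act"})),
    ToolStub("send_email", "send an email message", frozenset({"act"})),
]


def fake_fetch(name: str) -> dict:
    # Placeholder for a real round trip to the MCP server that owns the tool.
    return {"name": name, "inputSchema": {"type": "object", "properties": {}}}


loader = LazySchemaLoader(fake_fetch)
selected = select_tools("search open issues", phase="research", tools=TOOLS, top_k=1)
selected_names = [t.name for t in selected]

# Only the tool the agent is about to call gets its full schema pulled.
schema = loader.get(selected_names[0])
loader.get(selected_names[0])  # second call is served from the cache
```

In this sketch only short stubs for the gated, top-ranked tools enter the prompt, and a full schema is fetched once, at call time. For reference, the headline figure checks out: 1 - 2.4/47.3 is about 0.949, which the post rounds to 95%.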

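This kind of paper looks academic at first glance but is actually very practical. Anthropic, OpenAI, and every MCP gateway vendor are quietly working on this problem, because an agent with a hundred tools attached is exactly where production deployments break. The paper offers a clear recipe, and these techniques will most likely land in the MCP routers of claude-code, Codex, and Cursor within a quarter.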

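The bigger pattern: MCP was originally designed for discovery, not for efficiency. The industry is now retrofitting efficiency onto it. If you build MCP servers or agent frameworks, this one is required reading.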

Link: https://arxiv.org/abs/2604.21816
