March 18, 2026 · Research · Open Source · Infrastructure

Attention Residuals: Moonshot AI's Drop-In Transformer Architecture Upgrade with Open-Source Code

Moonshot AI's Kimi team has released Attention Residuals (AttnRes), a drop-in replacement for the standard residual connections in Transformers. It lets each layer selectively aggregate earlier representations through learned attention over depth. With 1,330 upvotes, the paper is the top-trending paper on Hugging Face today, and the code is open source on GitHub.

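The core idea: instead of summing earlier layer outputs with fixed unit weights, AttnRes applies softmax attention over the outputs of preceding layers. Each layer has a single learned pseudo-query that scores those outputs and produces content-aware weights, so the layer can draw selectively on earlier representations. This addresses the "dilution problem" in PreNorm architectures: because the residual stream is an unweighted sum of all previous layer outputs, hidden-state magnitudes keep growing with depth and each individual layer's contribution becomes increasingly diluted.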
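To make the mechanism concrete, here is a minimal sketch in plain Python. It assumes that each earlier layer output serves as both key and value, that keys are RMS-normalized before scoring, and that the pseudo-query is simply a d-dimensional vector. These details, the toy sizes, and every name below (`attn_residual`, `standard_residual`, `rms_norm`) are illustrative assumptions, not the released implementation; the GitHub repository has the actual code.

```python
import math
import random

D = 16           # toy hidden size d (hypothetical)
NUM_LAYERS = 12  # toy depth (hypothetical)


def rms_norm(v, eps=1e-6):
    """Rescale v to unit root-mean-square (assumed key normalization)."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def standard_residual(prev_outputs):
    """PreNorm-style residual stream: unit-weight sum of all earlier outputs."""
    return [sum(col) for col in zip(*prev_outputs)]


def attn_residual(prev_outputs, pseudo_query):
    """Softmax attention over depth driven by one pseudo-query vector.

    Keys are the RMS-normalized earlier outputs and values are the raw
    outputs (both assumptions). Returns the mixed input and the depth weights.
    """
    logits = [sum(q * k for q, k in zip(pseudo_query, rms_norm(h)))
              for h in prev_outputs]
    weights = softmax(logits)
    mixed = [sum(w * h[j] for w, h in zip(weights, prev_outputs))
             for j in range(len(pseudo_query))]
    return mixed, weights


random.seed(0)
outputs = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(NUM_LAYERS)]
query = [random.gauss(0.0, 1.0) for _ in range(D)]  # stands in for a learned parameter

# The unit-weight sum tends to grow as more layers are stacked.
for depth in (1, 4, NUM_LAYERS):
    print("unit-weight sum norm at depth", depth, "=",
          round(norm(standard_residual(outputs[:depth])), 2))

mixed, weights = attn_residual(outputs, query)
print("AttnRes input norm:", round(norm(mixed), 2))
print("largest single-layer norm:", round(max(norm(h) for h in outputs), 2))
```

Because the softmax weights are nonnegative and sum to one, the aggregated input is a convex combination of earlier outputs, so its norm can never exceed the largest single-layer norm. The unit-weight sum has no such bound and keeps growing as layers are added.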

The practical variant, Block AttnRes, partitions the layers into roughly 8 blocks, reducing memory from O(Ld) to O(Nd), where L is the number of layers, N the number of blocks, and d the hidden dimension. In tests on Kimi Linear (48B total parameters, 3B activated), it improves GPQA-Diamond by 7.5 points and HumanEval by 3.1 points, and it delivers performance equivalent to training with 1.25x more compute. Elon Musk publicly praised the work as "impressive."
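The memory saving is easiest to see by counting what has to be cached. The sketch below is a hypothetical toy, not the released code. It assumes that layers inside a block accumulate with ordinary unit-weight residuals, that each layer attends over the completed block summaries plus the running sum of its own block, and that the embedding counts as one extra source. Under those assumptions the cache holds at most N + 1 vectors of size d rather than one per layer. The names `block_attnres_forward`, `split_into_blocks`, `make_layer`, and the tanh sublayer are illustrative stand-ins.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def split_into_blocks(num_layers, num_blocks):
    """Contiguous layer indices per block; block sizes differ by at most one."""
    if not 1 <= num_blocks <= num_layers:
        raise ValueError("need 1 <= num_blocks <= num_layers")
    base, extra = divmod(num_layers, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def make_layer(d, rng):
    """Hypothetical stand-in for a Transformer sublayer: tanh(W x)."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]
    return lambda x: [math.tanh(dot(row, x)) for row in w]


def block_attnres_forward(layers, blocks, x0, queries):
    """Toy Block AttnRes pass; returns (block summaries, peak cached vectors)."""
    summaries = [x0]  # embedding kept as its own source (assumption)
    peak = 1
    for block in blocks:
        partial = None  # running unit-weight sum inside the current block
        for i in block:
            sources = summaries + ([partial] if partial is not None else [])
            peak = max(peak, len(sources))
            weights = softmax([dot(queries[i], s) for s in sources])
            h_in = [sum(w * s[j] for w, s in zip(weights, sources))
                    for j in range(len(x0))]
            out = layers[i](h_in)
            partial = out if partial is None else [p + o for p, o in zip(partial, out)]
        summaries.append(partial)  # one cached vector per finished block
        peak = max(peak, len(summaries))
    return summaries, peak


rng = random.Random(0)
D, L, N = 8, 24, 8  # toy sizes; N = 8 mirrors "~8 blocks", L and D are arbitrary
layers = [make_layer(D, rng) for _ in range(L)]
queries = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
x0 = [rng.gauss(0.0, 1.0) for _ in range(D)]
blocks = split_into_blocks(L, N)

summaries, peak = block_attnres_forward(layers, blocks, x0, queries)
print("full AttnRes would cache", L + 1, "vectors; block variant peaked at", peak)
```

The cache size depends only on N, not on L, which is why adding layers to a block-partitioned model does not grow this part of the memory footprint under the stated assumptions.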

For agent infrastructure, this matters because Transformers that scale better can yield more capable agent models at lower compute cost, especially for long-context agent reasoning.

GitHub: https://github.com/MoonshotAI/Attention-Residuals
Paper: https://arxiv.org/abs/2603.15031