March 18, 2026 · Research · Open Source · Infrastructure

Attention Residuals: Moonshot AI's Drop-In Transformer Architecture Upgrade with Open-Source Code

Moonshot AI's Kimi team released Attention Residuals (AttnRes), a drop-in replacement for standard residual connections in Transformers that lets each layer selectively aggregate earlier representations via learned attention over depth. With 1,330 upvotes on HuggingFace, it is the highest-trending paper today, and the code is open-source on GitHub.

The core idea: instead of fixed unit-weight residual connections, each AttnRes layer computes softmax attention over the outputs of all preceding layers. A single learned pseudo-query per layer produces content-aware weights, giving the layer selective access to earlier representations. This addresses the "dilution problem" in PreNorm architectures, where hidden-state magnitudes grow unboundedly across depth and early-layer signals are progressively drowned out.
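To make the mechanism concrete, here is a minimal NumPy sketch of attention over depth. This is an illustration under stated assumptions, not the paper's exact formulation: the keys are assumed to be token-mean summaries of each earlier layer's output, and `pseudo_query` stands in for the learned per-layer query vector.

```python
import numpy as np

def attn_residual(prev_outputs, pseudo_query):
    """Content-aware aggregation of earlier layer outputs (sketch).

    prev_outputs: list of L arrays, each (seq_len, d) -- outputs of
                  all preceding layers.
    pseudo_query: (d,) learned per-layer query vector (assumed form).
    Returns a (seq_len, d) mixture that replaces the fixed residual.
    """
    d = pseudo_query.size
    # One key per earlier layer: mean over tokens (an assumption).
    keys = np.stack([h.mean(axis=0) for h in prev_outputs])   # (L, d)
    scores = keys @ pseudo_query / np.sqrt(d)                 # (L,)
    # Softmax over depth, not over tokens.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of earlier layer outputs along the depth axis.
    return np.tensordot(weights, np.stack(prev_outputs), axes=1)
```

With a zero query the weights are uniform and the result is the plain mean of earlier layers; a standard residual corresponds to putting all weight on the immediately preceding layer.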

The practical variant, Block AttnRes, partitions the layers into ~8 blocks so that attention runs over one cached summary per block, reducing the depth-cache memory from O(Ld) to O(Nd). Testing on Kimi Linear (48B parameters / 3B activated) shows +7.5 on GPQA-Diamond, +3.1 on HumanEval, and scaling performance equivalent to 1.25x more compute. Elon Musk publicly praised the work as "impressive."
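The memory saving can be sketched as a depth cache that folds each layer's output into its block's running mean instead of storing every layer. The class and its bookkeeping below are assumptions for illustration, not the paper's implementation; only the O(Ld) → O(Nd) reduction is from the source.

```python
import numpy as np

class BlockCache:
    """Depth cache for a Block AttnRes-style scheme (hypothetical sketch).

    Storing all L layer outputs costs O(L*d) per token position; keeping
    one running mean per block costs O(N*d). Layer l is assigned to
    contiguous block l * num_blocks // num_layers (an assumption).
    """

    def __init__(self, num_layers, num_blocks):
        self.num_layers = num_layers
        self.num_blocks = num_blocks
        self.sums = {}    # block id -> running sum of layer outputs
        self.counts = {}  # block id -> number of layers folded in

    def add(self, layer_idx, hidden):
        """Fold one layer's (d,) output into its block's aggregate."""
        block = layer_idx * self.num_blocks // self.num_layers
        self.sums[block] = self.sums.get(block, 0.0) + hidden
        self.counts[block] = self.counts.get(block, 0) + 1

    def summaries(self):
        """One d-dim mean per populated block: O(N*d) total storage."""
        return [self.sums[b] / self.counts[b] for b in sorted(self.sums)]
```

For a 48-layer model with 8 blocks, depth attention then ranges over 8 block summaries instead of 48 layer outputs, a 6x cut in the depth cache.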

For agent infrastructure, this matters because better-scaling Transformers directly translate to more capable agent models at lower compute cost — especially for long-context agent reasoning.

GitHub: https://github.com/MoonshotAI/Attention-Residuals
Paper: https://arxiv.org/abs/2603.15031