March 24, 2026 · Infrastructure · Research · Open Source

EverMind MSA: 100M-Token Long-Term Memory for LLMs via Memory Sparse Attention

EverMind AI has open-sourced MSA (Memory Sparse Attention), an end-to-end trainable sparse memory framework that enables LLMs to scale context to 100 million tokens. The project reached 56 points on Hacker News and is available on GitHub.

MSA introduces four key innovations: Memory Sparse Attention, a differentiable, content-based sparsification mechanism that dynamically selects the most relevant memory subsets; Document-wise RoPE, which decouples positions within a document from global memory positions; KV Cache Compression, which enables 100M-token inference on just two A800 GPUs; and Memory Interleave, which allows multiple rounds of generative retrieval and context expansion for complex multi-hop reasoning.
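The core idea behind content-based sparsification is to score stored memory blocks against the current query and attend only over the top-scoring subset, so attention cost grows with the selected subset rather than the full memory. MSA's actual trainable mechanism lives in the repository; the function and shapes below are illustrative assumptions, not EverMind's API. A minimal sketch of top-k content-based memory selection:

```python
import numpy as np

def sparse_memory_attention(query, memory_keys, memory_values, k=4):
    """Hypothetical sketch: score memory blocks by content relevance,
    keep only the top-k, and attend over that sparse subset."""
    # Content-based relevance: dot product between the query and each memory key.
    scores = memory_keys @ query                   # shape: (num_blocks,)
    topk = np.argsort(scores)[-k:]                 # indices of the k most relevant blocks
    # Softmax over the selected subset only (the "sparse" attention step).
    sel = scores[topk]
    weights = np.exp(sel - sel.max())
    weights /= weights.sum()
    # Weighted sum of the selected memory values.
    return weights @ memory_values[topk]

# Usage: 1,000 memory blocks of dimension 64; only 4 blocks enter attention.
rng = np.random.default_rng(0)
keys = rng.standard_normal((1000, 64))
vals = rng.standard_normal((1000, 64))
out = sparse_memory_attention(rng.standard_normal(64), keys, vals)
print(out.shape)  # (64,)
```

In the real framework the selection must stay differentiable end-to-end; a hard `argsort` as above would block gradients, which is exactly the problem a trainable sparsification mechanism is designed to address.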

On long-context QA and Needle-in-a-Haystack benchmarks, MSA surpasses same-backbone RAG systems, best-of-breed RAG stacks, and leading long-context models. Across an evaluation range spanning 16K to 100M tokens, MSA shows less than 9% performance degradation.

For the agentic ecosystem, efficient long-term memory is a critical infrastructure problem. Agents working on multi-day tasks, maintaining user context across sessions, or processing large codebases need memory systems that scale beyond typical context windows. MSA's ability to handle 100M tokens on commodity hardware makes this practical.

GitHub: https://github.com/EverMind-AI/MSA
Homepage: https://evermind.ai/
