May 27, 2026 · Research Agents Infrastructure

A new paper says language models need sleep, literally

This one hit the Hacker News front page this morning and the title carries the whole pitch: Language Models Need Sleep. From Tom Goldstein's group at Maryland and CMU's Giulia Fanti, it's a genuinely different take on the context-length problem.

The idea is biological. A transformer drowns as its context grows: attention scales badly with sequence length and the KV cache balloons. So instead of carrying everything in working memory forever, the model goes to sleep. It runs offline recurrent passes over what it has accumulated, burns that context into persistent fast weights inside state-space blocks, then dumps the KV cache. When it wakes, the information is already in the weights, so inference stays fast and the context is gone but not lost. Consolidation happens during downtime, much like the leading theory of what your brain does overnight.
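To put a number on "the KV cache balloons", here is a back-of-the-envelope sketch. The cache formula is the standard one (keys plus values, per layer, per token). The model shape and the fixed-size fast-weight state are hypothetical values I picked for illustration, not figures from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def fixed_state_bytes(n_layers, d_state, d_model, bytes_per_elem=2):
    """Bytes for a hypothetical fixed-size (d_state x d_model) state per layer."""
    # No seq_len term: the state does not grow with context length.
    return n_layers * d_state * d_model * bytes_per_elem


# Hypothetical model shape (an assumption, not taken from the paper).
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for seq_len in (4_096, 32_768, 262_144):
    gib = kv_cache_bytes(seq_len, **cfg) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.1f} GiB of KV cache")
# prints 0.5, 4.0, and 32.0 GiB at 16-bit precision

state_mib = fixed_state_bytes(n_layers=32, d_state=128, d_model=4096) / 2**20
print(f"fixed state -> {state_mib:.0f} MiB regardless of context length")
```

The cache grows linearly with tokens while the state stays flat, which is the whole appeal of moving context into weights and then dropping the cache.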

The finding that makes it real: the longer it sleeps, meaning the more offline passes it runs, the better it does, and the gains are biggest on the hard problems that need deep reasoning. On math tasks where plain transformers and even SSM-attention hybrids failed, the sleeping models got there. Sleep time literally trades for reasoning ability.
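For intuition on why more offline passes could help, here is a toy sketch. It is not the paper's method: I am swapping in a classic delta-rule associative memory, where a weight matrix is nudged toward reproducing each stored key-value pair. The names `sleep` and `recall_error`, the dimensions, and the learning rate are all my own assumptions.

```python
import math
import random


def matvec(W, k):
    """Multiply matrix W (list of rows) by vector k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def recall_error(W, pairs):
    """Mean squared error between stored values and what W recalls."""
    total = 0.0
    for k, v in pairs:
        total += sum((p - t) ** 2 for p, t in zip(matvec(W, k), v))
    return total / len(pairs)


def sleep(pairs, dim_k, dim_v, passes, lr=1.0):
    """Consolidate (key, value) pairs into a weight matrix with the delta rule."""
    W = [[0.0] * dim_k for _ in range(dim_v)]
    for _ in range(passes):              # each pass is one sweep over the "context"
        for k, v in pairs:
            pred = matvec(W, k)
            for i in range(dim_v):
                err = v[i] - pred[i]     # what W still gets wrong for this pair
                for j in range(dim_k):
                    W[i][j] += lr * err * k[j]
    return W


def unit(vec):
    """Scale a vector to unit length so lr=1.0 is a stable step size."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


random.seed(0)
DIM_K, DIM_V, N_PAIRS = 16, 4, 6         # toy sizes, chosen arbitrarily
pairs = [
    (unit([random.gauss(0, 1) for _ in range(DIM_K)]),
     [random.gauss(0, 1) for _ in range(DIM_V)])
    for _ in range(N_PAIRS)
]

errors = {}
for passes in (0, 1, 4, 16, 64):
    errors[passes] = recall_error(sleep(pairs, DIM_K, DIM_V, passes), pairs)
    print(f"{passes:>2} passes -> recall error {errors[passes]:.6f}")
```

Because the keys overlap, a single sweep leaves interference between pairs, and repeated sweeps clean it up. That is only a loose analogy for the paper's result, but it shows how extra offline compute can buy better recall from the same fixed-size weights.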

Why care if you build agents? This is the agent-memory problem from a new angle. Everyone is bolting on vector stores and memory frameworks to fake long-term recall. This paper suggests that maybe the model should metabolize its own history into weights instead: remember by changing, not by retrieving. It's early and it's research, but it's the most interesting framing of agent memory I've seen this month.

Paper: arxiv.org/abs/2605.26099
