May 16, 2026 · Research · Open Source · Agents

delta-Mem Bolts an 8x8 Online Memory Onto a Frozen LLM

The paper just hit HN at 178 points. arXiv 2605.12357, titled "delta-mem: efficient online memory for large language models." Code is already up at github.com/declare-lab/delta-Mem. Authors include Soujanya Poria's group, which shipped some of the better small-model memory work of the last year.

The trick is small and clean. Take a frozen LLM. Do not finetune the backbone. Strap on a tiny online state matrix, only 8x8 in the headline experiments. Update that matrix with delta-rule learning every time new context arrives. At generation time, read out of the state and inject low-rank corrections into the backbone's attention. Result: the model accumulates information across a long session without growing the context window and without touching the underlying weights.
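The post doesn't reproduce the paper's exact equations, but the delta-rule update it describes is a standard construction. A minimal sketch, assuming the common form S ← S + β(v − Sk)kᵀ with an 8×8 state and unit-norm keys (the function name `delta_update` and the learning rate β are illustrative, not from the paper):

```python
import numpy as np

def delta_update(S, k, v, beta=0.5):
    # Delta rule: nudge the state's prediction S @ k toward the target v.
    # S is the tiny online state (8x8 in the headline experiments);
    # k, v are 8-dim key/value projections of the incoming context.
    err = v - S @ k                # prediction error for this key
    return S + beta * np.outer(err, k)

rng = np.random.default_rng(0)
S = np.zeros((8, 8))
k = rng.standard_normal(8)
k /= np.linalg.norm(k)            # unit-norm key, so the error shrinks by (1 - beta) per step
v = rng.standard_normal(8)

for _ in range(20):
    S = delta_update(S, k, v)

# After repeated updates, querying the state with k recalls v.
print(np.allclose(S @ k, v, atol=1e-2))  # True
```

At generation time the paper reads this state back out as low-rank corrections to the frozen attention weights; the sketch above covers only the write side, which is where the delta rule lives.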

Numbers that matter: a 1.10x average improvement over the frozen backbone, and 1.15x over the strongest non-delta-mem baseline. The interesting wins are on memory-heavy tasks: 1.31x on MemoryAgentBench, 1.20x on LoCoMo. General-purpose evals stay roughly stable, which is the whole point. You wanted memory. You did not want to lose capability.

Why this lands now. Memory is the bottleneck of long-horizon agents and the open literature has been throwing solutions at it for six months. EvolveMem, MemLens, MemEye, STALE, PREPING, MemPrivacy, the list keeps growing. Most of them either retrain the backbone or build a separate retrieval system. delta-mem sits between: a learned recurrent state riding on top of an unchanged frozen model. If it scales beyond Qwen3-4B/8B and SmolLM3-3B, the path to memory in production agents gets a lot shorter.

https://arxiv.org/abs/2605.12357
