Omni-SimpleMem: An Agent Designed Its Own Memory System, and It's 4x Better
Here's a question that's been bugging the agent community: if agents are smart enough to do research, why can't they design their own architecture?
A team from UNC Chapel Hill decided to test this. They built AutoResearchClaw, a 23-stage autonomous research pipeline, and pointed it at the problem of agent memory. The result is Omni-SimpleMem, a multimodal memory framework that an agent essentially designed for itself. And it works dramatically better than the hand-designed baselines it was tested against.
The numbers are striking. On LoCoMo, a long conversation memory benchmark, F1 jumps from 0.117 to 0.598. That's a 411% improvement. On Mem-Gallery, which tests multimodal memory, F1 goes from 0.254 to 0.797. That's 214% better. These aren't marginal gains. The autoresearch pipeline found architectural choices that human researchers missed entirely.
The system works on three principles. Selective Ingestion uses entropy-driven filtering for each modality, so the agent only stores what actually matters. Progressive Retrieval combines FAISS and BM25 search with a pyramid token-budget expansion, scaling retrieval effort based on query complexity. Knowledge Graph Augmentation adds multi-hop cross-modal reasoning, letting the agent connect memories across text, image, audio, and video.
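To make the first two principles concrete, here is a minimal, self-contained sketch of what entropy-driven ingestion and progressive hybrid retrieval could look like. This is not the paper's implementation: a bag-of-words cosine stands in for FAISS dense search, a simple term-overlap score stands in for BM25, and all function names, weights, and thresholds are assumptions chosen for illustration.

```python
import math
from collections import Counter

def token_entropy(text):
    # Shannon entropy of the token distribution; repetitive text scores low.
    counts = Counter(text.split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def selective_ingest(candidates, min_entropy=1.5):
    # Selective Ingestion sketch: store only items informative enough to matter.
    return [c for c in candidates if token_entropy(c) >= min_entropy]

def dense_score(query, doc):
    # Cosine similarity over word counts (stand-in for a FAISS vector index).
    q, d = Counter(query.split()), Counter(doc.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def sparse_score(query, doc):
    # Fraction of query terms present in the doc (stand-in for BM25).
    terms = query.split()
    d = set(doc.split())
    return sum(1 for t in terms if t in d) / max(len(terms), 1)

def hybrid(query, doc):
    return 0.5 * dense_score(query, doc) + 0.5 * sparse_score(query, doc)

def progressive_retrieve(query, memories, budgets=(64, 256, 1024), threshold=0.15):
    """Pyramid token-budget expansion: widen the budget tier by tier and stop
    early once the top hit clears a confidence threshold, so simple queries
    cost less retrieval effort than complex ones."""
    ranked = sorted(memories, key=lambda m: hybrid(query, m), reverse=True)
    selected = []
    for budget in budgets:
        selected, used = [], 0
        for m in ranked:
            tokens = len(m.split())
            if used + tokens > budget:
                break
            selected.append(m)
            used += tokens
        if selected and hybrid(query, selected[0]) >= threshold:
            return selected, budget
    return selected, budgets[-1]

# Toy usage: a low-entropy filler line is dropped at ingestion time,
# then retrieval ranks the relevant memory first.
raw = [
    "alice adopted a golden retriever named biscuit last spring",
    "ok ok ok ok ok",
    "the quarterly report is due on friday",
    "biscuit the dog loves swimming in the lake",
]
memories = selective_ingest(raw)
hits, budget_used = progressive_retrieve("what dog did alice adopt", memories)
```

In this toy run the repetitive filler never enters memory, and the query about Alice's dog is satisfied at the smallest budget tier; a harder query whose top score stayed below the threshold would keep expanding through the larger tiers.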
The meta-story is arguably more important than the specific architecture. AutoResearchClaw ran approximately 50 experiments across two benchmarks, diagnosing failure modes, proposing architectural modifications, and fixing data pipeline bugs, all without human intervention. It discovered that the initial memory configurations had fundamental bottlenecks that no human had identified. This is autoresearch applied to agent infrastructure itself.
Code at github.com/aiming-lab/SimpleMem. Paper at arxiv.org/abs/2604.01007.