May 20, 2026 | Research, Skills, Agents

Library Drift: Your Agent's Skill Library Is Quietly Rotting

Everyone is bolting skill libraries onto agents and letting the agent write its own skills, on the theory that an agent that learns gets better over time. This paper drops a cold number on that theory. On SkillsBench, skills authored by the LLM itself delivered +0.0 percentage points. Human-curated skills delivered +16.2. Letting the agent fill its own library, on its own, did literally nothing.

The authors call the failure mode library drift. Skills pile up with no lifecycle management, retrieval quality quietly degrades, and the router starts firing false-positive skills into prompts that do not need them. The nasty part is that it is silent. You do not see it in your end-task scores until the rot is already deep.

The fix is unglamorous and that is the point. They keep an append-only evidence log tracking each skill's contribution score, attribution verdicts, and how often the router actually engages it, so you can spot decay before it shows up downstream. Then three governance moves: retire skills by outcome, cap how many stay active at once, and prioritize authoring meta-skills over one-off ones. On MBPP+ hard-100 over a hundred rounds, pass@1 climbed from 0.258 to 0.584.
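To make that concrete, here is a minimal Python sketch of an append-only evidence log with two of the three governance moves: retire by outcome and cap the active set. It is not the paper's implementation. Every name in it (`Evidence`, `EvidenceLog`, `govern`), the record fields, and the thresholds are assumptions made up for illustration, and the third move, prioritizing meta-skills, is left out because it concerns authoring rather than bookkeeping.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Evidence:
    """One immutable log entry. All field names are hypothetical."""
    skill: str
    round_no: int
    contribution: float  # assumed: score delta attributed to the skill
    verdict: str         # assumed labels: "helped", "neutral", "hurt"
    engaged: bool        # did the router actually engage the skill?


class EvidenceLog:
    """Append-only: records are added, never edited or deleted."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def by_skill(self):
        grouped = defaultdict(list)
        for record in self._records:
            grouped[record.skill].append(record)
        return grouped


def govern(log, min_contribution=0.0, min_evidence=3, max_active=2):
    """Return (active, benched, retired) skill names.

    Assumed policy: a skill with at least `min_evidence` engaged records
    whose mean contribution is at or below `min_contribution` is retired.
    Skills with too little evidence are kept with a neutral score of 0.0.
    Survivors are ranked by mean contribution, the top `max_active` stay
    active, and the rest are benched.
    """
    scores = {}
    retired = []
    for skill, records in log.by_skill().items():
        engaged = [r for r in records if r.engaged]
        if len(engaged) < min_evidence:
            scores[skill] = 0.0  # not enough evidence to judge yet
            continue
        avg = mean(r.contribution for r in engaged)
        if avg <= min_contribution:
            retired.append(skill)  # retire by outcome
        else:
            scores[skill] = avg
    # Highest score first; ties broken by name so the result is stable.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:max_active], ranked[max_active:], sorted(retired)
```

Because the log is never rewritten, the same records can be replayed under different thresholds, which is one way to check whether a skill was already decaying before end-task scores moved.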

This is the half of the skills story nobody demos. Everyone ships the clip where the agent learns a new trick. Almost nobody ships skill garbage collection, and this paper argues the garbage collection is where the real performance lives. The moat is not learning skills; it is governing them. https://arxiv.org/abs/2605.19576