June 30, 2026 · Research · Agents · Benchmark

The Hardest Agent Skill Is Knowing When to Quit

A new paper from the University of Washington asks a question that sounds trivial and turns out to be brutal: do agents know when to stop instead of act? The team behind it, Han Luo, Bingbing Wen and Lucy Lu Wang, ran 13 LLM-as-agent systems across web shopping, terminal tasks and question answering, more than 28,000 tasks in total, and found that timely abstention is where agents fall apart. Some never stop when they should. Others stop only after a long string of pointless interactions.

The failure is worst exactly where it costs the most. On tasks that look feasible at first but where the environment quietly reveals there is no valid result, the agent keeps grinding, keeps calling tools, keeps burning tokens on something that was never going to work. And here is the part that should bother people building bigger models: scale and reasoning ability did not reliably help. Larger, smarter models were sometimes worse at knowing when to quit, because more capability means more confidence that one more action will crack it.

The fix the authors propose is pointedly not a weights update. CONVOLVE is a context engineering method that distills full interaction trajectories into reusable stopping rules, then hands those rules back to the agent. On WebShop it lifted Llama-3.3-70B's timely recall from 26.7 percent to 57.4 percent without touching a single parameter. That puts it on the files-not-weights side of the running argument about where agent skills should live, alongside the spec-driven and skill-distillation work.
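To make the files-not-weights idea concrete, here is a minimal sketch of what handing stopping rules back to an agent could look like. It is not the paper's implementation: the `Step` and `StoppingRule` types, the example rule, and the programmatic `fires` check are all hypothetical, invented for illustration. The only thing taken from the description above is the shape of the idea, where rules are plain text placed in the agent's context and the model itself is left alone.

```python
from dataclasses import dataclass
from typing import Callable, List


# All names below are hypothetical; none come from the paper.
@dataclass
class Step:
    action: str       # what the agent did, e.g. "search[red shoes]"
    observation: str  # what the environment returned


@dataclass
class StoppingRule:
    text: str                             # natural-language rule shown to the agent
    fires: Callable[[List[Step]], bool]   # assumed extra: a programmatic guard


def repeated_empty_results(trajectory: List[Step], limit: int = 3) -> bool:
    """True when the last `limit` observations all report no results."""
    if len(trajectory) < limit:
        return False
    recent = trajectory[-limit:]
    return all("no results" in s.observation.lower() for s in recent)


# An example rule of the kind one might distill from past trajectories.
RULES = [
    StoppingRule(
        text=("If three consecutive searches return no results, "
              "report that the task is infeasible and stop."),
        fires=repeated_empty_results,
    )
]


def build_prompt(task: str, rules: List[StoppingRule]) -> str:
    """Prepend the rules to the task so they live in context, not in weights."""
    lines = ["Stopping rules:"] + [f"- {r.text}" for r in rules]
    return "\n".join(lines) + f"\n\nTask: {task}"


def should_abstain(trajectory: List[Step], rules: List[StoppingRule]) -> bool:
    """Return True if any rule's guard fires on the trajectory so far."""
    return any(r.fires(trajectory) for r in rules)
```

An agent loop would call `build_prompt` once at the start and could check `should_abstain` after each step as a backstop. Whether a hard-coded guard like this is a good idea next to in-context rules is a design choice this sketch does not settle.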

The deeper point lands against a quarter's worth of benchmarks showing that agents fail most real jobs. The expensive failure mode is not the wrong answer; it is the agent that will not stop. Knowing when to give up is a skill, it is learnable, and almost nobody is training for it. The paper is arXiv 2606.28733, at https://arxiv.org/abs/2606.28733