May 2, 2026 · Research · Skills · Agents

Skills-Coach: a self-evolving optimizer that makes agent skills better without training

The Anthropic Skills movement has a problem nobody talks about: somebody has to write the skill, and once it's written it just sits there. Skills-Coach takes a swing at the second half. It's a self-evolving skill optimizer built on training-free GRPO; the skill itself iterates and gets better.

Four modules wired together. Diverse Task Generation builds a comprehensive test suite for whatever skill you're optimizing. Lightweight Optimization mutates the skill prompt and code. Comparative Execution runs the original and the mutated versions head-to-head. Traceable Evaluation scores the result against criteria you can audit. The whole thing runs in either virtual or real mode depending on how aggressive you want to be on side effects.

The training-free part is the interesting bit. Standard GRPO needs gradient updates, which means model access, compute, and the regulatory headache that comes with it. This stays at the prompt-and-code layer — the underlying model never moves. So the optimization runs on a laptop and the artifact you get out is just a better version of the same skill file you started with.

Tested on Skill-X, a benchmark of 48 diverse skills. Significant improvements across categories. Authored by Yu Tian, Jiawei Chen, Lifan Zheng, Mingxiang Tao and others. The frame to take away: if Anthropic Skills is the new programming primitive for agents, then Skills-Coach is the first credible attempt at the optimizer for that primitive. Whoever ships the first production-grade skill optimizer that can run inside the user's box ends up sitting on the GitHub-Actions-for-skills moment.

Paper: https://arxiv.org/abs/2604.27488
