SKILL0: Teaching Agents to Forget Their Training Wheels
Every agent framework right now works the same way: hand the agent skills at runtime and hope it uses them correctly. The problems are obvious: retrieval noise brings in irrelevant guidance, injected skill content bloats the context, and the model never actually learns anything. It just follows instructions it was handed.
SKILL0 from Zhejiang University and Meituan flips this. Instead of feeding skills at inference time, it bakes them into the model's parameters during training. The framework starts with full skill context and progressively removes it, using a dynamic curriculum that evaluates whether each skill file is actually helping the current policy. If it's not helping, it gets dropped. By the end of training, the agent operates zero-shot — no runtime skill retrieval needed.
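The paper doesn't spell out its exact dropping criterion in this summary, but the core loop is easy to sketch. A minimal, hypothetical version, assuming a skill's utility is measured as the policy's reward gain from keeping that skill in context (the function names and threshold here are illustrative, not SKILL0's actual API):

```python
def skill_utility(reward_with: float, reward_without: float) -> float:
    """Utility of a skill = reward gain from including it in context."""
    return reward_with - reward_without

def curriculum_step(skills: dict, eval_fn, min_gain: float = 0.01) -> dict:
    """One curriculum update: keep only skills that still measurably
    help the current policy.

    skills   : skill name -> skill file content
    eval_fn  : callable(skill_name) -> (reward_with, reward_without),
               e.g. averaged rollouts of the current policy on held-out tasks
    min_gain : drop a skill once its reward gain falls below this threshold
    """
    kept = {}
    for name, text in skills.items():
        reward_with, reward_without = eval_fn(name)
        if skill_utility(reward_with, reward_without) >= min_gain:
            kept[name] = text
    return kept

# Mocked evaluator: the policy has already internalized "navigate"
# (no reward gain left), while "pick_up" still helps.
skills = {"navigate": "go to the target room...", "pick_up": "grasp the object..."}

def fake_eval(name):
    return (0.8, 0.8) if name == "navigate" else (0.9, 0.6)

remaining = curriculum_step(skills, fake_eval)
print(sorted(remaining))  # "navigate" dropped, "pick_up" kept
```

Run repeatedly during training, this shrinks the skill context toward empty as the policy absorbs each skill into its weights, which is how the agent ends up zero-shot at the end.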
The results are compelling: +9.7% on ALFWorld and +6.6% on Search-QA over standard RL baselines, while maintaining context under 0.5K tokens per step. For comparison, typical skill-augmented agents use 2-4K tokens per step just for skill context.
This matters because it addresses the fundamental tension between agent capability and agent cost. Skills make agents smarter but also more expensive. If you can internalize the skills into weights, you get the intelligence without the token tax.
62 upvotes on HuggingFace today. Code confirmed at github.com/ZJU-REAL/SkillZero.
https://arxiv.org/abs/2604.02268