May 9, 2026 · Research, Skills, RL

SkillOS: A Skill Curator That Trains Itself

Google Cloud AI Research, MIT, UIUC. New paper on arXiv, 29 upvotes and climbing on Hugging Face's daily papers list. SkillOS does the one thing the skills hype train kept dancing around: it learns how to manage skills via reinforcement learning instead of hand-curating them.

Two agents. The executor stays frozen. A separate, trainable skill curator watches what the executor does, then decides what to insert, update, or delete in an external skill repo. The curator gets four reward signals: did the task succeed downstream, were the function calls valid, was the skill actually useful, and did the repo stay compact. Hierarchical GRPO ties the loop together.
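To make the loop concrete, here is a minimal Python sketch of the three moving parts described above: a skill repo that accepts insert, update, and delete edits, a scalar reward that blends the four signals, and the group-relative advantage normalization that GRPO is built on. Everything named here (`SkillRepo`, `curator_reward`, the weights, the `max_size` cap, the example skill text) is a hypothetical illustration, not the paper's code. The way the four signals are combined and the hierarchical structure of the paper's GRPO variant are not described in this article, so the sketch uses a plain weighted sum and a single flat group.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class SkillRepo:
    """Hypothetical external skill store edited by the curator."""
    skills: dict[str, str] = field(default_factory=dict)

    def apply(self, op: str, name: str, body: str = "") -> bool:
        """Apply one curator edit; return False if the call is invalid."""
        if op == "insert" and name not in self.skills and body:
            self.skills[name] = body
            return True
        if op == "update" and name in self.skills and body:
            self.skills[name] = body
            return True
        if op == "delete" and name in self.skills:
            del self.skills[name]
            return True
        return False


def curator_reward(task_success, calls_valid, skill_useful, repo_size,
                   max_size=50, weights=(1.0, 0.2, 0.3, 0.1)):
    """Blend the four signals into one scalar (weights are assumptions)."""
    # Compactness falls linearly from 1 to 0 as the repo fills up.
    compactness = max(0.0, 1.0 - repo_size / max_size)
    parts = (float(task_success), float(calls_valid),
             float(skill_useful), compactness)
    return sum(w * p for w, p in zip(weights, parts))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


repo = SkillRepo()
ok = repo.apply("insert", "find_object", "Check receptacles near the goal first.")
bad = repo.apply("update", "missing_skill", "No such skill exists yet.")

# One group of sampled curator rollouts for the same executor trajectory.
rewards = [
    curator_reward(True, ok, True, len(repo.skills)),
    curator_reward(False, bad, False, len(repo.skills)),
    curator_reward(True, ok, False, len(repo.skills)),
]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, the curator is pushed toward edits that beat its own other samples on the same trajectory, which is why no separate value network is needed in GRPO-style training.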

Numbers. ALFWorld success rate goes from 55.7% to 61.2%, with 6% fewer interaction steps. WebShop goes from 35.7 to 40.6. Reasoning tasks (AIME24/25, GPQA) average 73.8% versus 69.1%. The wild result: an 8B trained curator beats Gemini-2.5-Pro at zero-shot curation. Curator training also transfers across executors: Qwen3-8B, Qwen3-32B, and Gemini-2.5-Pro all benefit from the same curator.
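For a sense of scale, the short Python snippet below turns those headline figures into absolute and relative gains. It assumes the lower number in each pair is the baseline without the trained curator, which is how the article reads but is not stated outright. The `results` dictionary and the `gains` helper are illustrative names.

```python
# (baseline, with trained curator); pairing is an assumption from the text above.
results = {
    "ALFWorld": (55.7, 61.2),
    "WebShop": (35.7, 40.6),
    "Reasoning avg": (69.1, 73.8),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


summary = {name: gains(b, a) for name, (b, a) in results.items()}
for name, (abs_gain, rel_gain) in summary.items():
    print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

The absolute gains come out to roughly 5.5, 4.9, and 4.7 points, or about 9.9%, 13.7%, and 6.8% relative to the respective baselines.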

What's structurally interesting is what the repo evolves into. The curator starts by inserting task-specific skills. Over training it shifts toward editing existing skills and producing higher-level meta-strategies. That's the part you can't get from hand-written skill libraries: abstraction emerging at the curator level.

Pairs perfectly with Skill1 (USTC, last week, 60 upvotes today), addyosmani agent-skills (37K stars), and Anthropic's Skills protocol. Six independent skill-curation projects in 60 days. The category that didn't exist in March is structurally complete now. Paper: arxiv.org/abs/2605.06614
