May 8, 2026 · Research · RL · Skills · Open Source

Skill1 Trains One Policy to Pick, Use, and Distill Agent Skills

Skill1 hit HuggingFace Daily Papers #1 today with 53 upvotes. USTC + Meituan team. The contribution: train one RL policy that simultaneously learns to select skills from a library, use them in an environment, and distill new ones from successful trajectories. All from a single task-outcome reward.

97.5% average success rate on ALFWorld, beating RetroAgent at 94.9%. On WebShop, an 89.7 score and an 82.9% success rate, also state-of-the-art. The trick is signal decomposition: the same outcome reward gets split into a low-frequency exponential moving average for skill selection (which skills tend to work?) and a high-frequency deviation from that average for distillation (did this new skill exceed the library baseline?).
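A minimal sketch of what that decomposition might look like, assuming one outcome reward per trajectory. Class and variable names and the EMA coefficient are illustrative, not from the paper:

```python
class RewardDecomposer:
    """Split a single task-outcome reward into two training signals:
    a low-frequency EMA (skill selection) and the high-frequency
    deviation from it (skill distillation)."""

    def __init__(self, beta=0.9):
        self.beta = beta
        self.ema = {}  # skill_id -> running EMA of outcome reward

    def update(self, skill_id, reward):
        prev = self.ema.get(skill_id, 0.0)
        ema = self.beta * prev + (1 - self.beta) * reward
        self.ema[skill_id] = ema
        selection_signal = ema          # which skills tend to work?
        distill_signal = reward - ema   # did this run beat the baseline?
        return selection_signal, distill_signal
```

The appeal of this shape is that both heads of the policy train from one scalar reward stream with no extra reward models.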

Code is at github.com/AlphaLab-USTC/Skill1. Trained Qwen2.5-7B with GRPO. ~30 hours on 8 H800s. Library capped at 5,000 skills with utility-weighted retirement. Ablations show removing the library tanks performance by 16.6 points — the library isn't a nice-to-have, it's the architecture.
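The paper doesn't spell out the retirement rule, but a utility-weighted retirement over a capped library could be as simple as keeping the top-utility skills once the cap is exceeded. A hypothetical sketch, assuming a per-skill utility score is already tracked (e.g. the selection EMA):

```python
import heapq

def retire(skills, cap=5000):
    """Keep the `cap` highest-utility skills; return (kept, retired).
    `skills` maps skill_id -> utility score."""
    if len(skills) <= cap:
        return dict(skills), []
    keep = set(heapq.nlargest(cap, skills, key=skills.get))
    kept = {s: u for s, u in skills.items() if s in keep}
    retired = [s for s in skills if s not in keep]
    return kept, retired
```

`heapq.nlargest` keeps this O(n log cap) rather than sorting the whole library on every insertion.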

This stacks with SkillOS (paper #7 today, frozen executor + trainable curator) and the May 3 Skills-Coach release into a real cluster. Three weeks ago a "skill curation paper" felt like a fork from agent-skills. Now it's a research subfield with three independent answers and Anthropic's Skills as the production ancestor.

The structural read: skills are no longer markdown files in a repo. Skills are the unit of agent learning. The model doesn't memorize tasks anymore — it builds a library of reusable strategies and a meta-policy for picking them. Once a benchmark like τ-bench or AgentBench publishes a "skill-curation track," the category opens for productization. Source: https://arxiv.org/abs/2605.06130
