LatentSkill: Stop Pasting Skills into the Prompt, Bake Them into Weights
Right now, "agent skills" mostly means text: markdown files you stuff into the context window so the model knows how to do a thing. It works, but every skill costs you tokens on every step, and the more skills you load, the more prompt you burn. LatentSkill, from a team led by Aofan Yu, asks the obvious next question: what if the skill lived in the weights instead of the prompt?
The trick is a pretrained hypernetwork that turns a textual skill into a plug-and-play LoRA adapter. The skill knowledge moves from context space into weight space, so there are no more per-step skill tokens, while you keep the things that made text skills nice: you can still load them modularly, scale them, and compose them. On ALFWorld and Search-QA it beats the in-context skill baseline, improving ALFWorld success by 21.4 and 13.4 points on the seen and unseen splits, with 64.1% fewer prefill tokens.
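To make the "skill text in, LoRA adapter out" idea concrete, here is a minimal NumPy sketch. It is not the paper's architecture: `ToyHypernetwork`, the linear maps inside it, the random `skill_embedding`, and all the sizes are hypothetical stand-ins. The only part that is standard is the LoRA update itself, W' = W + (alpha / r) · B A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (assumptions, not from the paper).
D_MODEL, RANK, D_SKILL = 16, 4, 8


class ToyHypernetwork:
    """Hypothetical stand-in: a linear map from a skill embedding to LoRA factors."""

    def __init__(self):
        self.to_A = rng.normal(scale=0.1, size=(D_SKILL, RANK * D_MODEL))
        self.to_B = rng.normal(scale=0.1, size=(D_SKILL, D_MODEL * RANK))

    def generate(self, skill_embedding):
        # One forward pass produces both low-rank factors for one weight matrix.
        A = (skill_embedding @ self.to_A).reshape(RANK, D_MODEL)
        B = (skill_embedding @ self.to_B).reshape(D_MODEL, RANK)
        return A, B


def apply_lora(W, A, B, alpha=1.0):
    """Standard LoRA update: return W + (alpha / rank) * B @ A without mutating W."""
    rank = A.shape[0]
    return W + (alpha / rank) * (B @ A)


W_base = rng.normal(size=(D_MODEL, D_MODEL))   # a frozen base weight matrix
skill_embedding = rng.normal(size=D_SKILL)     # placeholder for an encoded skill text

A, B = ToyHypernetwork().generate(skill_embedding)
W_skilled = apply_lora(W_base, A, B, alpha=2.0)
delta = W_skilled - W_base                     # what the skill added to the weights
```

The point of the sketch is the cost profile: the skill is encoded once into `A` and `B`, and after that the prompt no longer has to carry it on every step.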
The detail that hints at something bigger: the generated skill LoRAs form a structured semantic geometry. You can dial a skill up or down with the LoRA scaling coefficient, and compose skills through plain parameter-space arithmetic when their components align. Skills start behaving like vectors you can add and subtract, not paragraphs you paste.
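Here is a rough sketch of what "dial it up or down" and "add and subtract" mean at the level of weight matrices. The two adapters below are random placeholders with hypothetical names (`nav`, `search`), and the paper's actual composition rule and its alignment condition are not reproduced here. This only shows the generic arithmetic that LoRA deltas allow.

```python
import numpy as np

rng = np.random.default_rng(1)
D, R = 8, 2  # toy hidden size and LoRA rank (illustrative assumptions)


def lora_delta(A, B, scale=1.0):
    """Weight update contributed by one adapter, multiplied by a scaling coefficient."""
    return scale * (B @ A)


# Two hypothetical skill adapters; real ones would come from the hypernetwork.
A_nav, B_nav = rng.normal(size=(R, D)), rng.normal(size=(D, R))
A_search, B_search = rng.normal(size=(R, D)), rng.normal(size=(D, R))

W = rng.normal(size=(D, D))  # frozen base weight

# Scaling: apply the navigation skill at half strength.
half_nav = W + lora_delta(A_nav, B_nav, scale=0.5)

# Composition: add both skill deltas to the same base weight.
composed = W + lora_delta(A_nav, B_nav) + lora_delta(A_search, B_search)

# Subtraction: remove one skill again, leaving only the other.
restored = composed - lora_delta(A_search, B_search)

# Summing two rank-R adapters equals a single rank-2R adapter built by
# stacking the factors, so composition stays in LoRA form.
A_cat = np.vstack([A_nav, A_search])   # shape (2R, D)
B_cat = np.hstack([B_nav, B_search])   # shape (D, 2R)
stacked = W + B_cat @ A_cat
```

Whether the summed adapter actually behaves like both skills at once is an empirical question, which is presumably why the claim is limited to cases where the components align.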
This is the same direction a lot of the field is quietly moving: skills graduating from prompt to weights, and the boundary between in-context and fine-tuned getting blurry. If skills become composable LoRAs, the agent's capability set turns into something you assemble and ship, not something you re-explain every turn. Link: https://arxiv.org/abs/2606.06087