March 19, 2026 | Research RL Agents

Online Experiential Learning: Microsoft's Framework for Agents That Improve from Deployment

Microsoft Research has released Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. The paper appeared on Hugging Face Daily Papers with 35 upvotes, and the code is publicly available.

OEL works in two stages. First, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected during real-world use. Second, this knowledge is consolidated into model parameters via on-policy context distillation. Importantly, the second stage requires no access to the user-side environment.
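The two stages can be illustrated with a toy sketch. This is not the released OEL code: the names `Trajectory`, `KnowledgeStore`, `extract_knowledge`, and `reverse_kl` are hypothetical, the extraction step is a string template standing in for a model call, and the reverse-KL objective is an assumption (a common choice for on-policy distillation), since this article does not state the paper's exact loss.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One recorded interaction from deployment (hypothetical structure)."""
    task: str
    actions: list[str]
    success: bool


@dataclass
class KnowledgeStore:
    """Stage 1: accumulates extracted knowledge across trajectories."""
    notes: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        # Skip exact duplicates so the context does not grow needlessly.
        if note not in self.notes:
            self.notes.append(note)

    def as_context(self) -> str:
        # Rendered text a teacher model would see in its prompt.
        return "\n".join(f"- {n}" for n in self.notes)


def extract_knowledge(traj: Trajectory) -> str:
    # Stand-in for a model call that summarizes a trajectory into a
    # transferable lesson; a real system would not use a fixed template.
    outcome = "worked" if traj.success else "failed"
    return f"For '{traj.task}', the sequence {' > '.join(traj.actions)} {outcome}."


def reverse_kl(student: dict[str, float], teacher: dict[str, float]) -> float:
    """Stage 2 (assumed objective): KL(student || teacher) over next tokens.

    `student` is the model without the knowledge in context; `teacher` is
    the same model with the knowledge in context. The expectation is taken
    under the student, which is what makes the signal on-policy.
    """
    return sum(p * math.log(p / teacher[tok]) for tok, p in student.items() if p > 0)


# Stage 1: extract and accumulate knowledge from logged trajectories.
store = KnowledgeStore()
logged = [
    Trajectory("book flight", ["search", "select", "pay"], True),
    Trajectory("book flight", ["pay"], False),
]
for traj in logged:
    store.add(extract_knowledge(traj))

# Stage 2: toy next-action distributions with made-up probabilities.
student = {"search": 0.4, "pay": 0.6}   # no knowledge in context
teacher = {"search": 0.9, "pay": 0.1}   # knowledge in context
loss = reverse_kl(student, teacher)
```

Minimizing this loss would pull the context-free student toward the behavior it already shows when the knowledge is in its prompt. Nothing in that step replays the user's environment: it only needs the stored knowledge and the model's own samples.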

The results show consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. The key insight: extracted experiential knowledge is significantly more effective than raw trajectories, and on-policy consistency between the knowledge source and the policy model is critical for effective learning.

This addresses a fundamental challenge for deployed agents: how to get better over time without retraining on user data. Current agents are typically static after deployment. OEL provides a mechanism for agents to learn from what works and what doesn't in production, without compromising user privacy.

Paper: https://arxiv.org/abs/2603.16856
Code: https://aka.ms/oel-code
