Online Experiential Learning: Microsoft's Framework for Agents That Improve from Deployment
Microsoft Research has released Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. The paper appeared on Hugging Face Daily Papers with 35 upvotes, and code is available.
OEL works in two stages. First, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected during real-world use. Second, this knowledge is consolidated into model parameters via on-policy context distillation — importantly, requiring no access to the user-side environment.
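The two-stage loop can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's actual API: the `Trajectory` and `KnowledgeBase` types, and the way insights are extracted, are hypothetical placeholders for whatever extraction procedure the paper uses.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    # A single deployment interaction: the task, the agent's steps,
    # and whether the attempt succeeded. (Hypothetical schema.)
    task: str
    steps: list[str]
    success: bool

@dataclass
class KnowledgeBase:
    # Accumulated transferable insights extracted from trajectories.
    insights: list[str] = field(default_factory=list)

def extract_knowledge(trajectories: list[Trajectory],
                      kb: KnowledgeBase) -> KnowledgeBase:
    # Stage 1: distill transferable experiential knowledge from raw
    # trajectories -- e.g. summaries of what worked and what failed --
    # rather than keeping the trajectories themselves.
    for t in trajectories:
        tag = "worked" if t.success else "failed"
        kb.insights.append(f"[{tag}] {t.task}: {t.steps[-1]}")
    return kb

def consolidate(update_model, kb: KnowledgeBase):
    # Stage 2: fold the accumulated knowledge into model parameters via
    # on-policy context distillation. Only the knowledge text is needed,
    # so no access to the user-side environment is required.
    context = "\n".join(kb.insights)
    return update_model(context)
```

The point of the sketch is the separation of concerns: stage 1 touches deployment data once to produce compact knowledge, and stage 2 trains only against that knowledge.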
The results show consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. The key insight: extracted experiential knowledge is significantly more effective than raw trajectories, and on-policy consistency between the knowledge source and the policy model is critical for effective learning.
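Context distillation itself can be illustrated with a minimal sketch, assuming the standard formulation: a teacher distribution (the policy conditioned on the extracted knowledge in its context) is matched by a student (the same policy without that context) by minimizing a KL divergence. The function names and the toy discrete distributions are illustrative, not the paper's implementation.

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    # KL(p || q) over a discrete next-token distribution.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def context_distillation_loss(teacher_probs: list[list[float]],
                              student_probs: list[list[float]]) -> float:
    # Teacher: policy with extracted knowledge placed in context.
    # Student: the same policy without the knowledge; minimizing this
    # loss absorbs the knowledge into the parameters. "On-policy" means
    # the sequences being scored are sampled from the student itself,
    # keeping the knowledge source consistent with the policy model.
    per_token = [kl_divergence(t, s)
                 for t, s in zip(teacher_probs, student_probs)]
    return sum(per_token) / len(per_token)
```

If the student already matches the teacher, the loss is zero; any divergence between the knowledge-conditioned and bare distributions produces a positive training signal.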
This addresses a fundamental challenge for deployed agents: how to get better over time without retraining on user data. Current agents are static after deployment — OEL provides a mechanism for agents to learn from what works and what doesn't in production, without compromising user privacy.
Paper: https://arxiv.org/abs/2603.16856
Code: https://aka.ms/oel-code