May 23, 2026 | Research RL Agents

ACC recycles your agent logs into long-context training gold

Here is a quietly clever paper. Every time you run an agent (search, coding, database, whatever), it spits out an enormous multi-turn log full of tool calls and observations. Standard training throws most of that away: it masks the tool responses and learns only which tool to pick. ACC (Compiling Agent Trajectories for Long-Context Training) says stop wasting it. Take those trajectories, stitch the original question together with all the tool responses and observations across turns, and turn them into long-context QA pairs you can train on directly.
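To make the stitching step concrete, here is a minimal sketch of what compiling one trajectory could look like. It is not the paper's implementation: the `Turn` record, the `compile_trajectory` function, the prompt layout, and the choice to use the trajectory's final answer as the target are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One agent step (hypothetical schema, not from the paper)."""
    tool_call: str    # what the agent asked a tool to do
    observation: str  # what the tool returned


def compile_trajectory(question: str, turns: list[Turn], final_answer: str) -> dict:
    """Stitch one multi-turn trajectory into a single long-context QA pair.

    Assumption: every tool observation is kept, in turn order, as context,
    and the original question is placed after that context.
    """
    context_parts = []
    for i, turn in enumerate(turns, start=1):
        # Label each observation with the call that produced it.
        context_parts.append(
            f"[Observation {i} from {turn.tool_call}]\n{turn.observation}"
        )
    context = "\n\n".join(context_parts)
    prompt = f"{context}\n\nQuestion: {question}"
    # Assumption: the trajectory's final answer serves as the training target.
    return {"prompt": prompt, "answer": final_answer}
```

Calling `compile_trajectory("Who wrote X?", [Turn("search('X')", "...page text...")], "Jane Doe")` returns a dict with one long `prompt` and the `answer` to train against. The point of the sketch is the contrast with standard agent training: the observations that would normally be masked out become the context the model has to read.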

The numbers make the case. A Qwen3-30B-A3B trained this way jumped 18.1 points on the MRCR long-context benchmark and 7.6 on GraphWalks, landing it level with Qwen3-235B-A22B, a model roughly eight times its size, while holding steady on general benchmarks like GPQA, MMLU-Pro and AIME. The analysis even shows the model restructuring its attention to adapt to the task. No hand-curated long documents required.

What I like here is the economics. The usual story is that long-context training data is expensive and annoying to curate. But if you are already running agents in production, you are generating exactly this data for free as exhaust, and ACC says it is the best long-context training set you have lying around. The agent era does not just consume models; it produces the fuel to train the next ones. Paper at arxiv.org/abs/2605.21850.
