April 30, 2026 | Research, Benchmark, Agents

ClawGym: 13.5K verified tasks, a 200-instance bench, and someone finally did the agent SFT pipeline properly

ClawGym landed on HuggingFace daily papers today at 32 upvotes, with 13 authors led by Fei Bai and Wayne Xin Zhao. The framework is the kind of agent-pretraining infrastructure that should have shipped six months ago — a unified pipeline for synthesizing verifiable training data, fine-tuning agents on it, and benchmarking what comes out.

The naming is on the nose — claw-style environments are explicitly the multi-step file/tool/workspace pattern that Claude Code, Codex, and Devin operate in. ClawGym ships three pieces. ClawGym-SynData is 13,500 filtered tasks built around realistic workspaces. ClawGym-Agents are models trained via SFT plus RL on that data. ClawGym-Bench is a 200-instance evaluation set sized for fast iteration without LLM-judge contamination.

The argument the paper makes implicitly is the one Hunyuan's SkillSynth made explicitly yesterday. The bottleneck for the next generation of coding agents is data. You can't scrape it, you can't crowdsource it, you can't synthesize it without verification. So you build a pipeline that generates tasks with ground-truth solutions, you train on the pipeline output, and you benchmark on a held-out slice of the same pipeline. SkillSynth, ClawGym, TCOD from yesterday — three papers in 72 hours, all attacking the same agent-pretraining-data wall from different angles.
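To make the "synthesize with verification, then hold out a slice" loop concrete, here is a minimal sketch in Python. It is not ClawGym's code or API: the `Task` structure, the `is_verified` filter, the `split_of` helper, and the 2% bench fraction are all hypothetical names and values chosen for illustration. The idea it shows is only the generic one described above: a task is kept when a programmatic check fails on the fresh workspace and passes once the ground-truth solution has run, and the split is decided by a stable hash so a task cannot drift between train and bench.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Task:
    """Hypothetical task record; the field names are assumptions, not ClawGym's schema."""
    task_id: str
    instruction: str
    setup: Callable[[Path], None]      # writes the initial workspace files
    reference: Callable[[Path], None]  # ground-truth solution applied to the workspace
    check: Callable[[Path], bool]      # programmatic verifier, no LLM judge involved


def is_verified(task: Task) -> bool:
    """Keep a task only if the check fails before the reference runs and passes after."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        task.setup(workspace)
        if task.check(workspace):
            return False  # already satisfied on a fresh workspace: nothing to solve
        task.reference(workspace)
        return task.check(workspace)


def split_of(task_id: str, bench_percent: int = 2) -> str:
    """Deterministic hash-based split; bench_percent is an arbitrary illustrative value."""
    bucket = int(hashlib.sha256(task_id.encode()).hexdigest(), 16) % 100
    return "bench" if bucket < bench_percent else "train"


def make_rename_task() -> Task:
    def setup(ws: Path) -> None:
        (ws / "notes.txt").write_text("hello\n")

    def reference(ws: Path) -> None:
        (ws / "notes.txt").rename(ws / "notes.md")

    def check(ws: Path) -> bool:
        return (ws / "notes.md").exists() and not (ws / "notes.txt").exists()

    return Task("rename-001", "Rename notes.txt to notes.md.", setup, reference, check)


def make_broken_task() -> Task:
    # The reference solution here does nothing, so the check never passes
    # and the filter should drop this task.
    def setup(ws: Path) -> None:
        (ws / "a.txt").write_text("x")

    def reference(ws: Path) -> None:
        pass

    def check(ws: Path) -> bool:
        return (ws / "b.txt").exists()

    return Task("broken-001", "Create b.txt.", setup, reference, check)


candidates = [make_rename_task(), make_broken_task()]
verified = [t for t in candidates if is_verified(t)]
verified_ids = [t.task_id for t in verified]
print(verified_ids)  # ['rename-001']
```

The same `check` function that filters the training data can score an agent's final workspace at evaluation time, which is what lets a benchmark of this kind avoid an LLM judge.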

The interesting tell is the github.com/ClawGym org name. Whoever set this up made the implicit branding bet that this is the framework other groups will train against. If the 13.5K dataset and 200-instance bench actually hold up under scrutiny, ClawGym becomes the SWE-Bench of agent SFT: not the leaderboard, but the data pipeline. Worth watching for who forks first.

Paper: https://arxiv.org/abs/2604.26904
Repo: https://github.com/ClawGym