May 7, 2026 · Agents · Research · Open Source

LongSeeker Beats Tongyi DeepResearch on BrowseComp by 18 Points

Same SJTU lab as OpenSeeker-v2 (Siheng Chen). May 6 arXiv submission. The thesis is unfashionable: stop accumulating everything in the context window. Instead, dynamically reshape context based on relevance.

They call it Context-ReAct. Five operations: Skip (drop an irrelevant search result), Compress (summarize a resolved subtask), Rollback (kill a dead branch), Snippet (preserve an important quote), Delete (remove fully spent content). Fine-tuned from Qwen3-30B-A3B on 10,000 synthesized trajectories that demonstrate when to use which operation.
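The paper does not publish an interface, so here is a minimal sketch of how the five operations might act on a structured context list. All names (`ContextOp`, `ContextItem`, `apply_op`) are hypothetical, not from the paper:

```python
from dataclasses import dataclass
from enum import Enum, auto

class ContextOp(Enum):
    SKIP = auto()      # drop an irrelevant search result
    COMPRESS = auto()  # replace a resolved subtask with a summary
    ROLLBACK = auto()  # discard a dead reasoning branch
    SNIPPET = auto()   # keep only an important quote
    DELETE = auto()    # remove fully spent content

@dataclass
class ContextItem:
    text: str
    tag: str  # e.g. "search", "subtask", "branch", "snippet"

def apply_op(context, index, op, payload=None):
    """Return a new context list with `op` applied to the item at `index`.

    Skip/Rollback/Delete all shrink the context; Compress and Snippet
    rewrite an item in place using `payload` (the summary or the quote).
    """
    items = list(context)
    if op in (ContextOp.SKIP, ContextOp.ROLLBACK, ContextOp.DELETE):
        del items[index]
    elif op is ContextOp.COMPRESS:
        items[index] = ContextItem(text=payload, tag=items[index].tag)
    elif op is ContextOp.SNIPPET:
        items[index] = ContextItem(text=payload, tag="snippet")
    return items
```

In this framing, the SFT trajectories would teach the policy *which* operation to emit at each step; the operations themselves are trivial list edits, which is the point: the hard part is the decision, not the mechanism.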

Numbers: BrowseComp 61.5% (Tongyi DeepResearch 43.2%, AgentFold 36.2%). BrowseComp-ZH 62.5% (vs 46.7% / 47.3%). An 18-point gap on the English benchmark. The competitors are industrial-pipeline systems with CPT+SFT+RL; LongSeeker is SFT-only on a 30B base.

The structural read: when long-horizon agents fail, it is usually not because the model is too small or the tools are wrong; it is because the context window has filled up with junk that confuses the next step. Context engineering as a first-class agent skill, learned during SFT, beats throwing more compute at the problem. Pairs cleanly with the Tool-Use Tax line of work (May 5) and AgentFloor (May 4): three independent papers in eight days arguing that the bottleneck has moved from "more capability" to "less noise."

Source: https://arxiv.org/abs/2605.05191
