OpenSeeker-v2: A 30B SFT-Only Search Agent Beats the Industrial Pipeline
OpenSeeker-v2 dropped on arXiv this week and shot to #1 on HuggingFace Daily Papers with 622 upvotes. Shanghai Jiao Tong University team — Yuwen Du, Rui Ye, Shuo Tang and others — pure academic, no industry budget. They trained a 30B-parameter search agent with simple SFT only. No continual pre-training. No reinforcement learning. Just supervised fine-tuning on 10.6k high-quality trajectories. And it beats Tongyi DeepResearch and RedSearcher, both of which use the full CPT + SFT + RL industrial pipeline.
The numbers. 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, 78.0% on xbench. SOTA among 30B-class ReAct-based search agents. Beats Tongyi DeepResearch (Alibaba) by 2.6% on BrowseComp and 11.4% on BrowseComp-ZH. Beats Claude-4.5-Sonnet, DeepSeek-V3.1-671B, GLM-4.6-357B, and MiniMax-M2-230B, even though those are far larger. The first SOTA search agent at this scale and in this paradigm built entirely by an academic team using SFT alone.
The technique is three small data tweaks. One, scale the knowledge graph during synthesis so each query needs multi-hop evidence. Two, expand the tool set so the agent learns to compose more tools per query. Three, strict low-step filtering: drop any trajectory that finishes in too few tool calls, forcing a difficulty floor on the training set. Average tool-call count per trajectory: 64.67, versus 46.97 for v1 and 36.01 for RedSearcher. The longer trajectories are the actual moat.
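The third tweak, low-step filtering, is simple enough to sketch. This is an illustrative reconstruction, not the authors' code: the `Trajectory` class, the `min_calls` floor value, and the function names are all assumptions; the paper only specifies that short trajectories are dropped and reports the resulting 64.67 average.

```python
# Hypothetical sketch of strict low-step filtering: drop any synthesized
# trajectory whose tool-call count falls below a difficulty floor, then
# measure the average tool-call count of what survives.
from dataclasses import dataclass


@dataclass
class Trajectory:
    query: str
    tool_calls: int  # number of tool invocations before the final answer


def filter_low_step(trajectories: list[Trajectory],
                    min_calls: int) -> list[Trajectory]:
    """Keep only trajectories at or above the tool-call floor."""
    return [t for t in trajectories if t.tool_calls >= min_calls]


def mean_tool_calls(trajectories: list[Trajectory]) -> float:
    """Average tool calls per kept trajectory (the paper's headline stat)."""
    return sum(t.tool_calls for t in trajectories) / len(trajectories)


# Example: a floor of 20 (an assumed value) discards the 5-call trajectory.
trajs = [Trajectory("q1", 5), Trajectory("q2", 30), Trajectory("q3", 70)]
kept = filter_low_step(trajs, min_calls=20)
```

The point of the filter is that it shifts the whole training distribution toward long-horizon behavior, which is where the paper claims the gains come from.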
Why this matters for the agent thesis. There's a working assumption that frontier agent capability requires frontier compute and proprietary data pipelines. OpenSeeker-v2 is the cleanest data point yet that this might be wrong — that the bottleneck is data quality, not compute, and that academic teams with the right synthesis recipe can keep up. If this generalizes beyond search, the labor cost of training a competitive agent drops by an order of magnitude. Connects to the broader bimodal-agent-infrastructure cluster (Tool-Use Tax, AgentFloor) — small-models-suffice on short-horizon, frontier-or-bust on long-horizon. OpenSeeker-v2 is now the best data point on the small-end side.
Built on Qwen3-30B-A3B-Thinking-2507. 256k context, up to 200 tool calls per trajectory. Model weights to be open-sourced. Paper: arxiv.org/abs/2605.04036. Code: github.com/PolarSeeker/OpenSeeker