April 23, 2026 · Research · Agents · RL

DR-Venus Shows a 4B Research Agent Can Punch Up to 30B With 10K Examples

DR-Venus dropped on arXiv April 21 and a day later sat at the top of HuggingFace's daily agent paper list. The headline is the size: a 4-billion-parameter deep research agent, trained on roughly 10,000 open-data examples, that significantly outperforms prior agent models under 9B and approaches the performance of 30B systems on multi-step research tasks.

The recipe has two stages. First, agentic supervised fine-tuning teaches the model to reliably use browse/search/read tools. Then reinforcement learning with long-horizon rewards sharpens that reliability over extended research chains. What's notable isn't any single technique; it's the combination plus the data efficiency. Everyone else training agent models has been hoarding tens of millions of proprietary trajectories. DR-Venus does it with ten thousand.
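The paper's training code isn't out yet, so here's a toy sketch of the two-stage shape only: supervised updates on demonstrated tool-call trajectories, then REINFORCE-style updates where the reward arrives only after the full chain. The `ToyAgent` class, the tool names, and the tabular per-step policy are all illustrative stand-ins, not DR-Venus internals.

```python
import math
import random

ACTIONS = ["search", "read", "answer"]  # hypothetical tool vocabulary


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


class ToyAgent:
    """Tabular stand-in for the policy: one logit vector per step position."""

    def __init__(self, steps=3):
        self.steps = steps
        self.logits = [[0.0] * len(ACTIONS) for _ in range(steps)]

    def act(self, step, rng):
        probs = softmax(self.logits[step])
        return rng.choices(range(len(ACTIONS)), weights=probs)[0]

    def sft(self, demos, lr=0.5):
        # Stage 1: supervised fine-tuning on demonstrated tool-call trajectories.
        # Cross-entropy gradient for a softmax policy: (one_hot - probs).
        for traj in demos:
            for step, a in enumerate(traj):
                probs = softmax(self.logits[step])
                for i in range(len(ACTIONS)):
                    grad = (1.0 if i == a else 0.0) - probs[i]
                    self.logits[step][i] += lr * grad

    def rl(self, reward_fn, episodes=200, lr=0.2, seed=0):
        # Stage 2: REINFORCE with a single trajectory-level (long-horizon)
        # reward: no per-step signal, only a score for the whole chain.
        rng = random.Random(seed)
        for _ in range(episodes):
            traj = [self.act(step, rng) for step in range(self.steps)]
            r = reward_fn(traj)  # reward observed only after the full chain
            for step, a in enumerate(traj):
                probs = softmax(self.logits[step])
                for i in range(len(ACTIONS)):
                    grad = ((1.0 if i == a else 0.0) - probs[i]) * r
                    self.logits[step][i] += lr * grad
```

Usage under the same toy assumptions: feed a handful of search-read-answer demonstrations to `sft`, then run `rl` with a reward function that scores only the completed chain. The greedy policy then follows the demonstrated tool order.

```python
agent = ToyAgent()
agent.sft([[0, 1, 2]] * 5)  # demos: search -> read -> answer
agent.rl(lambda t: 1.0 if t == [0, 1, 2] else -0.1)
greedy = [max(range(3), key=lambda i: agent.logits[s][i]) for s in range(3)]
print([ACTIONS[a] for a in greedy])
```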

The editorial read is that the dense-research-agent category just got a lot cheaper. Anyone with a 4B model, modest compute, and taste can plausibly reproduce a competent deep research agent in a few weeks. That changes who gets to participate. Until now this tier of agent was gated behind OpenAI Deep Research or a full-stack research lab. DR-Venus is proof that a research group or seed-stage startup with clean data can cover similar ground.

If the Venus team follows through on the promise to release models, code, and recipes, this becomes one of the most reproducible agent training stories of the quarter. Watch for it on HuggingFace in the next two weeks. The names to beat for this benchmark are now a lot smaller.

https://arxiv.org/abs/2604.19859
