Xiaohongshu's HyperEyes Cuts Agent Tool Calls by 5.3x. Search Wider, Not Longer.
Xiaohongshu just put out HyperEyes on arXiv. 44 upvotes on HuggingFace Papers today, top of the agent-research feed. The pitch is one sentence — search wider, not longer. The premise is that multimodal search agents waste rounds by hitting one entity at a time, when the right unit of work is parallel search across many entities in a single turn.
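To make the serial-versus-parallel distinction concrete, here is a minimal asyncio sketch. It is not the paper's API: `search` is a hypothetical stand-in for whatever grounded retrieval tool the agent calls. The point is the shape of the turn: the serial agent pays one round of latency per entity, the wide agent pays one round total.

```python
import asyncio

# Hypothetical search tool, standing in for the agent's real
# retrieval backend. Not the HyperEyes API.
async def search(query: str) -> str:
    await asyncio.sleep(0.1)  # simulate one round of tool latency
    return f"results for {query!r}"

# Serial baseline: one entity per tool-call round.
# N entities cost N rounds of latency.
async def search_serial(entities: list[str]) -> list[str]:
    return [await search(e) for e in entities]

# "Search wider": fan out over all entities in a single turn.
# N entities cost one round of latency.
async def search_wide(entities: list[str]) -> list[str]:
    return await asyncio.gather(*(search(e) for e in entities))

if __name__ == "__main__":
    print(asyncio.run(search_wide(["entity_a", "entity_b", "entity_c"])))
```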
The numbers are concrete. HyperEyes-30B beats the strongest comparable open-source agent by 9.9% in accuracy, while using 5.3x fewer tool-call rounds on average. HyperEyes-235B hits 66.6% average accuracy, approaching Gemini-3.1-Pro. They also released IMEB — 300 human-curated instances built specifically to measure search efficiency, not just final accuracy. Code at github.com/Guankai-Li/HyperEyes.
The trick is RL training at two granularities. Macro level: TRACE, a tool-use reference-adaptive cost-efficiency reward that monotonically tightens the efficiency target during training. Micro level: on-policy distillation, where a teacher model injects dense token-level corrective signals into the agent's failed rollouts. On top of that sits a unified grounded search primitive that fuses visual grounding and retrieval into a single atomic action, so the agent can issue concurrent search queries instead of serial ones.
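The post doesn't spell out the TRACE formula, but the description, a cost-efficiency reward whose reference target monotonically tightens over training, pins down the general shape. Here is a minimal sketch under that reading; the penalty form and every constant are illustrative assumptions, not numbers from the paper.

```python
def trace_style_reward(correct: bool, rounds_used: int,
                       reference_rounds: float,
                       efficiency_weight: float = 0.5) -> float:
    """One plausible shape: full credit for a correct answer,
    minus a penalty for exceeding the current round budget."""
    accuracy_reward = 1.0 if correct else 0.0
    overshoot = max(0.0, rounds_used - reference_rounds)
    return accuracy_reward - efficiency_weight * overshoot / max(reference_rounds, 1.0)

def tightened_reference(step: int, start: float = 12.0,
                        floor: float = 3.0, decay: float = 0.999) -> float:
    """Monotonically shrink the round budget as training proceeds,
    pushing the policy toward fewer, wider tool calls."""
    return max(floor, start * decay ** step)

# Early in training a correct 10-round rollout goes unpenalized;
# late in training the same rollout eats a large efficiency penalty.
print(trace_style_reward(True, 10, tightened_reference(0)))     # 1.0 (budget is 12)
print(trace_style_reward(True, 10, tightened_reference(5000)))  # ~-0.17 (budget hit the floor of 3)
```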
Why this is structurally interesting: most agent-research papers chase accuracy and treat tool calls as free. HyperEyes makes efficiency a co-equal objective. An agent that hits 60% accuracy in three rounds can beat one that hits 65% in fifteen, once you price in tokens, latency, and tool budgets. The IMEB benchmark formalizes this, and other groups will now have to defend their tool-call-round numbers, not just their accuracy.
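To see the trade, price the rounds. A toy utility with a made-up per-round cost; nothing here is from IMEB's actual scoring:

```python
# Illustrative only: the cost weight below is invented, not IMEB's metric.
def utility(accuracy: float, rounds: float, cost_per_round: float = 0.01) -> float:
    return accuracy - cost_per_round * rounds

fast = utility(0.60, 3)    # 0.60 - 0.03 = 0.57
slow = utility(0.65, 15)   # 0.65 - 0.15 = 0.50
print(fast > slow)  # True at this price
```

The break-even point is 0.05 extra accuracy bought with 12 extra rounds, about 0.004 per round; price rounds below that and the slower agent wins. That accounting is exactly what a benchmark like IMEB forces into the open.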
The other interesting line: Xiaohongshu shipping agent research from the consumer-platform side. Not a research lab, not a coding-agent company. A Chinese content platform with 200M MAU is now publishing RL-trained multimodal agents with released code. Where agent research comes from is shifting: not just labs, but also the companies whose product surfaces are bleeding tokens. arxiv.org/abs/2605.07177.