May 7, 2026 · Agents · Open Source · Research

OpenSearch-VL Open-Sources the Whole Multimodal Search Stack

Tencent Hunyuan dropped the cleanest multimodal search agent recipe to date. Code, data, models, training algorithm — all of it. Three model sizes: 8B, 30B-A3B, 32B. Two datasets: SearchVL-SFT-36k (36,592 multi-turn expert trajectories) and SearchVL-RL-8k (8,000 RL examples).

The technical wedge is Multi-Turn Fatal-Aware GRPO. The standard problem in multi-turn agentic RL is that one bad tool call mid-trajectory poisons the gradient for all the good reasoning that came before it. Most teams solve this by masking the tokens after the failure. OpenSearch-VL adds one-sided advantage clamping: when a trajectory ends in failure, the advantage on the pre-failure tokens is clamped to be non-negative, so valid early reasoning is no longer punished for a failure it did not cause. Worth +4.2 points over vanilla GRPO on their benchmark.
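The mechanics above can be sketched in a few lines. This is a hedged reconstruction from the description, not the released training code: the function names, the group-relative advantage formula, and the `fatal_step` interface are my assumptions.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage, as in vanilla GRPO: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

def fatal_aware_token_advantages(adv, traj_len, fatal_step=None):
    """Per-token advantages for one trajectory of `traj_len` tokens.

    For a trajectory that dies at `fatal_step` (assumed: index of the
    first token of the fatal tool call):
      - tokens from the failure onward are masked to zero gradient
        (the standard fix);
      - tokens before the failure get the one-sided clamp: a negative
        trajectory advantage is floored at 0, so the update no longer
        punishes valid early reasoning for a later failure.
    """
    if fatal_step is None:
        return [adv] * traj_len
    pre = max(adv, 0.0)  # one-sided clamp on pre-failure tokens
    return [pre] * fatal_step + [0.0] * (traj_len - fatal_step)
```

A quick usage sketch: with group rewards `[1.0, 1.0, 0.0, 0.0]`, the two failed rollouts get advantage -1.0; for a failed rollout whose fatal call lands at token 6 of 10, vanilla GRPO would push -1.0 through the first six tokens, while the clamp floors them at 0.0 and the remaining four are masked.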

The data pipeline is just as instructive: Wikipedia hyperlink graph sampled with constrained multi-hop paths, fuzzy entity rewriting to prevent single-hop shortcuts, source-anchor visual grounding. Ablate the entity rewriting alone and you lose 10.3 points; ablate the visual grounding and you lose 11.5. The recipe is the moat — and now it is open.
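The multi-hop sampling step can be illustrated with a toy sketch. Everything here is an assumption for illustration — the toy graph, the no-revisit constraint, and the `max_degree` hub filter are not from the released pipeline:

```python
import random

def sample_multihop_path(graph, hops, max_degree=50, rng=None):
    """Sample a simple path of `hops` edges from a hyperlink graph.

    Assumed constraints: no revisited entities (rules out trivial
    back-and-forth walks) and no hub pages above `max_degree`
    out-links (hubs make single-hop shortcuts too easy).
    """
    rng = rng or random.Random()
    starts = [n for n, nbrs in graph.items() if len(nbrs) <= max_degree]
    for _ in range(100):  # retry until a valid path is found
        path = [rng.choice(starts)]
        for _ in range(hops):
            nbrs = [n for n in graph.get(path[-1], [])
                    if n not in path
                    and len(graph.get(n, [])) <= max_degree]
            if not nbrs:
                path = None
                break
            path.append(rng.choice(nbrs))
        if path is not None:
            return path
    return None

# Toy hyperlink graph (hypothetical entities and links).
toy_graph = {
    "Marie Curie": ["Radium", "Sorbonne"],
    "Radium": ["Periodic table"],
    "Sorbonne": ["Paris"],
    "Paris": ["France"],
    "Periodic table": [],
    "France": [],
}
path = sample_multihop_path(toy_graph, hops=2, rng=random.Random(0))
```

Each sampled path (e.g. entity A → B → C) would then be turned into a question about C that names only A, with the fuzzy entity rewriting obscuring the intermediate entity so the answer cannot be reached in one hop.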

Results: 8B model averages 56.6 across seven knowledge-intensive benchmarks. The 30B-A3B beats Qwen3-VL-30B baseline by +13.8 points and is competitive with Gemini 2.5-Pro on several tasks. This is the third major open multimodal search agent paper this month after OpenSeeker-v2 (May 6) and AgentFold — the small-model-with-good-data thesis keeps stacking.

Source: https://huggingface.co/papers/2605.05185
