April 11, 2026 · Research · Agents · Skills

Metis: The Agent That Learned When NOT to Use Tools

Here's a number that should bother every agent builder: current multimodal agents invoke external tools 98% of the time, even when the answer is right there in the image they're looking at. A new paper called "Act Wisely" introduces Metis, a model that drops that to 2% — while actually getting more accurate.

The problem is what the authors call "blind tool invocation." Ask a vision agent what color a car is in a photo, and it'll fire up a web search instead of just looking at the picture. It's the AI equivalent of Googling your own name — technically works, but embarrassingly wasteful.

The fix is HDPO, a training framework that separates accuracy from efficiency into two independent optimization channels. The accuracy channel makes sure the model gets the right answer. The efficiency channel, applied only to already-correct trajectories, teaches the model to skip unnecessary tool calls. You don't sacrifice correctness for speed — you earn speed by being correct first.
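The paper's exact objective isn't reproduced here, but the gating idea can be sketched as a simple reward function. Everything below is illustrative: the function name, the `efficiency_weight` parameter, and the per-call penalty are assumptions, not details from the paper.

```python
def hdpo_style_reward(answer_correct: bool, num_tool_calls: int,
                      efficiency_weight: float = 0.2) -> float:
    """Illustrative two-channel reward in the spirit of HDPO.

    Accuracy channel: rewards a correct final answer unconditionally.
    Efficiency channel: applied ONLY to already-correct trajectories,
    penalizing each tool call so the model learns to skip ones it
    doesn't need.
    """
    accuracy_reward = 1.0 if answer_correct else 0.0

    # Gating efficiency on correctness is the key move: a wrong answer
    # can never gain reward by skipping tools, so the model is never
    # tempted to trade accuracy for speed.
    efficiency_reward = 0.0
    if answer_correct:
        efficiency_reward = -efficiency_weight * num_tool_calls

    return accuracy_reward + efficiency_reward
```

Under this shape, a correct answer with zero tool calls scores highest, a correct answer with tool calls scores a bit lower, and an incorrect answer scores zero regardless of how few tools it used.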

The practical implication is massive. Every tool call costs tokens, time, and money. An agent that calls tools 98% of the time when it only needs to 2% of the time is burning nearly 50x more resources than necessary (98/2 = 49). Metis shows that agents can learn metacognition — knowing when their own internal knowledge is sufficient.
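The back-of-envelope math behind that multiplier, using the article's two headline rates:

```python
baseline_pct = 98  # tool-call rate before training, per the article
metis_pct = 2      # tool-call rate after HDPO training

# Ratio of tool-call volume between the two agents on the same workload.
print(baseline_pct / metis_pct)  # 49.0 — the "50x" figure rounds this up
```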

Code: https://github.com/Accio-Lab/Metis
Paper: https://arxiv.org/abs/2604.08545
