May 11, 2026 · Research · Agents · RL

AutoTTS lets agents design their own thinking strategy

A 13-author paper from Google plus UMD, UVA, WashU, and UNC just dropped on arXiv (2605.08083). The frame is simple to state and hard to do: stop hand-designing test-time scaling strategies and let agents discover them. Instead of a researcher picking Self-Consistency@64, Best-of-N, or Tree-of-Thoughts and shipping it, AutoTTS sets up a search environment where a coding agent iteratively writes and edits a controller program until it finds a strategy that wins.
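The repo holds the real search harness; as a toy illustration of the shape of that loop (the cached-trajectory format, the scoring function, and every name here are my own inventions, not the paper's), a replay-scored hill climb over a one-parameter controller might look like:

```python
import random

# Hypothetical cache of per-run data: step-wise confidences, whether the
# answer at each step was correct, and cumulative tokens spent per step.
# (Invented for illustration; the real replay format lives in the repo.)
CACHE = [
    {"conf": [0.4, 0.6, 0.9],  "ok": [False, True, True],  "tok": [100, 200, 300]},
    {"conf": [0.7, 0.75, 0.8], "ok": [False, False, True], "tok": [100, 200, 300]},
    {"conf": [0.5, 0.9, 0.95], "ok": [True, True, True],   "tok": [100, 200, 300]},
]

def replay_score(stop_at: float) -> float:
    """Replay cached runs under a candidate controller (here: a single
    stop-at-confidence threshold) and score accuracy minus a token
    penalty. No LLM is called; everything is read from the cache."""
    acc = tok = 0
    for run in CACHE:
        step = next((i for i, c in enumerate(run["conf"]) if c >= stop_at),
                    len(run["conf"]) - 1)
        acc += run["ok"][step]
        tok += run["tok"][step]
    return acc / len(CACHE) - 1e-4 * tok

def discover(iterations: int = 200, seed: int = 0) -> tuple[float, float]:
    """Stand-in for the agentic search: mutate the controller parameter,
    keep the edit only if the replay score improves."""
    rng = random.Random(seed)
    best, best_score = 0.99, replay_score(0.99)
    for _ in range(iterations):
        cand = min(0.99, max(0.1, best + rng.uniform(-0.1, 0.1)))
        score = replay_score(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

The real system swaps the single threshold for an agent editing an entire controller program, but the economics are the same: every candidate is scored by replay, so the search burns CPU, not API credits.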

What came out the other end is the Confidence Momentum Controller (CMC): trend-based stopping via exponential moving averages of confidence, coupled width-depth control, alignment-aware depth allocation, and conservative branch abandonment. It delivers 69.5% token savings versus Self-Consistency@64 at the right setting, with strategies that generalize across held-out benchmarks and model scales. And here is the kicker: the total cost to discover this controller was $39.90 and 160 minutes of wall-clock time, with zero LLM calls during the search itself, because replay-based evaluation reuses cached trajectories.
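A minimal sketch of the trend-based stopping idea (the constants and the exact rule below are my guesses, not the CMC as published): keep an exponential moving average of per-sample confidence and stop once its momentum stalls.

```python
def should_stop(confidences: list[float], alpha: float = 0.3,
                patience: int = 2, eps: float = 0.01) -> bool:
    """Toy EMA-based stopping rule (thresholds are illustrative, not the
    paper's): stop once the EMA of confidence has failed to improve by
    eps for `patience` consecutive samples."""
    ema, best, stale = None, float("-inf"), 0
    for c in confidences:
        ema = c if ema is None else alpha * c + (1 - alpha) * ema
        if ema > best + eps:
            best, stale = ema, 0   # confidence still trending up
        else:
            stale += 1             # momentum stalling
            if stale >= patience:
                return True        # stop spending tokens on this problem
    return False

# Rising confidence: keep sampling. Early spike then plateau: cut off.
print(should_stop([0.2, 0.4, 0.6, 0.8, 0.9]))  # False
print(should_stop([0.9, 0.5, 0.5, 0.5]))       # True
```

The trend-based framing is what distinguishes this from a plain confidence threshold: a run that is still climbing keeps its budget even at low absolute confidence, while a stalled run gets cut off early.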

The deeper claim is the one that matters. AutoTTS is the proof-of-concept that the meta-layer of agent design — how an agent decides to think — is itself agentic territory. The whole pre-2026 playbook of researchers hand-tuning inference loops is being replaced by agents tuning their own loops with code edits and replay. Pair this with HyperEyes from yesterday (RL on tool-call efficiency) and Tool-Use Tax (May 5) and the cluster is clear: efficiency is the new accuracy.

Code is at github.com/zhengkid/AutoTTS: Python 3.12 for the eval pipeline, plus the Claude Agent SDK and an OpenRouter API key to reproduce the full discovery run.

Paper: https://arxiv.org/abs/2605.08083
Repo: https://github.com/zhengkid/AutoTTS