May 16, 2026 · Research · Agents · Infrastructure

Berkeley's AsyncFC Lets the Model Reason Over Futures While Tools Run

Fresh arXiv from Berkeley: 2605.15077. Authors: Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph Gonzalez. The Mao plus Gonzalez combo is the BFCL leaderboard team, so when they publish on function calling, people pay attention. The title is "Concurrency without Model Changes: Future-Based Asynchronous Function Calling for LLMs."

The problem statement: today, agent decoding blocks until each tool call returns. The model says "call search," the call goes out, decoding stops, four seconds later the result comes back, decoding resumes. Multiply by twenty tool calls in a long task. The agent spends most of its wall-clock time waiting on I/O it could have done in parallel. End-to-end latency balloons.

The fix is a runtime change, no model retraining. Decoding produces a symbolic future, like a promise in async programming, that stands in for the unresolved tool result. The model can keep generating, can plan next steps, can issue more tool calls in parallel, all referring to futures that have not resolved yet. When a future does resolve, the runtime stitches the real value back in. The wild discovery in the paper: LLMs already know how to reason over symbolic placeholders without any retraining. It works zero-shot.
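To make the mechanism concrete, here is a minimal sketch in Python's asyncio of what such a runtime could look like. The FutureRegistry class, the `<future:N>` placeholder syntax, and slow_search are illustrative assumptions, not the paper's actual interface: tool calls start as background tasks, the model keeps reasoning over symbolic tokens, and the runtime swaps in real values only when the text is resolved.

```python
import asyncio
import itertools


class FutureRegistry:
    """Maps symbolic placeholders to in-flight tool calls (hypothetical sketch)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._tasks = {}

    def issue(self, coro) -> str:
        """Start the tool call in the background and return a placeholder token
        the model can keep generating with before the result arrives."""
        token = f"<future:{next(self._counter)}>"
        self._tasks[token] = asyncio.create_task(coro)
        return token

    async def resolve(self, text: str) -> str:
        """Stitch resolved values back into text that references placeholders."""
        for token, task in self._tasks.items():
            if token in text:
                text = text.replace(token, str(await task))
        return text


async def slow_search(query: str) -> str:
    await asyncio.sleep(2)  # stand-in for network latency
    return f"results for {query!r}"


async def main():
    registry = FutureRegistry()
    # Decoding does not block: both calls go out immediately, and the model
    # only ever sees symbolic placeholders while they are in flight.
    a = registry.issue(slow_search("asynchronous function calling"))
    b = registry.issue(slow_search("LLM agent latency"))
    draft = f"Compare {a} with {b} and summarize the overlap."
    print(await registry.resolve(draft))  # ~2s total instead of ~4s


asyncio.run(main())
```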

Result: meaningful end-to-end latency cuts on function-calling and software engineering benchmarks while task accuracy stays put. The win is at the harness layer. Any agent runtime that wraps a frontier model can adopt this pattern this week. The paper does not list a code release yet but the Berkeley group has open-sourced BFCL and prior work, so a repo likely follows.
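For a sense of what adoption at the harness layer might involve, here is a hedged sketch of an agent loop that hands placeholders back to the model as tool results instead of blocking on them. The client.generate method, reply.tool_calls shape, and tools mapping are hypothetical stand-ins for whatever model SDK and tool registry a given harness already uses.

```python
import asyncio


async def run_turn(client, tools, messages):
    pending = {}   # placeholder token -> asyncio task for the real tool call
    counter = 0
    while True:
        reply = client.generate(messages)  # decoding never waits on tools
        messages.append({"role": "assistant", "content": reply.content})
        if not reply.tool_calls:
            # Final answer: await and substitute only the futures the model used.
            text = reply.content
            for token, task in pending.items():
                if token in text:
                    text = text.replace(token, str(await task))
            return text
        for call in reply.tool_calls:
            counter += 1
            token = f"<future:{counter}>"
            pending[token] = asyncio.create_task(tools[call.name](**call.args))
            # Feed the placeholder back instead of blocking for the result.
            messages.append({"role": "tool", "content": token})
```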

https://arxiv.org/abs/2605.15077