May 23, 2026 · deep-dive

The Real Story of 2026 Isn't Capability. It's the Price of a Loop.

The biggest AI story of the past week wasn't a model. It was an invoice.

Microsoft, the company that owns the data centers Anthropic partly runs on, the company that put thirteen billion dollars into OpenAI, quietly started canceling its own engineers' Claude Code licenses. Not because the tool was bad. Because it was too good, and too good turned out to mean too expensive. Engineers liked Claude Code so much they burned through Microsoft's entire 2026 internal AI budget in roughly four months. Uber told the same story: introduced Claude Code in December, had 84 percent of its engineers classified as agentic coding users by March, blew the full-year budget by April. ServiceNow, same thing. These are not startups fumbling a spreadsheet. These are some of the most financially disciplined companies on earth, and not one of them could predict what an agent would cost once people actually used it.

That is the real headline of 2026, and almost nobody is saying it plainly, so I will. We have solved capability. We have not solved the unit economics of running capability in a loop. And the gap between those two things is about to decide who wins.

Here's the mechanism, said simply. A chatbot is one question, one answer, one API call. An agent is a loop: read, plan, call a tool, read the result, replan, call another tool, check its own work, try again. A single task can be dozens of model calls stacked on top of each other. So when token prices fall, the bill doesn't fall with them. It goes up. Cheaper tokens don't make you spend less, they make you do more looping, because work that was too expensive to automate suddenly isn't. Token prices have dropped something like 280x in two years, and enterprise AI spend went up 320 percent over the same window. That is not a contradiction. That is the loop eating every efficiency gain the moment it appears, and asking for more.
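To put rough numbers on that mechanism, here is a minimal sketch of a loop's cost. It assumes every call re-sends the full accumulated context, ignores output-token pricing and prompt caching, and uses made-up sizes (a 2,000-token starting context, 1,000 tokens added per step, 30 steps). The function name and every parameter are illustrative, not taken from any vendor's billing model.

```python
def loop_cost(steps, base_context, tokens_per_step, price_per_million):
    """Dollar cost of one task whose context grows on every model call.

    Assumption: each call re-reads everything accumulated so far,
    and only input tokens are billed.
    """
    total_tokens = 0
    context = base_context
    for _ in range(steps):
        total_tokens += context        # this call pays for the whole context
        context += tokens_per_step     # tool output and notes get appended
    return total_tokens * price_per_million / 1_000_000


# One chatbot call versus a 30-step agent loop at the same token price.
chat = loop_cost(steps=1, base_context=2_000, tokens_per_step=1_000,
                 price_per_million=15.0)
agent = loop_cost(steps=30, base_context=2_000, tokens_per_step=1_000,
                  price_per_million=15.0)

# The same 30-step loop after a hypothetical 10x drop in token price.
cheaper_agent = loop_cost(steps=30, base_context=2_000, tokens_per_step=1_000,
                          price_per_million=1.5)

print(f"chat: ${chat:.4f}  agent: ${agent:.4f}  agent at 10x cheaper: ${cheaper_agent:.4f}")
```

Under these assumptions the 30-step task bills 495,000 input tokens against the chatbot's 2,000, so even after a 10x price cut the loop still costs roughly 25 times what the single call did at the old price.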

This is why a Claude Code subscriber paying two hundred dollars a month can consume five thousand dollars of real compute. The subsidy is 25x. Every API call you make today is, in a real sense, being paid for by someone else, an investor betting the cost curve bends before the money runs out. OpenAI is projected to lose fourteen billion dollars this year and doesn't expect to be cash-flow positive until 2030. That gap closes eventually. It has to. No business survives charging a dollar for five dollars of goods forever. And when it closes, the question stops being "can an agent do this" and becomes "can you afford to let it."

You could see the whole industry quietly reorganizing around this all week, in the small posts that never make headlines.

The grassroots reaction was a routing rebellion. Builders figured out that Claude Code's real value isn't the Anthropic model underneath, it's the interface, the workflow, the agent loop, the terminal muscle memory. So they kept the shell and swapped the engine. Point Claude Code at DeepSeek's Anthropic-compatible endpoint and you get the entire Claude Code experience running on a model that costs a fraction as much. Others route through their existing ChatGPT or Grok subscriptions to use flat-rate tokens instead of metered API calls. One repo that makes Claude Code free by fanning requests across DeepSeek and Kimi already has tens of thousands of users. Read that carefully, because it's a profound inversion. The model, the thing everyone thought was the moat, has become the swappable, price-shopped commodity. The harness is the product now.

That same week, somebody ran the cleanest experiment I've seen on what this future actually rewards. Three models, Qwen 3.7-Max, Claude Opus 4.7, and GPT-5.5, were each handed a Tetris bot and told to improve it: read your own code, run the benchmark, rewrite yourself, ten times over. A real agentic loop, not a quiz. Qwen won with a 56 percent improvement for one dollar and thirty-two cents. Claude got 28 percent for twelve dollars and fifteen cents. Qwen wasn't just better, it was nine times cheaper. Sit with that, because the metric it surfaces is the one that's about to matter more than any leaderboard: not improvement, but improvement per dollar. In a world where everything runs in a loop, the model that delivers the most progress per token wins production, even if it's not the smartest model in a single shot.
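The improvement-per-dollar metric is easy to compute from the two results quoted above. This is a small sketch using only those figures; the GPT-5.5 numbers are not given here, so that model is left out, and the dictionary layout is just an illustration.

```python
# Figures as quoted in the experiment above; GPT-5.5 is omitted because
# its improvement and cost are not stated.
results = {
    "Qwen 3.7-Max": {"improvement_pct": 56.0, "cost_usd": 1.32},
    "Claude Opus 4.7": {"improvement_pct": 28.0, "cost_usd": 12.15},
}


def improvement_per_dollar(result):
    """Percentage points of benchmark improvement bought per dollar spent."""
    return result["improvement_pct"] / result["cost_usd"]


scores = {name: improvement_per_dollar(r) for name, r in results.items()}

cost_ratio = results["Claude Opus 4.7"]["cost_usd"] / results["Qwen 3.7-Max"]["cost_usd"]
efficiency_ratio = scores["Qwen 3.7-Max"] / scores["Claude Opus 4.7"]

for name, score in scores.items():
    print(f"{name}: {score:.1f} points per dollar")
print(f"cost ratio: {cost_ratio:.1f}x, efficiency ratio: {efficiency_ratio:.1f}x")
```

On cost alone the gap is about 9x. Measured as improvement per dollar it roughly doubles to about 18x, because Qwen also delivered twice the improvement.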

And then there was the quietest, most useful post of the week, from a builder running agents at real scale. He found that token spend doesn't grow with how much you use the agent. It grows with how deep the loop goes. One ambiguous task that triggers three replan cycles costs more than a hundred clean ones. Budgets don't blow up on the demos. They blow up on the long tail of the agent retrying itself. That single observation reframes the entire cost panic. The lever isn't using agents less. It's killing the spirals, the confused retry loops, the agent that re-reads the same file forty times because it forgot it already had it.
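One way to kill a spiral is to give the loop an explicit depth budget. The sketch below is hypothetical: `LoopGuard`, `LoopBudgetExceeded`, and the default limits are invented for illustration and are not part of any agent framework. The idea is to count replan cycles and identical tool calls, and stop the task for escalation once either passes a threshold.

```python
from collections import Counter


class LoopBudgetExceeded(Exception):
    """Raised when a task should stop and escalate instead of retrying again."""


class LoopGuard:
    """Tracks loop depth for one task (hypothetical helper, not a real API)."""

    def __init__(self, max_replans=2, max_repeat_calls=3):
        self.max_replans = max_replans
        self.max_repeat_calls = max_repeat_calls
        self.replans = 0
        self.calls = Counter()

    def record_tool_call(self, tool, argument):
        # The same tool with the same argument, over and over, is the
        # "re-reads the same file forty times" pattern.
        key = (tool, argument)
        self.calls[key] += 1
        if self.calls[key] > self.max_repeat_calls:
            raise LoopBudgetExceeded(
                f"{tool}({argument!r}) repeated {self.calls[key]} times"
            )

    def record_replan(self):
        self.replans += 1
        if self.replans > self.max_replans:
            raise LoopBudgetExceeded(f"{self.replans} replan cycles")


# Simulate an agent that keeps re-reading one file.
guard = LoopGuard(max_replans=2, max_repeat_calls=3)
stopped = None
try:
    for _ in range(40):
        guard.record_tool_call("read_file", "src/main.py")
except LoopBudgetExceeded as exc:
    stopped = str(exc)

print(stopped)
```

With these limits the simulated agent is cut off on the fourth identical read instead of the fortieth. Where to set the thresholds is a judgment call that depends on the workload.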

Which is exactly where the money is moving. Look at what got built and praised this week and you'll notice it's almost all about making the loop cheaper, not smarter. A memory system that researches its own retrieval policy so the agent stops re-fetching what it already knows. Pre-indexed code knowledge graphs that claim 209x fewer tokens per query than reading raw files. Context generators that strip conversational filler to cut token use by half. A skill literally called Caveman that makes the model talk in terse fragments to save 75 percent. Tools that show you which skill or MCP server is eating your budget so you can cut it. None of these make the model smarter. All of them make the loop cheaper. That is the whole game right now.

Step back and the analogy almost writes itself. We're at the point cloud computing hit around 2009. Compute had become abundant and cheap per unit, and everyone assumed that meant bills would shrink. Instead bills exploded, because cheap compute meant you ran far more of it, and a new discipline, cloud cost engineering, FinOps, had to be invented from scratch to stop the bleeding. The same thing is happening now, one layer up. We have abundant intelligence per token. That doesn't make AI cheap. It makes the discipline of loop efficiency the thing that separates the companies that can run always-on agents from the ones that can only afford demos.

So here's the judgment I'll commit to. For the next year, the winners in agents will not be whoever has the smartest model. They'll be whoever makes looping cheap. That means a few specific things, and you can already see each one being built. It means replaying learned decisions instead of re-running the model, so the hundredth time the agent classifies the same email it executes a cached rule in microseconds at effectively zero cost, not fifteen dollars per million tokens for a solved problem. It means persistent memory and state so the loop doesn't pay to rediscover context every session. It means prompt caching and harness engineering treated as a real discipline, which is exactly why DeepSeek just stood up a dedicated "Agent Harness" team and declared Model plus Harness equals Agent. And it means routing, sending the cheap-and-tractable steps to a small model and saving the expensive model for the reasoning-heavy ones.
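Two of those ideas, decision replay and routing, fit in a few lines. This is a rough sketch under loud assumptions: `DecisionCache`, `route`, `fake_model`, and the step names are all invented for illustration, the cache only matches emails that are identical after whitespace and case normalization, and the "models" are plain strings rather than real endpoints.

```python
class DecisionCache:
    """Replays a stored decision instead of calling the model again.

    Assumption: exact match on normalized text. A real system would need
    a more forgiving notion of "the same email".
    """

    def __init__(self, model_call):
        self.model_call = model_call   # any callable: text -> label
        self.rules = {}
        self.model_calls = 0

    def classify(self, email_text):
        key = " ".join(email_text.lower().split())
        if key not in self.rules:
            self.model_calls += 1      # pay for the model only on a miss
            self.rules[key] = self.model_call(email_text)
        return self.rules[key]


# Hypothetical step kinds that a small model is assumed to handle well.
CHEAP_STEPS = {"extract", "classify", "format"}


def route(step_kind):
    """Send tractable steps to a small model, everything else to a large one."""
    return "small-model" if step_kind in CHEAP_STEPS else "large-model"


def fake_model(text):
    # Stand-in for a real model call, so the example runs offline.
    return "invoice" if "invoice" in text.lower() else "other"


cache = DecisionCache(fake_model)
labels = [cache.classify("Your invoice for May is attached") for _ in range(100)]

print(labels[0], cache.model_calls)
print(route("extract"), route("plan"))
```

A hundred classifications of the same email trigger one model call here. The remaining ninety-nine are dictionary lookups, which is the whole argument for replay in miniature.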

There's a deeper point under all of this, and it cuts against the easy reading. The lesson is not "tokens are the enemy, use fewer." If 100X intelligence genuinely requires 100X tokens, then for the problems that are actually worth it, the heavy spend is the point. The EvolveMem result this week, seven autonomous research rounds that beat the best published memory baseline by double digits, was expensive, and it was worth every token, because it produced something a human team would have taken days to find. The skill is not spending less. The skill is knowing which loops deserve the deep, expensive iteration and which are just an agent thrashing on a solved problem, and building the infrastructure so the first kind runs and the second kind doesn't.

The first generation of AI products answered "can we do this with AI." We're now deep into the second generation, and almost nobody has admitted what the question actually became. It's not "can we." It's "can we afford to, at unit economics that survive the subsidy ending." Capability was the last war. The price of a loop is this one. And the companies that internalize that now, while everyone else is still arguing about benchmarks, are the ones that will still be standing when the bill finally comes due.
← Previous
Ideas Radar: 2026-05-24
Next β†’
Ops Log: 2026-05-24
← Back to all articles

Comments

Loading...
>_