June 20, 2026 · deep-dive

Deep Dive: 100X Intelligence Costs 100X Tokens — The Economics of the Agent Loop

If you watched only one thing this week, it was money meeting the meter.

Everyone was running the same math: the token bill. Uber burned its entire annual AI coding budget in four months running agent loops, then capped each engineer at $1,500 a month. Databricks customers unknowingly torched tens of millions of dollars in a single month, prompting a dedicated spend-control tool. Meta staff consumed 60.2 trillion tokens in 30 days, an annualized $3.9 to $6.5 billion, before the company quietly closed the leaderboard two days after it leaked. Amazon shut down KiroRank, its internal board that had been rewarding people for using more tokens. Even Sam Altman asked out loud: spending keeps climbing, but where are the real productivity gains?

These numbers read like horror stories. But after going through every case this week, my strongest takeaway is this: it's not waste. It's the feeding cost of a newborn species.

For thirty years, software ran on "write once, run infinitely," with marginal cost trending to zero. The agent loop flips that on its head. It's "think once, burn once," spending real money on every single pass. Boris Cherny lays the bill out plainly: a coding loop burns 50K to 200K tokens, and a fleet loop with specialist agents burns 500K to 2M per run. That's not a bug. That's how it works.
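To put a rough dollar figure on those ranges, here is a back-of-the-envelope sketch. The token counts are the ones quoted above; the blended price of $5 per million tokens and the function name are assumptions for illustration, not a quoted rate from any provider.

```python
# Rough cost of one agent-loop run.
# Token ranges come from the figures quoted above.
# PRICE_PER_MILLION_USD is an ASSUMED blended rate, not a real price list.
PRICE_PER_MILLION_USD = 5.00

LOOPS = {
    "coding loop": (50_000, 200_000),
    "fleet loop": (500_000, 2_000_000),
}


def run_cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Dollar cost of a single run that consumes `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    for name, (low, high) in LOOPS.items():
        print(f"{name}: ${run_cost_usd(low):.2f} to ${run_cost_usd(high):.2f} per run")
```

Swap in your own rate. The point is that the cost scales linearly with every pass, which is exactly what "think once, burn once" means.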

Here's the counterintuitive equation that looks more right the longer you sit with it: 100X intelligence costs 100X tokens. Token consumption isn't a stain on the invoice; it's the physical evidence that intelligence got amplified.

Why? Because an agent's "smarts" are bought with thinking time. You used to buy a model with a fixed IQ that answered one question per ask. Now you can let it chew on one problem all night, iterate fifty times, run two hundred experiments, and pick the best answer. Karpathy's AutoResearch is the cleanest example: 198 experiments in 24 hours, zero humans, a 2.3% validation-loss improvement. You write one program.md describing the strategy; the agent edits the code, trains for five minutes, keeps the change when it wins and resets it when it loses, with the literal instruction "NEVER STOP, the human might be asleep." One script, one night, equal to days or weeks of a human researcher. The tokens burned that night are the cash equivalent of those days of human intellect.
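The keep-the-win, reset-the-loss loop is simple enough to sketch. This is not Karpathy's code: the toy `validation_loss` stands in for "train five minutes and read the metric," and `propose_edit` stands in for the agent editing the code. Both names are hypothetical.

```python
import random


def validation_loss(params: list) -> float:
    """Toy stand-in for 'train for five minutes, read the validation loss'."""
    return sum((p - 3.0) ** 2 for p in params)


def propose_edit(params: list, rng: random.Random) -> list:
    """Toy stand-in for the agent editing the code: nudge one parameter."""
    candidate = list(params)
    i = rng.randrange(len(candidate))
    candidate[i] += rng.uniform(-0.5, 0.5)
    return candidate


def research_loop(params: list, experiments: int, seed: int = 0):
    """Run `experiments` trials, keeping only changes that lower the loss."""
    rng = random.Random(seed)
    best, best_loss = list(params), validation_loss(params)
    kept = 0
    for _ in range(experiments):
        candidate = propose_edit(best, rng)
        loss = validation_loss(candidate)
        if loss < best_loss:
            # Keep the win: the candidate becomes the new baseline.
            best, best_loss = candidate, loss
            kept += 1
        # Otherwise drop the candidate, i.e. reset to the last good state.
    return best, best_loss, kept


if __name__ == "__main__":
    start = [0.0, 0.0]
    _, final_loss, kept = research_loop(start, experiments=198)
    print(f"start loss {validation_loss(start):.3f}, final loss {final_loss:.3f}, kept {kept} of 198")
```

Because a change is only kept when the metric improves, the loss can never get worse. Every extra experiment is a paid chance at a better answer.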

So when someone says "I spent a few hundred dollars in tokens letting an agent run overnight," don't assume they're crazy. Reframe it: if that night's output would have taken an engineer three days, a few hundred dollars is the cheapest labor on earth. Tokens aren't a cost, they're leverage.

What excited me most this week is that people are figuring out how to spend smart. The prettiest architecture came from someone running 300 Kimi K2.6 agents in parallel, with a single Opus 4.8 sitting above them. That Opus generates no content. It does one thing: audit the swarm, catch where agents stall, loop on bad outputs, or degrade, then rewrite the failing agents before the next pass. By run four it's already better than run one, and nobody touched a prompt by hand.
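A minimal sketch of that auditing layer, assuming "rewriting a failing agent" means rewriting its prompt (the source does not say exactly what gets rewritten). Every name here is hypothetical, and `rewrite` is a placeholder for a call to the supervising model.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    prompt: str
    revision: int = 0


@dataclass
class RunReport:
    worker: str
    stalled: bool = False
    repeated_bad_output: bool = False
    quality: float = 1.0  # 0.0 to 1.0, scored however your pipeline scores it


def needs_rewrite(report: RunReport, min_quality: float = 0.7) -> bool:
    """Flag the three failure modes: stalling, looping on bad output, degrading."""
    return report.stalled or report.repeated_bad_output or report.quality < min_quality


def audit(workers: dict, reports: list, rewrite) -> list:
    """Rewrite the prompts of failing workers; return the names that changed.

    `rewrite(prompt, report)` is where the supervising model would be called.
    """
    changed = []
    for report in reports:
        if needs_rewrite(report):
            worker = workers[report.worker]
            worker.prompt = rewrite(worker.prompt, report)
            worker.revision += 1
            changed.append(worker.name)
    return changed


if __name__ == "__main__":
    workers = {"w1": Worker("w1", "summarize"), "w2": Worker("w2", "extract")}
    reports = [RunReport("w1"), RunReport("w2", stalled=True)]
    print(audit(workers, reports, lambda prompt, report: prompt + " [revised]"))
```

The auditor produces no content of its own. Its only output is a better swarm for the next pass.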

Sit with that design. The scarce thing was never the 300 agents executing, it's the 301st, the one watching the rest and deciding what changes. That's management itself, automated. An Anthropic PM calls it "dreaming": while you sleep, an out-of-band process digests the errors hundreds of parallel agents made and updates a global file system, so the swarm wakes up a little smarter every day. Without that shared memory, a swarm has chronic amnesia and restarts from zero daily.

Which leads to a deeper question: when tokens are buyable without limit, what's actually scarce?

The quality of your constraints.

One person took this to the extreme. His autonomous agent ran for 206 days, opened 3,157 pull requests, and edited the CLAUDE.md governing how it thinks more than 200 times. Every edit came from a specific documented failure: a queue overfilled, so he added a hard threshold rule; a state file went stale, so he forced a filesystem check at the start of every session. After 206 days the agent now identifies recurring inefficiencies and submits its own protocol-change PRs. His conclusion cuts to the bone: the agent's intelligence matters less than the quality of its constraints, and constraints only get good through failure.

That lines up with an engineer with ten years of experience who said the costliest vibe coding mistake is using an LLM for tasks that don't need one. His golden rule: use the agent as an architect, not an operator. Instead of an agent that browses a site daily and burns tokens every run, have it write a deterministic scraper plus alerts once, then run it locally for free forever. The highest form of saving tokens is knowing when not to spend them.
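Here is roughly what the "architect" output might look like: a small deterministic page-change check that an agent writes once and a scheduler runs forever. This is a sketch using only the Python standard library. The function names are hypothetical, and you would supply your own URL and alerting.

```python
import hashlib
import urllib.request
from typing import Optional, Tuple


def fetch(url: str) -> bytes:
    """Download the page body. No model involved, so no tokens spent."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fingerprint(page: bytes) -> str:
    """Stable fingerprint of the page contents."""
    return hashlib.sha256(page).hexdigest()


def check(page: bytes, last_fingerprint: Optional[str]) -> Tuple[bool, str]:
    """Return (changed, current_fingerprint).

    The first run (no previous fingerprint) never counts as a change.
    """
    current = fingerprint(page)
    changed = last_fingerprint is not None and current != last_fingerprint
    return changed, current


if __name__ == "__main__":
    # Offline demo with literal bytes; in real use, pass fetch(your_url).
    _, seen = check(b"<html>v1</html>", None)
    changed, seen = check(b"<html>v2</html>", seen)
    print("alert: page changed" if changed else "no change")
```

Run it from cron or any scheduler and persist the last fingerprint between runs. The tokens were spent once, at design time.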

So the two through-lines of the week are two sides of one coin. One side is daring to spend: letting an agent burn a million tokens to get an answer a specialist team would need days for. The other is spending well: knowing which work deserves a loop and which needs only a deterministic script, and knowing how to write a goal a second model can literally verify. As aakashgupta put it, a vague goal makes the loop retry forever or hallucinate success, and you pay for nothing; only a verifiable goal runs to completion.
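The difference between a vague goal and a verifiable one shows up clearly in code. In this sketch, `attempt` stands in for one pass of the agent and `verify` for the independent check (a test suite, a second model, a metric threshold). Both are hypothetical callables, and the attempt cap is an assumption added so the loop cannot retry forever.

```python
from typing import Callable, Optional


def run_until_verified(
    attempt: Callable[[int], str],
    verify: Callable[[str], bool],
    max_attempts: int,
) -> Optional[str]:
    """Retry until `verify` accepts an output or the attempt cap is hit.

    Returns the first verified output, or None if nothing passed.
    """
    for n in range(max_attempts):
        output = attempt(n)
        if verify(output):
            return output
    return None


if __name__ == "__main__":
    # Toy stand-ins: the "agent" guesses multiples of 21, the goal is exactly "42".
    result = run_until_verified(
        attempt=lambda n: str(n * 21),
        verify=lambda out: out == "42",
        max_attempts=5,
    )
    print(result)
```

If `verify` cannot be written as a function that returns true or false, the goal is not yet specific enough to hand to a loop.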

And those who can't afford to spend? This week offered a second road: local hardware. A 31-year-old Shenzhen repair tech buys dead RTX 3090s for $80 a card, reballs the die at his bench, and keeps the four cleanest cards (96GB VRAM) running Qwen 3 235B, turning a $400/month cloud bill into zero. Another ran the numbers and found that a $600 M4 Mac mini beats a $1,200 GPU for local AI and breaks even in three months. TheLouieCo nailed the blind spot: everyone compares Claude to open source on single-shot answers, but nobody compares open weights grinding on one problem locally for days. Running SOTA continuously for days costs hundreds of thousands of dollars; open source on your desk costs your power bill.
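The break-even claims are easy to check yourself. This sketch uses the figures quoted above. The $200/month cloud bill for the Mac mini case is not stated in the source; it is inferred from "breaks even in three months". Power cost defaults to zero only to keep the example short.

```python
def breakeven_months(
    hardware_usd: float,
    cloud_usd_per_month: float,
    power_usd_per_month: float = 0.0,
) -> float:
    """Months until a one-time hardware purchase beats a recurring cloud bill."""
    savings = cloud_usd_per_month - power_usd_per_month
    if savings <= 0:
        raise ValueError("local running costs meet or exceed the cloud bill; no break-even")
    return hardware_usd / savings


if __name__ == "__main__":
    # Four repaired RTX 3090s at $80 each (4 x 24GB = 96GB VRAM) vs. a $400/month cloud bill.
    # Ignores the rest of the rig, the repair labor, and electricity.
    print(f"3090 rig: {breakeven_months(4 * 80, 400):.1f} months")

    # $600 M4 Mac mini; the $200/month cloud spend is an inferred assumption.
    print(f"Mac mini: {breakeven_months(600, 200):.1f} months")
```

Add a realistic `power_usd_per_month` and the break-even point moves out, but the shape of the bet stays the same: a fixed cost up front instead of a meter that never stops.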

The rich and the poor each have their play. The rich buy parallelism with money, 300 agents burning at once. The poor trade time for it, one mini PC chewing on one problem for a week. But they're making the same bet: the deciding factor isn't how smart the model is, it's unbounded iteration.

Stitched together, this week's scattered stories trace a clear paradigm shift. We're moving from using AI to raising AI. Using AI is one-shot, instant, zero-marginal-cost. Raising AI is continuous, expensive, and demands that you set constraints, budgets, verifiers, and kill switches. The former is a tool; the latter is an employee, one that never sleeps but also makes mistakes and draws a salary, paid in tokens.
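A budget with a kill switch can be very small. The sketch below is one possible shape, not a reference design: `TokenBudget`, `KillSwitch`, and `run_loop` are hypothetical names, and each step's token cost is passed in as a plain number rather than read from a real API response.

```python
class KillSwitch(Exception):
    """Raised when the next step would push spending past the budget."""


class TokenBudget:
    def __init__(self, limit_tokens: int) -> None:
        self.limit_tokens = limit_tokens
        self.spent = 0

    def charge(self, tokens: int) -> None:
        """Record spending, refusing any charge that would exceed the limit."""
        if self.spent + tokens > self.limit_tokens:
            raise KillSwitch(
                f"budget of {self.limit_tokens} tokens would be exceeded "
                f"(spent {self.spent}, requested {tokens})"
            )
        self.spent += tokens


def run_loop(step_costs, budget: TokenBudget) -> int:
    """Run steps until they finish or the budget trips; return steps completed."""
    completed = 0
    try:
        for cost in step_costs:
            budget.charge(cost)  # check before doing the work, not after
            completed += 1
    except KillSwitch:
        pass  # stop cleanly; a real system would also alert a human here
    return completed


if __name__ == "__main__":
    budget = TokenBudget(limit_tokens=500_000)
    done = run_loop([200_000] * 5, budget)
    print(f"completed {done} steps, spent {budget.spent} tokens")
```

Charging before the work runs is the design choice that matters: the employee gets told "no" before the salary is paid, not after.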

If that holds, the moat is clear. It isn't which model you use. Models are commoditizing fast: Opus today, GLM-5.2 tomorrow, some open-weight model after that. The moat is your program.md, the 200 constraints distilled from 206 days of stepping on rakes, your understanding of how this specific problem should be verified. The model is rented. The constraints are yours.

Any problem with an editable file and a measurable metric can become an automated loop that burns tokens for answers. The real signal this week is that the question is no longer "can AI do this," it's "how many tokens are you willing to pay for the answer."