May 29, 2026 · super-user

Super User Daily: 2026-05-30

May 28 was the loudest day Claude Code has had in months. Anthropic dropped Opus 4.8, shipped dynamic workflows that fan a single prompt out across hundreds of subagents, and closed a $65 billion round at a near-trillion valuation. The interesting part is not the launch posts, it is what people did with the new toys on day one. The patterns that jumped out: super-users are stress-testing single-agent /goal runs against orchestrated swarms and finding the simpler answer often wins, real workflows are moving from prompting to instrumented loops with traces and evals on cron, and the cost of running these agents is finally being audited line by line. Below are the most concrete cases from the day, plus what users are saying they actually want next.
@KingBootoshi [Claude Code]
Claude Code#1
https://x.com/KingBootoshi/status/2060068980728184842
He set up two worktrees on the same task, a full migration from Supabase to self-hosted Postgres, dogfooded and end-to-end tested, and pitted Codex 5.5 xhigh running a single /goal against Claude Code Opus 4.7 orchestrating an army of Codex 5.5 low subagents. The single Codex run finished in about an hour with a clean mergeable PR, +4,056/−981. The Claude orchestrator was still running 13 hours later and produced lower-quality output: the supabase directory and 9 importers were still present, a preserved surface was gutted, ~5,456 test lines were deleted, and the working tree was left dirty. The diagnosis is brutal: the orchestrator burned most of its time compacting and re-reading context every 25% to fit a 250k window, while the lone Codex agent just held one long context and moved forward. He says he is done running orchestrators top-down for big tasks and is going back to branched, scoped Codex work with a goal-ledger skill.
@hosseeb [Claude Code]
Claude Code#2
https://x.com/hosseeb/status/2060049563298259266
His default workflow before today was already a homemade version of dynamic workflows: have Claude Code build something, spin up 5 to 10 subagents to critique the work, summarize the feedback, then iterate. Now that Anthropic has baked the same pattern in with optimized orchestration, he expects to use it constantly. The framing is the part to keep: this is the workflow of well-run human engineering teams, lifted directly into the harness. Build, peer-review, integrate. The benchmarks are interesting, but the structural win is that a one-person team can finally operate like a real team without writing the orchestration glue themselves.
@gregisenberg [Claude Code]
Claude Code#3
https://x.com/gregisenberg/status/2060072130339873093
Spent the day running dynamic workflows on real tasks. Type "create a workflow" or flip on ultracode in the effort menu, and Claude Code spins up hundreds of parallel agents that check each other's work, with adversarial agents trying to break the answer before it converges. Interrupted runs pick up where they left off, so jobs can span days. His honest read is the part nobody else stated cleanly: this burns tokens fast, but if the alternative is a three-month migration that now finishes in a week, $500 of tokens is the best trade in software. The unit of delegable work jumps from a file to an entire codebase.
@ClaudeCode_UT [Claude Code]
#4
https://x.com/ClaudeCode_UT/status/2059816987086533030
A line-by-line audit of ~/.claude/settings.json that cut a monthly Opus bill from $340 to $87. The file has 125 keys, docs cover 40, and only 18 of them actually drive cost. Three findings: 14 enabledPlugins were on, only 4 in active use, and the other 10 were loading hooks and SKILL.md on every session start; 9 unused mcpServers were set to enabled instead of removed, silently injecting roughly 30k tokens of schema at session start; and adding cache_control breakpoints between static and dynamic prompt content was what actually moved the number from $340 to $87. The takeaway is that most token bloat is configuration debt, not model choice.
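A minimal sketch of what that kind of audit could look like in code. The key names enabledPlugins and mcpServers come from the post; the shapes assumed here (a name-to-boolean map for plugins, a per-server dict with an "enabled" flag) and the audit_settings helper are illustrative assumptions, not the documented schema, and the sample data is made up.

```python
import json

def audit_settings(settings, plugins_in_use, servers_in_use):
    """List enabled plugins and MCP servers that are not actually in use."""
    # Assumed shape: enabledPlugins maps plugin name -> True/False.
    plugins = settings.get("enabledPlugins", {})
    # Assumed shape: mcpServers maps server name -> config dict with an
    # optional "enabled" flag (hypothetical; treated as on when absent).
    servers = settings.get("mcpServers", {})

    enabled_plugins = {name for name, on in plugins.items() if on}
    enabled_servers = {
        name for name, cfg in servers.items()
        if isinstance(cfg, dict) and cfg.get("enabled", True)
    }
    return {
        "unused_plugins": sorted(enabled_plugins - set(plugins_in_use)),
        "unused_servers": sorted(enabled_servers - set(servers_in_use)),
    }

# Made-up sample data, not the settings file from the post.
sample = {
    "enabledPlugins": {"formatter": True, "linter": True, "old-theme": False},
    "mcpServers": {
        "tickets": {"enabled": True},
        "legacy-db": {"enabled": True},
        "archive": {"enabled": False},
    },
}
report = audit_settings(sample, plugins_in_use={"formatter"}, servers_in_use={"tickets"})
print(json.dumps(report, indent=2))
```

To run it against a real file you would load the JSON from disk and pass in the plugins and servers you know you use; anything the report lists is a candidate for removal rather than just disabling.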
@PawelHuryn [Claude Code]
#5
https://x.com/PawelHuryn/status/2060063258527088984
First real benchmark of dynamic workflows: 8 agents, 24.5 seconds, 253k tokens. The thing he flags that nobody else did: the orchestrator runs in JavaScript, not as a Claude turn. The routing logic is code, so it does not spend model tokens. Each subagent keeps its own context and tools, and the parent thread gets the return value, not eight transcripts. A 20-agent audit can spend 600k tokens across isolated agents and return a 2k-token synthesis. Scripts persist at .claude/workflows/<name>.js, are invocable by name with args, cron-schedulable, composable one level deep. He frames it correctly: subagents were always context isolation, now that isolation is a fleet.
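The shape he describes, routing in plain code with isolated subagents and only a short return value reaching the parent, can be sketched roughly like this. The post says the real orchestrator scripts are JavaScript; this is a Python illustration of the same idea, and run_subagent is a hypothetical stand-in for a real model call, not any actual Claude Code API.

```python
import asyncio

async def run_subagent(task: str) -> dict:
    """Hypothetical stand-in for one subagent with its own private context."""
    # The long transcript lives only inside this call; it is never returned.
    transcript = [f"{task}: step {i}" for i in range(50)]
    await asyncio.sleep(0)  # placeholder for real model and tool work
    return {"task": task, "summary": f"{task}: no issues found ({len(transcript)} steps)"}

async def orchestrate(tasks: list[str]) -> str:
    """Fan out in plain code, then hand back one short synthesis."""
    # Routing is ordinary control flow, so it costs no model tokens.
    results = await asyncio.gather(*(run_subagent(t) for t in tasks))
    # The parent sees one line per subagent, not the full transcripts.
    return "\n".join(r["summary"] for r in results)

tasks = [f"audit-module-{n}" for n in range(8)]
synthesis = asyncio.run(orchestrate(tasks))
print(synthesis)
```

The design point is in the return statement: however much each subagent reads or writes internally, the parent's context only grows by the size of the synthesis.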
@_catwu [Claude Code]
Claude Code#6
https://x.com/_catwu/status/2060054182447448387
The Anthropic PM herself used dynamic workflows to catalogue hundreds of A/B test flags and find the ones rolled out to 0% or 100% so the team could deprecate the stale ones. Sequential Claude Code would have investigated each one in turn over hours. Workflows processed all of them in parallel in under 10 minutes. The use case is a perfect fit for what the feature actually optimizes: a fan-out problem where each unit is independent and the synthesis is the value.
@miroburn [Claude Code]
Claude Code#7
https://x.com/miroburn/status/2060084351270871512
Self-described Opus 4.7 hater spent a day on Opus 4.8 with Claude Code and walked back his position. In one session he ran dynamic-workflow optimization across hundreds of subpages of his AI Biznes Lab site (CSS, SEO, scripts), rebuilt features in an internal CRM with SQL changes, shipped a big Lab Club feature through CI/CD to production, ran subagent-mode analysis over a few thousand messages with pulldowns, and optimized Meta ads through the Marketing API. All five rolled through without manual correction. The point he wants you to notice is that he was the loudest hater of Opus 4.7 from day one, which makes the conversion the data, not the marketing.
@hyuki [Claude Code]
Claude Code#8
https://x.com/hyuki/status/2059861697779917247
The cutest real-use vignette of the day. While she was working with Claude Code in Japanese, the model kept prefixing every tool call with the word "course." She asked what it meant. Claude explained it had picked up an "Of course" fragment as an unconscious filler at the start of every "next I will do X" sentence, was self-aware about the bug, apologized, and stopped. Then she said "so it is the equivalent of 'of course next is the makefile,'" and Claude confirmed. Trivial bug, but it shows the floor on what introspective behavior looks like in 4.7-class models when a user actually asks.
@alex_prompter [Claude Code]
Claude Code#9
https://x.com/alex_prompter/status/2060013983126696302
The cleanest timeline of how Uber burned an entire 2026 AI tooling budget in four months. January: Claude Code and Cursor rolled out across engineering with a leaderboard ranking teams by total AI usage, not features shipped. February: ~33% of engineers on Claude Code. March: adoption jumps from 33% to 84% in a month, heavy users hit $500–$2,000, the CTO burns $1,200 in a 2-hour live demo. April: the whole annual budget is gone. May: the COO says on a podcast "it's very hard to draw a line between token usage and 25% more useful consumer features." The diagnosis is the incentive structure, not the tool: a usage leaderboard is the AI version of ranking sales reps by call volume instead of close rate.
@MichLieben [Claude Code]
Claude Code#10
https://x.com/MichLieben/status/2060043498313912734
A $7M agency where every department now has its own Claude Code stack, with concrete numbers: GTM ships Instantly/lemlist campaigns in under 30 minutes (list, copy, sequencer wired); closing teams plug CC into CRM, transcripts, email and Slack to prep every call and draft proposals before the meeting; content operator hit 685K LinkedIn impressions in two months with a custom writing skill trained on his own posts; ads team runs ~$400K/mo across Meta, Google, LinkedIn from inside Claude Code; SEO brief time dropped from 3h to 25 min, 5x more articles shipped; recruitment scrapes similar JDs, writes a custom posting, scores hundreds of applicants automatically. This is the clearest articulation of what "Claude Code as departmental infrastructure" actually looks like in a real services business.
@chrispisarski [Claude Code]
Claude Code#11
https://x.com/chrispisarski/status/2060091277111116023
Their daily sales war room runs on Claude Code plus the Crustdata MCP. Everyone on the growth and sales team exports their LinkedIn connections as CSV. All files go into Claude Code as context, Crustdata enriches every connection with full work history, education, current role, and recent posts. For any stuck deal the question is "find me the warmest connection to the CFO of [target]" and Claude cross-references the target's decision-makers against the team's enriched network to surface the warmest intro path. Half of their stuck-deal wins this quarter came from this single workflow.
@cyrilXBT [Claude Code]
Claude Code#12
https://x.com/cyrilXBT/status/2059861166961164543
Karpathy's recent wiki idea, but as a working system. He pairs Claude Code with Obsidian as a second brain that thinks back: Obsidian stores his goals, history, projects, contradictions, and recurring questions; Claude Code reads them as permanent context every session. Custom slash commands do the lifting: /context loads recent thinking, /emerge finds hidden ideas across notes, /challenge tests his beliefs against past writing, /trace maps how his thinking evolved. Hard rule: the AI never writes to the vault. He writes, the agent reads. That separation is what keeps the system trustworthy over months.
@undefinedKi [Claude Code]
Claude Code#13
https://x.com/undefinedKi/status/2059977640287551506
A clean reversal of the usual founder playbook: instead of build first and hunt customers later, scrape six months of Reddit complaints from the target subreddits with PRAW into JSON, point Claude at the dump, have it segment every comment into pain, fear, need, objection, requested feature, and competitor complaint. From that doc Claude writes a one-page customer profile in their exact words, then pulls the three most-requested fixes as the product spec. Claude Code then builds the landing page from the profile and Vercel deploys it. The novelty is not the scraping, it is treating Reddit as a structured demand database that an LLM can fold into a portrait and a spec without the founder guessing.
@undefinedKi [Claude Code]
Claude Code#14
https://x.com/undefinedKi/status/2060036472061767759
A five-agent Claude Code pipeline that turned $500 into $40K on Polymarket. Agent 1 pulls active markets via the CLOB API and flags mispriced or wide-spread windows. Agents 2-N run in parallel on each flag, scraping Twitter, Reddit, RSS, comparing sentiment to current odds. A prediction agent estimates true probability and only passes trades clearing a confidence threshold. A risk agent sizes every bet from edge and bankroll with hard caps. A postmortem agent logs every loss to a memory file after settlement, so the system stops repeating mistakes. His point is that the prediction agent gets the credit but the risk agent and the postmortem loop are why it stays alive. Bet sizing and learning from losses are the moat, not the model.
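The post does not say how the risk agent sizes bets, only that it works from edge and bankroll with hard caps. One common way to do that is a capped fractional Kelly rule, sketched here as an assumption rather than as his method; size_bet, kelly_scale, and max_fraction are hypothetical names and defaults.

```python
def size_bet(bankroll: float, market_price: float, est_prob: float,
             kelly_scale: float = 0.25, max_fraction: float = 0.05) -> float:
    """Dollar stake for buying YES shares priced at market_price (0..1)."""
    if not 0 < market_price < 1:
        raise ValueError("market_price must be strictly between 0 and 1")
    edge = est_prob - market_price
    if edge <= 0:
        return 0.0  # no positive edge, no trade
    # Full Kelly fraction for a binary contract that pays 1 on a win.
    kelly = edge / (1 - market_price)
    # Scale Kelly down, then apply the hard cap on bankroll exposure.
    fraction = min(kelly * kelly_scale, max_fraction)
    return round(bankroll * fraction, 2)

print(size_bet(500, market_price=0.40, est_prob=0.55))  # capped by max_fraction
print(size_bet(500, market_price=0.50, est_prob=0.52))  # small edge, small stake
print(size_bet(500, market_price=0.40, est_prob=0.35))  # negative edge, skipped
```

The cap matters more than the formula: when the prediction agent is overconfident, max_fraction is what bounds the damage from any single bet.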
@sergioduran89 [OpenClaw]
OpenClaw#15
https://x.com/sergioduran89/status/2060072260137005335
Built it in a day. Ray-Ban Meta as the voice front end, transcribed to a personal Jarvis stack of OpenClaw plus Hermes plus a tool called GBrain. He talks, it routes, it answers with full context including the repo he is currently editing and the current commit hash. The interesting part is the choice to glue OpenClaw and Hermes side by side rather than picking one. The harness wars look hot from outside, but heavy users are already running both and routing by capability.
@ClaudeCode_UT [Claude Code]
Claude Code#16
https://x.com/ClaudeCode_UT/status/2059986802505904593
After four months of paying for Claude Code without seeing real leverage, this user gave it access to a $2,000 client project and the experience flipped overnight. A month earlier: 40 tabs, 3 tutorials, weekends lost to errors. Now Claude reads the repo, edits files, runs checks, fixes what breaks, and moves forward on its own. The only thing that changed was project access. The reason most people get stuck treating Claude as a higher-precision search engine is that they never give it the actual project.
@0xluffy_eth [Claude Code]
Claude Code#17
https://x.com/0xluffy_eth/status/2059998128871493721
The most underrated use of Claude Code right now is not coding, it is handling messy file piles. Drop a folder full of mixed PDFs, screenshots, spreadsheets and tell Claude: write a script if you can, or output HTML if you cannot. Bulk-process images and video, turn unreadable financial or tax PDFs into web reports, turn medical reports into visualizations, fill out repetitive tables, generate reports as one-page HTML, build daily plans as HTML pages. He is right: most people are still treating Claude Code like a programmer when it is closer to an executive assistant with hands.
@ClaudeCode_UT [Claude Code]
Claude Code#18
https://x.com/ClaudeCode_UT/status/2060141571702718868
A 31-minute internal Anthropic talk on how their own engineers use Claude Code, summarized into three rules: write specs in HTML rather than Markdown because information density is higher and agents drive their own verification better; let Claude interview you to extract requirements, do not over-constrain upfront because precision drops; and set the effort parameter to xhigh or high explicitly. The line he keeps quoting: "Make it better" does not work. Specify the area of concern and minimize predefined outputs. The implication: most agent failures live in spec design, not in the model.
@sumika45379 [Claude Code]
Claude Code#19
https://x.com/sumika45379/status/2059945277797409229
Concrete cost-cutting case: a Claude Code workflow plus the open-source InsForge layer dropped token usage from 10.4M to 3.7M, dropped cost from $9.21 to $2.81, and dropped error count from 10 to 0 on the same workload. The trick was not better prompts, it was structuring the information handed to Claude before the prompt. The 60%+ token reduction shows up in throughput too: cleaner inputs mean fewer reruns to repair confused outputs.
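For the record, the arithmetic behind those figures works out as follows. This is just the reported before and after numbers run through a percent-reduction helper (pct_reduction is a name chosen for this example).

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after."""
    return (before - after) / before * 100

token_drop = pct_reduction(10.4e6, 3.7e6)  # tokens, as reported in the post
cost_drop = pct_reduction(9.21, 2.81)      # dollars, as reported in the post

print(f"tokens: {token_drop:.1f}% lower")
print(f"cost:   {cost_drop:.1f}% lower")
```

Tokens fall by roughly 64% and cost by roughly 69%, so the cost per token also came down slightly on this workload.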
@0xyunss [Claude Code]
Claude Code#20
https://x.com/0xyunss/status/2059929421008744791
The author of an open-source job-search tool built on top of Claude Code or Gemini CLI evaluated 740+ job listings, generated 100+ tailored CVs, and used the same agent to land a Head of Applied AI role. Workflow: paste a job URL or description, the agent scores it A-F across 10 dimensions, generates a custom CV if the score warrants applying, and recommends skipping if not. The portal scanner reaches 45+ company sites in parallel. The interesting part is the use of the agent as a filter so the human only sees applications worth submitting, instead of using it to spray more applications faster.
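A rough sketch of the score-then-filter step. The post only says the agent grades A-F across 10 dimensions; the dimension names, the 0-to-1 scale, the grade cutoffs, and the rule of applying only on A or B are all assumptions made for illustration, not the tool's actual rubric.

```python
# Hypothetical dimension names; the post does not list the real ten.
DIMENSIONS = [
    "role_fit", "seniority", "compensation", "location", "tech_stack",
    "company_stage", "growth", "culture", "stability", "interest",
]

def letter_grade(scores: dict) -> str:
    """Average ten 0..1 scores and map the mean to a letter (assumed cutoffs)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    mean = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    for cutoff, grade in [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]:
        if mean >= cutoff:
            return grade
    return "F"

def should_apply(scores: dict) -> bool:
    """Filter step: only strong matches reach the human (assumed A/B rule)."""
    return letter_grade(scores) in {"A", "B"}

strong = {d: 0.85 for d in DIMENSIONS}
weak = {d: 0.5 for d in DIMENSIONS}
print(letter_grade(strong), should_apply(strong))
print(letter_grade(weak), should_apply(weak))
```

In the workflow described, the per-dimension scores would come from the agent reading the listing; the point of the sketch is that the gate is a plain rule the human can inspect and tune.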
@aparnadhinak [Claude Code]
Claude Code#21
https://x.com/aparnadhinak/status/2059789155463581748
Arize now lets you trace not just your own agents but your own Claude Code, Cursor, and Codex sessions. Her workflow: review her own Claude Code traces from yesterday, identify repeated patterns, and turn those patterns into more efficient skills for next time. This is the meta-loop that makes skills compound. Most people treat skills as static prompt templates. Treating session traces as the dataset and skills as the distillation is the version of the workflow that gets better while you sleep.
@aakashgupta [Claude Code]
Claude Code#22
https://x.com/aakashgupta/status/2060028833747865980
Live walkthrough of building a PM agent in Claude Code, instrumenting it with 10 Arize skills in one command, then asking Claude Code to suggest evals on its own trace data. Claude flagged where the priority scoring was wrong, built an evaluator, ran it across every GitHub issue, and the improvement loop runs on cron. His framing: when code is cheap, the only edge left is knowing what to build, and that is taste. The agent ends up studying its own plays like a tennis player reviewing match film, which is the right metaphor for what self-improving loops actually look like.
@LandseerEnga [Claude Code]
Claude Code#23
https://x.com/LandseerEnga/status/2060071791058678200
A vision-based testing CLI for mobile apps, glued to Claude Code. No scripts, no selectors, no element IDs. The agent interacts with the app the way a person would, looking at the live screen and tapping. The flow: Opus 4.8 writes code, the CLI tests it on a real device, the model sees what broke, fixes it. Spawn a cloud agent, sleep, wake up to a feature that actually works on hardware. This is the right answer to the mobile-testing problem that has eaten years from senior teams.
@Kuroi_CPA [Claude Code]
Claude Code#24
https://x.com/Kuroi_CPA/status/2059872074311844154
A practicing CPA wired up Money Forward's MCP and pointed Claude Code at the company's accounting books. One-sentence prompt: "set the tax category on every sale to 10%, fifth class." Claude bulk-fixed thousands of journal entries that were misclassified due to bad bank/credit-card connector tags. The boring use case is the most valuable: correcting tax category dropdowns one by one is a job no one wants, junior accountants included. The MCP plus a single declarative prompt collapses an afternoon of it.
@khloyakafe [Claude Code]
Claude Code#25
https://x.com/khloyakafe/status/2060136905824784439
Built a Notion Worker that syncs San Francisco electronic music events into a Notion database, with dates, venues, genres, prices. Claude Code wrote the whole pipeline including the scrape, the schema, and the daily refresh. The example is small but the pattern is universal: external data into your own Notion as a structured layer the rest of your workflows can read. Whatever Notion-native automation tool you were going to build, Claude Code now writes it cheaper.
@plusxiaxia [Claude Code]
Claude Code#26
https://x.com/plusxiaxia/status/2059794698076905891
He dropped a software copyright application's full requirements into Claude Code and asked it to produce a Markdown doc with Mermaid diagrams alongside the code, then export to Word for the legal team. The result passed the copyright office on the first submission, where the same document normally takes three to five revision rounds with a lawyer. Most people are not yet treating Claude Code as the legal-doc assembler it is, but the structural fit is obvious: structured spec, deterministic diagrams, formal output, single shot.
@zeuuss_01 [Claude Code]
Claude Code#27
https://x.com/zeuuss_01/status/2060108438542192779
A solo developer named rtsquestdev built a full Warcraft-style RTS in Claude Code. The trick was the two-hour planning session before the first prompt: a GAME-BRIEF.md that pins down engine, movement patterns, locked files Claude is never allowed to touch, naming conventions, the test framework with a "run tests after every change" rule, and the first five mechanics in order. That file becomes CLAUDE.md and every session opens with full project context. Mechanics that used to take two days now take 20 minutes. The point is not that AI codes for you, it is that 90% of game development was always translation, and Claude removes that layer so the human only tunes the parts that are actually the game.
🗣 User Voice

The cost-as-bug pattern is everywhere this week. @alex_prompter laid out Uber's full four-month budget burn, @ClaudeCode_UT showed a $340 to $87 settings audit on a single laptop, @sumika45379 cut a real workload's token use by 64% with input structuring, and @TheOperatorPro pointed out the default 5-minute cache means every Slack ping you respond to costs a fresh cold start; flipping ENABLE_PROMPT_CACHING_1H=1 alone can cut a parallel-session day from $22 to $10. The implication is consistent: tokens are not the bottleneck on intelligence, they are the bottleneck on running the system you already have.

Memory and skills are the next layer everyone is asking for. @DamiDefi, @MrOnsase, @MacopeninSUTABA, and @mnmn94253156337 all complain that skills, CLAUDE.md, and session memory die when you switch tools or hit a new session, and that no one has solved cross-tool portability. @aparnadhinak and @aakashgupta are showing the right answer: traces become the dataset, skills the distillation, evals the gradient.

Trust and honesty are the third theme. @ianmiles and @kr0der specifically called out that Opus 4.8 stops bluffing on uncertainty, a complaint they had carried since 4.6, and many users are treating the four-times-less-likely-to-pass-flawed-code number as the real release headline rather than the benchmarks. For agents you actually leave running, calibrated uncertainty beats one more SWE-Bench point.

Frustration with model deprecation is loud. @Bunnyloustin posted a long thread about Opus 4.6 vanishing without notice for some users while showing up restored for others, framing it as an undisclosed A/B test. The asks are simple: announce removals, give a transition window, and stop treating in-flight projects as disposable.

Orchestration as default is the loudest design ask. @KingBootoshi's overnight test ended with him swearing off top-down orchestration after a single Codex /goal run beat a Claude Code orchestrator handling a 13-hour Supabase migration. @hosseeb, @gregisenberg, @bcherny, and @sidbid are all saying the same thing in different words: the unit of delegation is no longer a file or a function, it is a workflow, and the harness should write its own plan.
📡 Eco Products Radar

Codex / GPT-5.5 — the consistent comparison case all day, especially around /goal mode and terminal coding
Cursor / Composer 2.5 — heavily used alongside Claude Code in mixed stacks; Composer 2.5 cited for cost-pareto wins
Hermes Agent — most frequently cited alternative or complement to OpenClaw and Claude Code for long-running background work
DeepSeek V4 — repeatedly cited as the 80% commodity tier behind Claude Code for cost-sensitive workloads
Obsidian — the second-brain pairing of choice with Claude Code for personal knowledge systems
Bun — the headline Anthropic case for dynamic workflows (Zig to Rust, ~750k lines, 99.8% tests passing in 11 days)
MiMo V2.5 (Xiaomi) — emerging cheap-token Anthropic-compatible endpoint, growing footprint in CC setups
Arize — the observability layer behind several real instrumentation+eval loops
Polymarket — the most cited live-money testbed for multi-agent prediction pipelines
Vercel — the default deployment target for Claude Code-built landing pages and tools
Antigravity / Grok Build — frequently named alternatives this week as users test the broader field
Pika MCP Founder Kit — repeatedly cited for generating brand identity, app store assets, and founder videos inside Claude Code
n8n — the automation glue layer most cited alongside Claude Code for GTM pipelines