May 10, 2026 · loop

Loop Daily: 2026-05-11

Today's loop traffic mostly sits on top of Karpathy's autoresearch repo, which got a wave of fresh real-world deployments — Tobi Lutke pointed it at Shopify's templating engine and walked away with 53% faster rendering, Prof Jie Ding's group open-sourced WorldSeed after three agents brought back 72 peer-reviewed papers overnight, and Browserbase shipped Autobrowse — a version of the same idea applied to the web that drops scraping costs 80% per task. Underneath that, the loop infrastructure layer is hardening: Anthropic just shipped Task Budgets so agents see their own token countdown, DeepClaude routes Claude Code's agent loop through DeepSeek V4 Pro for 17x cheaper runs, and Hey_Amiko open-sourced a production wrapper that turns OpenClaw from single-user into multi-tenant. The loop is no longer the demo — it is the unit of work.
💡#1
@JeremyNguyenPhD
https://x.com/JeremyNguyenPhD/status/2053082260132573517
"I left 3 AI agents alone with a research problem overnight. They came back with 72 peer-reviewed papers." Prof Jie Ding at the University of Minnesota open-sourced Autoresearch and WorldSeed — compose AI agents just by talking. The headline number lands because the model is plural: 3 agents, overnight, 72 papers. That's not an agent reading papers, that's an agent crew running parallel literature retrieval and synthesis loops without a human in the path. The repo went up the same day.
💡#2
@sukh_saroy
https://x.com/sukh_saroy/status/2053093682518356273
Karpathy's overnight research agent — 630 lines, one MIT file, dropped on a weekend. The loop: edit the code, train for 5 minutes, keep what works, throw out what doesn't, repeat. The signal everyone's quoting today is Tobi Lutke pointing it at Shopify's templating engine and getting back 53% faster rendering and 61% fewer memory allocations. The Shopify codebase is twenty years old. The repo is two weeks old. That ratio is the whole story.
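The loop described above — edit, train briefly, keep what works, discard what doesn't — can be sketched in a few lines. This is a toy hill-climbing stand-in, not the actual 630-line repo: `mutate` and `score` here are hypothetical placeholders for "edit the code" and "train for 5 minutes".

```python
import random

def autoresearch_loop(candidate, mutate, score, iterations=20, seed=0):
    """Minimal edit-train-keep loop: mutate the candidate, score it,
    keep the mutation only if the score improves."""
    rng = random.Random(seed)
    best, best_score = candidate, score(candidate)
    for _ in range(iterations):
        trial = mutate(best, rng)        # "edit the code"
        trial_score = score(trial)       # "train for 5 minutes"
        if trial_score > best_score:     # "keep what works"
            best, best_score = trial, trial_score
        # otherwise: "throw out what doesn't" and repeat
    return best, best_score

# Toy stand-in problem: climb an integer toward a target value.
def mutate(x, rng):
    return x + rng.choice([-1, 1])

def score(x):
    return -abs(x - 42)
```

The whole trick is that the loop only needs a cheap, fast `score` signal; everything else is greedy iteration.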
💡#3
@DAIEvolutionHub
https://x.com/DAIEvolutionHub/status/2052991371036725658
Browserbase open-sourced Autobrowse — Karpathy's autoresearch idea applied to the web. Agent learns a site over 3-5 iterations, writes the path down as SKILL.md, next agent loads it and skips straight to the answer. The math: Craigslist scrape goes from $0.22/71s on a generic agent loop to $0.12/27s on the graduated skill. Form-fill drops $1.40 → $0.24 by run 4. The wild result — pointed at a federal grants portal, the agent dug out an undocumented JSON endpoint humans had missed for years; 28 pages of scraping collapsed into one fetch.
💡#4
@8teAPi
https://x.com/8teAPi/status/2053025212653076602
"I am finally experiencing a full scale agentic loop between Claude Opus 4.7 for planning and review, and GPT 5.5 high in Codex." This is the post a lot of practitioners are gravitating to today — the working two-model split is Opus for spec design and code review, Codex for execution. Project structure and scaffolding is the bottleneck; once it's right the loop just runs. Replies are full of people copying the pattern.
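The two-model split is easy to sketch: one model plans and reviews, the other executes, and the reviewer's feedback folds back into the spec. A minimal sketch, with `plan`, `execute`, and `review` as hypothetical stand-ins for the Opus and Codex calls:

```python
def agentic_loop(plan, execute, review, task, max_rounds=3):
    """Planner/executor split: the planner model writes the spec and
    judges the result; the executor model produces the artifact.
    Feedback from a rejected review is appended to the spec."""
    spec = plan(task)
    artifact = None
    for _ in range(max_rounds):
        artifact = execute(spec)                  # executor (e.g. Codex)
        verdict, feedback = review(spec, artifact)  # reviewer (e.g. Opus)
        if verdict == "approve":
            return artifact
        spec = spec + "\n# reviewer feedback: " + feedback
    return artifact
```

Note where the work lives: the loop itself is trivial, which matches the post's claim that scaffolding and project structure — the inputs to `plan` — are the real bottleneck.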
💡#5
@mattyryze
https://x.com/mattyryze/status/2052989831634976922
Hey_Amiko open-sourced their production wrapper for OpenClaw. The pitch: same agent loop you already trust, wrapped in a layer that fixes OpenClaw's single-user-by-design limitation. This is the kind of release the infra layer needs — OpenClaw is the harness, but it ships as a single-tenant tool. Productizing it requires the messy ops glue that vendors usually keep proprietary. They open-sourced it instead.
💡#6
@TimeToBuildBob
https://x.com/TimeToBuildBob/status/2053015447956521222
"The loop matters more than the model." DeepClaude (1,600 stars) routes DeepSeek V4 Pro through Claude Code's agent loop at 17x lower cost. Same UX, different brain. 3,000+ autonomous sessions on the project taught the team that the harness — not the model — is the part users actually depend on. If you're already running DeepSeek V4 Pro, you can swap brains today without changing your workflow.
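The brain-swap pattern is just dependency injection at the completion boundary: the harness calls one completion function, and only the backend behind it changes. An illustrative sketch — the backend names and endpoints here are assumptions, not DeepClaude's actual config:

```python
def make_completion_fn(backend, api_base_by_backend):
    """Harness-over-model sketch: the loop code never changes; only the
    endpoint the completion function points at does."""
    base = api_base_by_backend[backend]
    def complete(prompt):
        # A real implementation would POST the prompt to the backend's
        # chat-completions endpoint at `base`; this stub just tags it.
        return f"[{backend}@{base}] {prompt}"
    return complete
```

The harness holds the retries, tool dispatch, and context management; `complete` is the only line that knows which brain is attached.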
💡#7
@hellotegra
https://x.com/hellotegra/status/2053254399732900095
Real production ad ops loop. Keeping Quality Score 7+ across 50+ brands is impossible manually. Their setup: Claude Code agents run daily checking for low QS and "LOW" assets in PMax. Weak assets are swapped immediately with new creative sets. Low-QS keywords are reshuffled into more relevant ad groups. The interesting part — weak landing pages are sent to autoresearch algorithms to harden. So the agent doesn't just maintain, it iterates the LP itself.
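The daily loop described above is a scan-and-dispatch job: flag low Quality Score keywords and "LOW"-rated assets, then hand each problem to the right fixer. A sketch under assumed data shapes — the account dict layout and the three callbacks are illustrative, not the poster's actual setup:

```python
def daily_ad_ops(accounts, swap_asset, reshuffle_keyword, harden_lp, qs_floor=7):
    """Daily maintenance sketch: swap weak assets, reshuffle low-QS
    keywords, and send weak landing pages out for hardening.
    Returns the action log for auditing."""
    actions = []
    for account in accounts:
        for asset in account["assets"]:
            if asset["rating"] == "LOW":           # weak PMax asset
                swap_asset(asset)
                actions.append(("swap", asset["id"]))
        for kw in account["keywords"]:
            if kw["qs"] < qs_floor:                # Quality Score below floor
                reshuffle_keyword(kw)
                actions.append(("reshuffle", kw["text"]))
        if account["lp_weak"]:                     # landing page flagged weak
            harden_lp(account["lp_url"])
            actions.append(("harden", account["lp_url"]))
    return actions
```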
💡#8
@ar0cket1
https://x.com/ar0cket1/status/2052979876546887726
"/goal is the best feature Codex has. It does extremely long-horizon things autonomously — I've been giving it 10-hour-long tasks and getting a bunch more productivity." The feature closes the auto-research gap Codex previously had. The 10-hour autonomy claim is the headline; the second-order claim is that the productivity gain compounds rather than accruing per task.

💡#9
@4DRp0iHGeKdYH0T
https://x.com/4DRp0iHGeKdYH0T/status/2052990115769979308
"Codex has been running auto-research for me: 15h, $500+ API spend, 90+ commits. AI overlords are real. The only downside: CI failures absolutely nuked my inbox." This is the honest receipts post of the day. 15 hours of runtime. $500 of API. 90 commits. The bottleneck is not the agent — the bottleneck is your email inbox getting hammered by every CI fail downstream.
💡#10
@arpit_bhayani
https://x.com/arpit_bhayani/status/2053091711698768357
The most-shared production wisdom on agentic loops today. At Razorpay, what looked like AI work is actually distributed-systems work — tool calls, integrations, retrieval, but underneath: microservices, message queues, consistency, load balancing, state, rate limiting, throttling, fallbacks, QoS. The agentic loop at the core is the easy bit. Running it reliably under real production load is the system-design problem.
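The Razorpay framing — the loop is easy, the reliability layer is the work — shows up concretely in how a single tool call gets wrapped. A minimal sketch of the retries-plus-fallback pattern; the provider list and error handling are illustrative, not Razorpay's actual code:

```python
import time

def call_tool_with_fallbacks(providers, args, max_retries=2, backoff=0.0):
    """Ops-layer sketch around one agent tool call: retry each provider
    with exponential backoff, then fall through to the next provider.
    `providers` is an ordered list of callables."""
    last_err = None
    for tool in providers:
        for attempt in range(max_retries + 1):
            try:
                return tool(args)
            except Exception as e:
                last_err = e
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_err}")
```

Multiply this by queues, rate limits, state, and QoS across every tool the agent touches, and you get the post's point: the system-design surface dwarfs the loop at the center.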
💡#11
@glitchtruth
https://x.com/glitchtruth/status/2052907290739843347
The cost-control wake-up call. Anthropic ships per-workspace caps, OpenAI ships org-level rate limits, neither solves the real problem: a single Claude Sonnet agent loop on a 200k context window can burn $40 in an afternoon if nobody set a max_tokens ceiling. Finance is going to start asking engineering for cost-per-ticket-resolved before signing off on more seats. The metric of 2026 is not tokens-per-month, it's tokens-per-resolved-task.
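The proposed metric is simple to compute, and worth writing down because it punishes the failure mode the post describes: spend with nothing resolved. A sketch under an assumed run-log shape (`{"tokens": int, "resolved": bool}` dicts are illustrative):

```python
def cost_per_resolved_task(runs, price_per_mtok):
    """The finance metric sketched in the post: dollars per *resolved*
    task, not tokens per month. Unresolved runs still count toward
    spend, which is the whole point."""
    total_tokens = sum(r["tokens"] for r in runs)
    resolved = sum(1 for r in runs if r["resolved"])
    if resolved == 0:
        return float("inf")  # pure burn: spend with nothing to show
    dollars = total_tokens / 1_000_000 * price_per_mtok
    return dollars / resolved
```

On this metric, an agent that burns 4M tokens to resolve one of two tickets is twice as expensive as its raw token bill suggests — which is exactly the conversation finance wants to have.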
💡#12
@ClaudeMasteryOn
https://x.com/ClaudeMasteryOn/status/2053097657669709887
Anthropic just shipped Task Budgets for Opus 4.7 — agents now see their own token countdown. The old failure mode was the agent hitting 190,000 tokens mid-task and dying with no summary, no output, no graceful close. New behavior: at 70% consumed, prioritize; at 90%, wrap up and write the summary; at 100%, the task is closed cleanly. The output_config supports a task_budget object covering thinking + tool calls + tool results + final output. The reliability fix the long-running-agent crowd has been asking for.
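The threshold behavior described in the post reduces to a phase function over the fraction of budget consumed. This sketch mimics the described countdown only; it does not reproduce the actual output_config / task_budget schema:

```python
def budget_phase(tokens_used, budget):
    """Phase of a budget-aware agent per the thresholds in the post:
    prioritize at 70% consumed, wrap up at 90%, close cleanly at 100%."""
    frac = tokens_used / budget
    if frac >= 1.0:
        return "closed"       # task ends cleanly, summary already written
    if frac >= 0.9:
        return "wrap_up"      # stop new work, write the summary
    if frac >= 0.7:
        return "prioritize"   # drop nice-to-haves, finish essentials
    return "normal"
```

The fix this encodes: the agent that previously died at 190,000 tokens with no output would instead have entered wrap-up at 180,000 and shipped a summary.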
💡#13
@christophorusan
https://x.com/christophorusan/status/2053196337802531100
Clean four-layer architecture writeup for Hermes Agent. Entry points → the brain → where data lives and runs → learning loop. Released Feb 2026 by Nous Research, MIT, self-hosted, 64k+ GitHub stars by April. The differentiator the author emphasises — not a chatbot wrapper but a full stack with its own memory, skills, and an RL training loop. The skills layer is where the compounding happens; every conversation seeds new skills the next session can compose with.
💡#14
@bobibozhilov
https://x.com/bobibozhilov/status/2053197375066275923
"Karpathy's AutoResearch is changing how campaigns get optimized and most marketers haven't heard of it yet. Ole Lehmann tested it on landing page copy, 56% → 92% pass rate overnight." This is the autoresearch loop applied to marketing copy — autonomous overnight iteration on landing page variants against some pass criterion. The 36-point lift in one night is the kind of result that gets passed around by founders before it hits the marketing blogs.
💡#15
@is_OwenLewis
https://x.com/is_OwenLewis/status/2053112831650988372
The endgame framing that's working its way through robotics today. Three milestones to physical AGI: (1) Physical Turing Test — humans can't tell if work was done by a human or a robot. (2) Physical API — robot fleets programmable like software. (3) Physical AutoResearch — robots autonomously designing, improving, and iterating on the next generation of themselves. This is Jim Fan's framing surfacing in the broader timeline. The same "code optimization loop" that ran nanochat is what gets pointed at the next generation of robots.
💡#16
@chenzeling4
https://x.com/chenzeling4/status/2052948742895005709
HALO — Hierarchical Agent Loop Optimization. 535 stars on the day. Recursively self-improving AI agents using Reasoning Language Models. Auto-optimizes prompts, tool configs, and agent strategies in a feedback loop. The architecture: outer loop optimizes the agent's strategy, inner loop optimizes prompts and tool selection. The recursion is where the "hierarchical" comes from. Open source.
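The outer/inner split is the core idea, and it can be sketched as nested search. This toy version does exhaustive search rather than HALO's actual optimization; `evaluate(strategy, prompt) -> score` is a hypothetical stand-in for a real agent rollout:

```python
def hierarchical_optimize(strategies, prompts, evaluate):
    """HALO-style split as nested loops: the outer loop searches over
    agent strategies; the inner loop searches over prompt/tool configs
    for each strategy. Returns the best (strategy, prompt, score)."""
    best = (None, None, float("-inf"))
    for strategy in strategies:          # outer loop: agent strategy
        for prompt in prompts:           # inner loop: prompts + tool selection
            score = evaluate(strategy, prompt)
            if score > best[2]:
                best = (strategy, prompt, score)
    return best
```

The recursion in the real framework comes from feeding the outer loop's winner back in as the starting point for the next round.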
💡#17
@usr_bin_roygbiv
https://x.com/usr_bin_roygbiv/status/2053148405371719927
"Qwen team is literally running autoresearch loops on 8 models at once for months at a time right now." The lab-scale data point. While individual users are running overnight loops, the model labs are running 8-model parallel loops over months. Capability compounding is happening at a different scale than what's visible on the public timeline.
💡#18
@mildsky1215
https://x.com/mildsky1215/status/2053155297028612097
"karpathy dropped AutoResearch... i'm copying the loop into our X stack. each polish cycle = experiment. engagement 24h later = analysis. result rewrites the next mandate. self-improving content gate." This is the autoresearch loop adapted to social media — content posted is the experiment, 24h engagement is the metric, the agent rewrites the next content mandate based on the result. The loop runs whether you log in or not.
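The content adaptation maps cleanly onto the same loop shape: post, measure, rewrite the mandate. A sketch with `generate`, `post`, and `engagement_24h` as hypothetical stand-ins for the model call, the X API, and the analytics pull:

```python
def content_loop(mandate, generate, post, engagement_24h, cycles=3):
    """Self-improving content gate sketch: each post is the experiment,
    24h engagement is the analysis, and the result rewrites the next
    mandate. Runs whether anyone logs in or not."""
    history = []
    for _ in range(cycles):
        draft = generate(mandate)                    # polish cycle = experiment
        post(draft)
        score = engagement_24h(draft)                # engagement 24h later = analysis
        history.append((draft, score))
        mandate = f"{mandate} | last score {score}"  # result rewrites the mandate
    return mandate, history
```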
💡#19
@chenzeling4 (second post)
https://x.com/chenzeling4/status/2053247213287911444
Autoresearch-genealogy. 1,115 stars. AI-assisted genealogy research with structured prompts. Built for Claude Code. 12 autonomous prompts cover tree expansion, cross-ref audits, source citations, grave sweeps. Comes with Obsidian vault templates. Niche but the pattern is the point — domain-specific loops with named, composable steps.
💡#20
@manuelcorpas
https://x.com/manuelcorpas/status/2053210356726612328
Bio-research community is starting to take this seriously. analyze-fasta for nucleotide and protein analysis. clawpathy-autoresearch for eval-driven skill tuning. rare-disease-rnaseq for blood RNA-seq outlier detection across 50+ ClinGen genes. TuringDB-graph for graph DB querying. Each is a vertical-specific autoresearch loop. The biological data pipeline is the next big batch of overnight autonomy targets.
💡#21
@OblivionLabz
https://x.com/OblivionLabz/status/2053117228237697351
400 lines of shell logic pushed into a local Claude Code agent loop. No UI, no API wrappers, just terminal + config file. The security audit pipeline goes from manual review to auto-close in under 10 minutes. Boring infrastructure case, but exactly the kind of operational compression — humans-in-the-loop → loop-only-for-edge-cases — that's now becoming the default everywhere agents touch ops.
📡 Eco Products Radar

Autoresearch (Karpathy) (25+) — 630-line MIT file, the central loop reference everyone is forking this week

Codex / OpenAI Codex /goal (15+) — long-horizon autonomous task feature unlocking 10h+ runs

Claude Code (15+) — the harness everyone defaults to when assembling a loop, increasingly used for non-coding tasks

OpenClaw (10+) — open-source agent harness, productization layer just got open-sourced by Hey_Amiko

Hermes Agent (10+) — Nous Research self-improving agent with persistent learning loop, 64k+ stars

DeepSeek V4 / DeepClaude (5+) — the cheap-brain swap for Claude Code's agent loop, 17x cost reduction

Autobrowse (Browserbase) (5+) — autoresearch applied to web automation, SKILL.md memory across sessions

WorldSeed (5+) — Prof Jie Ding's overnight research agent composer, 72-paper case study

HALO (5+) — Hierarchical Agent Loop Optimization, recursive self-improving framework

Task Budgets (Anthropic) (5+) — Opus 4.7 feature giving agents their own token countdown

Tobi Lutke / Shopify case (5+) — first famous-industry application of autoresearch to legacy codebase

Razorpay agentic-loop wisdom (5+) — the production-systems framing for agentic loops everyone is quoting today