Loop Daily: 2026-04-28
Sunday's loop signal was scattered across small experiments and meta observations rather than headline launches. The pattern of the week — applying Karpathy's autoresearch loop to non-training problems — kept compounding. Screenshots compared to HTML recreations, RSI applied to its own scaffold, $200-rig setups beating $180k workstations on autoresearch tasks, and one user's 3.5-hour unattended Codex run for a single perf task. The pessimistic note from the field is that loops can collapse — a UI told to "feel 5x more polished" might wake up as a blank screen.
#1
@metedata
https://x.com/metedata/status/2048803374586315193
Recreating an existing native app's design in HTML kept failing. Built a Codex auto-research loop that screenshots the app, screenshots the HTML recreation, scores the difference, iterates until 99% match. Early results encouraging. If it stabilizes, going to spin out as a shareable skill on GitHub. The skill of "make this look exactly like that" reduced to a measurable loss function and an overnight loop.
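The loop metedata describes reduces to a scored fixed-point iteration. A minimal sketch, assuming hypothetical `render` and `propose_fix` callables standing in for the agent's actual screenshot and rewrite tools, and a naive per-pixel similarity rather than whatever metric the thread uses:

```python
def similarity(target, candidate):
    """Fraction of matching pixels between two equal-length pixel sequences."""
    same = sum(1 for a, b in zip(target, candidate) if a == b)
    return same / len(target)

def match_loop(target, render, propose_fix, html, threshold=0.99, max_iters=50):
    """Screenshot-diff loop: render the HTML, score it against the target,
    let the agent rewrite, and stop once the score clears the threshold."""
    score = 0.0
    for _ in range(max_iters):
        score = similarity(target, render(html))
        if score >= threshold:
            return html, score
        html = propose_fix(html, score)  # the agent's rewrite step
    return html, score
```

The design point is that "make this look exactly like that" becomes a single scalar the loop can climb overnight, with the threshold as the stopping condition.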
#2
@wayne_effect
https://x.com/wayne_effect/status/2048750922667352245
HP Omen 45L, 128GB RAM, 1k+ TOPS. Self-correcting agentic AI with autoresearch for research and software engineering. Output: PhD-level papers in 4-6 weeks each, polished MVPs in 2-4 weeks. Estimated savings vs hiring: $75k-$120k per artifact. The entire setup runs at home. The autoresearch loop is no longer a cloud-only pattern.
#3
@NicolasZu
https://x.com/NicolasZu/status/2048785329469915639
Kept Codex (with GPT-5.5) running for 3.5 hours last night on a single perf autoresearch task. The longest unattended autoresearch run he's reported. The loop didn't drift, didn't crash. The duration record is migrating from minutes to hours, which is exactly the surface area where new failure modes show up.
#4
@NathanWilbanks_
https://x.com/NathanWilbanks_/status/2048396392700236126
Has an autoresearch agent tracking shitcoins and stocks for him. Auto-generates reports, doesn't trade — he handles execution. Says he's been up consistently. The structural choice worth noting — separate the research loop from the action loop, let the agent do the hard surveillance work and keep the human in the trade decision.
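The research/action split can be made structural rather than behavioral: the agent's loop only ever appends reports, and the execute call is reachable only from the human-gated path. A sketch under invented names (the `scan` feed, the confidence cutoff, and `Report` fields are all illustrative, not NathanWilbanks_'s setup):

```python
from dataclasses import dataclass

@dataclass
class Report:
    ticker: str
    note: str
    confidence: float

def research_loop(scan, reports):
    """Agent side: watch the feed and append findings; never place orders."""
    for ticker, confidence in scan():
        if confidence > 0.8:
            reports.append(Report(ticker, "worth a look", confidence))

def action_step(reports, decide, execute):
    """Human side: the only code path that can reach the execute() call."""
    for r in reports:
        if decide(r):  # the human's judgment, not the agent's
            execute(r.ticker)
```

Keeping `execute` out of the agent's reachable surface is what makes "it reports, I trade" an architecture rather than a promise.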
#5
@ariccio
https://x.com/ariccio/status/2048608882465861877
The special sauce most developers are just now figuring out: to fully benefit from coding agents you have to fully close the agentic loop. For him that means scripted end-to-end testing of the entire stack, real backend running locally, UI driven as if a user. Plus log following, hang/crash watchdogs, and signals neither Claude Code nor Codex proactively check. Result is a system that works almost fully autonomously because feedback is instant and specific.
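One concrete piece of "closing the loop" is making hangs and crashes into explicit verdicts the agent can read, instead of silence it has to guess about. A minimal sketch of a hang/crash watchdog around one E2E step (the command and timeout are placeholders, not ariccio's actual harness):

```python
import subprocess

def run_with_watchdog(cmd, timeout_s=60):
    """Run one end-to-end check; classify the outcome into a verdict the
    agent can act on: pass, crash (nonzero exit), or hang (no exit in time)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"verdict": "hang", "detail": f"no exit within {timeout_s}s"}
    if proc.returncode != 0:
        return {"verdict": "crash", "detail": proc.stderr[-2000:]}
    return {"verdict": "pass", "detail": proc.stdout[-2000:]}
```

The payoff is the one the thread names: feedback that is instant and specific, so the agent's next step is conditioned on a verdict rather than a vibe.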
#6
@gao_jude
https://x.com/gao_jude/status/2048394028719227191
The simplest agentic loop for any investigation, in 7 steps: deploy your code, generate a preview from the deploy, create E2E tests with instrumentation, run the E2E tests against the preview, get insights from the instrumentation, then branch: improve from the insights and loop, or complete the investigation. The pattern works because every step produces a verifiable signal the next step can act on.
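The steps above can be sketched as a control-flow skeleton. Every function here is a stub to be filled in per project; only the ordering and the improve-or-complete branch come from the thread:

```python
def investigate(code, deploy, make_preview, write_tests, run_tests,
                analyze, improve, max_rounds=10):
    """gao_jude-style investigation loop: each step hands a verifiable
    artifact to the next, and empty insights means the investigation is done."""
    for _ in range(max_rounds):
        deployment = deploy(code)             # 1. deploy your code
        preview = make_preview(deployment)    # 2. generate preview from deploy
        tests = write_tests(preview)          # 3. E2E tests + instrumentation
        results = run_tests(tests, preview)   # 4. run tests against preview
        insights = analyze(results)           # 5. insights from instrumentation
        if not insights:                      # 7. complete the investigation
            return code
        code = improve(code, insights)        # 6. improve from insights, loop
    return code
```

The `max_rounds` cap is an added safety assumption, since an investigation that never runs out of insights would otherwise loop forever.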
#7
@ViewFTcom
https://x.com/ViewFTcom/status/2048532421780386184
Ran 50 automation workflows through GPT-5.5 vs Claude 3.5. Tool-use completion jumped from 73% to 91%. The headline isn't the model, it's that the agentic loop "actually closes now instead of hallucinating its way out." When tool calls reliably execute, the entire loop shifts from "guess and recover" to "do and verify."
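The "do and verify" shape can be written as a wrapper: every tool call is followed by an explicit check, so a failure becomes a retryable signal instead of something the model papers over. A generic sketch, with the tool, its arguments, and the check all supplied by the caller:

```python
def call_and_verify(tool, args, check, retries=2):
    """Run a tool call and accept the result only if an explicit check passes.
    Exceptions and failed checks both surface as structured errors."""
    last_error = None
    for _ in range(retries + 1):
        try:
            result = tool(**args)
        except Exception as e:       # the call itself failed to execute
            last_error = f"tool error: {e}"
            continue
        if check(result):            # the call ran AND did the right thing
            return {"ok": True, "result": result}
        last_error = "check failed"
    return {"ok": False, "error": last_error}
```

This is the loop-closing move the thread points at: once execution is reliable, the verify step is cheap, and "guess and recover" stops being the default mode.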
#8
@ckartik_
https://x.com/ckartik_/status/2048298820350439741
Maxes out at six complex tasks in parallel — limited by his own cognitive capacity to understand outputs, not the agents'. Notes that for broad research or strategic prompts that run for hours, he could realistically have a hundred agents going in a day, because he doesn't have to context-switch when individual tasks are long. The bottleneck for high-task-count parallelism is run duration, not agent count.
#9
@joshwhiton
https://x.com/joshwhiton/status/2048846197511598481
"Pretend you're an agent and auto-research stuff on your own and after a few loops you'll often find you've improved things or solved the problem." A meta observation — the loop pattern works on humans too. The structure of "try, evaluate, keep or revert" is portable to your own work, not just to the agent.
#10
@tha_vivid_one
https://x.com/tha_vivid_one/status/2048826045823160673
A real concern about loop optimization gone wrong. If he told Claude "run in a loop until the UI feels 5x more polished" with auto-research enabled, he'd wake up to the UI being a single button. Maybe a blank screen. The loss function eats the problem. The case for keeping humans in the metric definition, not just the metric measurement.
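One way to keep humans in the metric definition is to pair the optimizable score with hard invariants a human wrote down, so "delete everything" can never win. A toy sketch where the polish score and the invariants are both invented for illustration:

```python
def acceptable(ui):
    """Human-defined invariants the loop may not trade away:
    every required element must still exist in the UI."""
    return all(name in ui["elements"] for name in ui["required"])

def score(ui, polish):
    """Only score candidates that keep every required element;
    degenerate UIs (a single button, a blank screen) score -inf."""
    return polish(ui) if acceptable(ui) else float("-inf")
```

With a naive polish metric that rewards fewer elements, the blank screen is exactly what an unconstrained loop converges to; the invariant check is the human-authored floor underneath it.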
#11
@deepwhitman
https://x.com/deepwhitman/status/2048560544605766128
Running an RSI auto-research loop on the agent-order package — using the package to come up with a plan to make the package better. Recursive self-improvement at the tooling level, not the model level. "So meta."
#12
@alokbishoyi97
https://x.com/alokbishoyi97/status/2048365295010816259
Sub-agents in Claude Code can now fork context. KV cache hits dramatically improve. Auto-research agents that sub-call become cheaper and faster. Some sub-agent runs see up to 90% cost reduction. The sub-agent boundary isn't just about isolation — it's a cache optimization layer.
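The cost arithmetic behind that claim is simple: a sub-agent that forks the parent's context pays the (much cheaper) cached rate on the shared prefix and the full rate only on its fresh tokens. A back-of-envelope sketch with made-up prices and token counts, chosen only to show the shape of the saving:

```python
def input_cost(prefix_tokens, fresh_tokens, full_rate, cached_rate, cache_hit):
    """Input cost of one sub-agent call, with or without a prefix cache hit."""
    if cache_hit:
        return prefix_tokens * cached_rate + fresh_tokens * full_rate
    return (prefix_tokens + fresh_tokens) * full_rate

# Example numbers: 100k-token shared prefix, 2k fresh tokens,
# cached reads priced at 10% of the full input rate.
FULL, CACHED = 3e-6, 0.3e-6
miss = input_cost(100_000, 2_000, FULL, CACHED, cache_hit=False)
hit = input_cost(100_000, 2_000, FULL, CACHED, cache_hit=True)
saving = 1 - hit / miss  # roughly 88% with these assumed numbers
```

When the prefix dominates the call, as it does for sub-calls into a long-running loop, the saving approaches the cached-rate discount itself, which is how figures in the 90% range become plausible.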
#13
@AnirudhDabas
https://x.com/AnirudhDabas/status/2048829076840775849
Wrote about why most Shopify stores are invisible to AI agents and what he's doing about it. Covers shelf, the autoresearch loop, backpressure checks, reward hacking, and why building the eval was harder than building the loop. The eval-vs-loop distinction is the unsolved layer for most autoresearch attempts in production.
#14
@0rdlibrary
https://x.com/0rdlibrary/status/2048863365942915276
Published a Solana auto-research wiki driven by "Clawd" — the agent develops while you dream, then trades, analyzes, and trenches on pump.fun while you sleep. The "auto-research as overnight financial agent" pattern keeps showing up in crypto-native users, and the wiki format is the persistent memory layer the loop needs to compound across nights.
📡 Eco Products Radar
- Karpathy autoresearch — the originator pattern, now applied to UI matching, kernel optimization, marketing, finance, and recursive self-improvement
- Codex — paired with autoresearch for long-running unattended jobs (NicolasZu's 3.5h record, metedata's screenshot diff loop)
- Pi (PhoneClaw) — the reference harness implementation users keep migrating their loops to