
Loop Daily: April 29, 2026

Autoresearch is no longer a thought experiment — today's signal is that the recipe is leaking out into normal-developer territory. @RhysSullivan's 'first autoresearch ever' was a one-shot 'minimize LOC in this codebase' run. @the_other_max ran one overnight on a CI pipeline and cut build time from 12.5m to 7m. @NicolasZu published a public service announcement urging anyone with leftover weekly Codex tokens to launch overnight autoresearches across game performance, balance, codebase quality, UI polish, and marketing — with concrete loop targets for each. @cyrusnewday inherited Karpathy's framing and shipped gepa-research, swapping greedy hill-climbing for Pareto-frontier exploration via @gepa_ai. Around all of this, the orchestrator wars heat up: evo, gepa-research, pi-mono/agent, and FutureAGI all argue different harness choices at the same moment that Hermes Agent ships cleaner mid-loop interruption primitives.
💡#1
@NicolasZu
https://x.com/NicolasZu/status/2048706563343310862
Public service announcement for Codex weekly token holders: don't let surplus tokens go to waste; run overnight autoresearches. The specific loop targets he's running tonight: game performance (loop until perf increases), game balancing (loop until the game is balanced), codebase quality (loop until no function scores above CRAP-30), app design (loop until the UI feels 5x more polished), and marketing (loop until 50+ short-form video hooks with example links). The post hit 643 likes and 69K impressions because the recipe is concrete: each loop has a measurable success criterion, not a vague 'improve the thing.' This is the autoresearch playbook leaving the Karpathy thread and entering normal-developer practice.
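All five targets share the same outer shape: run an agent iteration, measure, stop only when the criterion is met. A minimal sketch of that shape, with the agent step and the eval passed in as callables since they differ per target (names and thresholds here are illustrative, not NicolasZu's actual scripts):

```python
import time
from typing import Callable

def autoresearch_loop(
    run_agent_step: Callable[[], None],   # one agent iteration, e.g. shelling out to your coding agent
    measure_metric: Callable[[], float],  # the eval: returns the number you actually care about
    criterion: Callable[[float], bool],   # concrete stop condition, not "improve the thing"
    max_hours: float = 8.0,
) -> float:
    """Run agent iterations overnight until a measurable success criterion is met."""
    deadline = time.time() + max_hours * 3600
    score = measure_metric()
    while time.time() < deadline and not criterion(score):
        run_agent_step()
        score = measure_metric()
    return score

# Stop conditions mirroring the post's targets (thresholds illustrative):
#   codebase quality: lambda crap_over_30: crap_over_30 == 0
#   marketing hooks:  lambda n_hooks: n_hooks >= 50
```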
💡#2
@the_other_max
https://x.com/the_other_max/status/2048441071600857123
Ran his own autoresearch on a CI pipeline overnight and cut build time from 12.5 minutes to 7 minutes. He's now mapping where else in the codebase the same treatment makes sense. The interesting thing isn't the 44% improvement; it's that 'put autoresearch on a real production pipeline overnight' is now a one-tweet receipt instead of a paper. The implicit eval (the build still passes after the optimization) is doing the heavy lifting.
💡#3
@RhysSullivan
https://x.com/RhysSullivan/status/2048609240647147635
Did his first autoresearch ever today. The prompt: minimize the number of lines of code in this codebase, with extra detail. He calls it a potential anti-slop measure and is already eyeing 'minimize the possible states' as a follow-up axis. 38K impressions on the receipt, because LOC reduction is the canonical 'every dev knows their codebase has too much of it' wish. The honest replies asked the right questions: was the resulting code any good, and how was existing functionality verified? That exposes the eval gate as the missing piece for everyone copying this.
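A sketch of what that missing gate could look like: measure LOC, let the agent propose its reduction, and keep it only if the result is actually smaller and the existing test suite still passes. The repo layout, pytest, and the git workflow here are assumptions, not Rhys's setup.

```python
import subprocess

def loc(repo: str) -> int:
    """Total lines across tracked Python files (crude LOC metric; adjust the glob for your stack)."""
    files = subprocess.run(
        ["git", "-C", repo, "ls-files", "*.py"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return sum(len(open(f"{repo}/{f}", encoding="utf-8", errors="ignore").readlines()) for f in files)

def tests_pass(repo: str) -> bool:
    """The eval gate: existing functionality counts as verified only if the suite is green."""
    return subprocess.run(["pytest", "-q"], cwd=repo).returncode == 0

def accept_or_revert(repo: str, loc_before: int) -> bool:
    """Keep the agent's LOC reduction only if it is smaller AND still passes the tests."""
    if loc(repo) < loc_before and tests_pass(repo):
        subprocess.run(["git", "-C", repo, "commit", "-am", "autoresearch: reduce LOC"], check=True)
        return True
    subprocess.run(["git", "-C", repo, "checkout", "--", "."], check=True)  # throw the attempt away
    return False
```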
💡#4
@cyrusnewday
https://x.com/cyrusnewday/status/2048903089022013545
Shipped gepa-research, an open-source coding agent plugin inspired by Karpathy's autoresearch but with a different inner search algorithm: instead of greedy hill-climbing or tree-search, it explores the Pareto frontier using @gepa_ai. 321 likes, 115K impressions in a few hours. Important pattern in the Loop space — instead of cloning Karpathy's recipe, builders are swapping out the search component while keeping the outer harness. This is now a competitive design space (greedy vs evolutionary vs Pareto vs MCTS), not a single canonical loop.
💡#5
@AnirudhDabas
https://x.com/AnirudhDabas/status/2048829076840775849
Wrote up a real production autoresearch case for Shopify-store agent visibility: the inner loop, backpressure checks to keep the agent from running away, reward hacking detection, and — the key admission — that building the eval was harder than building the loop itself. Most autoresearch demos hide this; he names it directly. Naming the eval as the harder problem is becoming a louder consensus this week (@omarsar0 makes the same point about FutureAGI below).
💡#6
@alokbishoyi97
https://x.com/alokbishoyi97/status/2048365285892125023
Released evo v0.3, an autoresearch orchestrator that runs as a plugin on top of Claude Code, Codex, OpenClaw, or Hermes Agent. New in v0.3: RLMs, forked-context agents, and a further-polished inner loop. He spent the rest of the day actively recruiting testers from autoresearch-pilled threads, including offering charity donations in exchange for feedback calls. The pattern: orchestrator authors are racing to lock in users while the recipe is still small enough to switch.
💡#7
@metedata
https://x.com/metedata/status/2048803374586315193
Solved a stubborn 'recreate this native app's design in HTML' problem with a Codex autoresearch loop: screenshot the native app, screenshot the HTML recreation, score the difference, and iterate until the match score hits 99%. Early results are encouraging, and he plans to ship the workflow as a Claude Code skill on GitHub. It's a concrete instance of the visual-eval-loop pattern, using a screenshot diff as the metric, and it tackles a class of problems where the loss function isn't a number until you write it as one.
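The scoring half of that loop is small. A sketch assuming Pillow for the pixel diff (the real workflow might use SSIM or a vision-model judge instead, and the screenshot capture is whatever your platform provides):

```python
from PIL import Image, ImageChops  # pip install pillow

def match_score(native_png: str, html_png: str) -> float:
    """Score how closely the HTML recreation matches the native screenshot (1.0 = identical).

    Mean absolute pixel difference, normalized. The point is turning "looks right" into a number
    the loop can iterate against.
    """
    a = Image.open(native_png).convert("RGB")
    b = Image.open(html_png).convert("RGB").resize(a.size)
    diff = ImageChops.difference(a, b)
    pixels = list(diff.getdata())
    mean_err = sum(sum(px) for px in pixels) / (len(pixels) * 3 * 255)
    return 1.0 - mean_err

# Loop shape: ask the agent to adjust the HTML/CSS, re-capture, rescore, stop at 0.99.
# while match_score("native.png", "recreation.png") < 0.99: ...
```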
💡#8
@Teknium
https://x.com/Teknium/status/2048232396924088713
Hermes Agent tip explaining the 4 ways to interact with a running agent loop: (1) regular message interrupts the loop and forces a response, (2) /queue queues for after the current loop completes, (3) /bg or /btw runs a parallel async prompt, (4) /steer injects guidance into the next tool result mid-loop without stopping it. 1.9K likes, 75K impressions. The /steer primitive is the design innovation — it's how you nudge a long-running autoresearch off a bad trajectory without killing the run and losing context. Claude Code is still missing this.
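This is not Hermes Agent's implementation, but a sketch of the shape of that /steer primitive: guidance queued from outside the loop gets appended to the next tool result, so the model sees it at its next step without the run being interrupted.

```python
import queue
from typing import Callable

steer_queue: "queue.Queue[str]" = queue.Queue()

def steer(guidance: str) -> None:
    """Called from outside the loop, e.g. by a /steer command handler."""
    steer_queue.put(guidance)

def run_tool(execute: Callable[[], str]) -> str:
    """Wrap a tool call so pending steering rides along with the result the model reads next."""
    result = execute()
    notes = []
    while not steer_queue.empty():        # drain guidance queued since the last tool call
        notes.append(steer_queue.get_nowait())
    if notes:
        # The run gets nudged off a bad trajectory without being killed and without losing context.
        result += "\n\n[operator steering]\n" + "\n".join(notes)
    return result
```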
💡#9
@AScully789
https://x.com/AScully789/status/2048878557749760039
Recipe for getting off paid coding-agent subscriptions: rent a small affordable VPS, load a large open-source model on it, then use autoresearch to optimize the model's output speed for that specific small VPS. Use the optimized model from your laptop infinitely without paying API costs. Two-stage autoresearch — first the loop optimizes the substrate, then you use the substrate as your runtime. Same pattern @RoundtableSpace's free Claude Code proxy hits from a different angle.
💡#10
@omarsar0
https://x.com/omarsar0/status/2048759865007591615
Hard rule for everyone building self-improving agents: don't bother without evals. An agent can't improve from traces it can't evaluate. Holds up @FutureAGI_ as the example to copy because they shipped a fully open-source eval platform combining hallucination/groundedness/PII/toxicity/tool-use correctness checks, six prompt optimization algorithms (GEPA, PromptWizard, ProTeGi, etc), multi-turn voice simulation through LiveKit/VAPI/Retell/Pipecat, and OpenTelemetry-native tracing across 50+ frameworks. The argument: self-improving agent infra you can't trust is worse than no agent infra.
💡#11
@imbue_ai
https://x.com/imbue_ai/status/2049174423757103217
Most agents rush to code or guess at a plan; their new Blueprint inverts the order — it reads your codebase first, asks grounded questions that actually matter, then hands any agent a plan worth executing. The pattern is the planner-as-pre-loop: instead of letting the agent loop discover the plan through trial and error (which burns tokens and leaves slop), front-load the planning into a deliberate Q&A pass that an autoresearch loop can then consume cleanly.
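A generic sketch of planner-as-pre-loop (not Blueprint's actual interface): ground the plan in the codebase, get the ambiguous questions answered up front, and only then hand steps to whatever agent loop executes them.

```python
from typing import Callable

def plan_then_execute(
    read_codebase: Callable[[], str],             # planner pass: ground everything in the actual repo
    draft_questions: Callable[[str], list[str]],  # grounded questions that actually matter
    answer: Callable[[str], str],                 # human (or supervising agent) answers before any code is written
    make_plan: Callable[[str, dict], list[str]],  # plan built from context plus answers
    execute_step: Callable[[str], None],          # any agent loop can consume the finished plan
) -> None:
    context = read_codebase()
    answers = {q: answer(q) for q in draft_questions(context)}
    for step in make_plan(context, answers):
        execute_step(step)
```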
💡#12
@DivyanshT91162
https://x.com/DivyanshT91162/status/2048260029799711200
pi-mono/agent (the pi-autoresearch family) reads as the cleanest agent loop on the internet right now: just a few files, no bloated frameworks, no over-engineered abstractions. Highest cache hit rate, lowest tokens per session, minimal bugs, perfect for both learning and production. The post lands because Pi is the inverse argument to the orchestrator wars happening above: sometimes the right harness is the smallest one that closes the loop. It sits in the lineage of Karpathy's autoresearch reference implementation.
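For contrast with the heavier orchestrators, this is the generic shape of a loop that small (a sketch, not pi's code): model call, tool dispatch, append result, repeat.

```python
from typing import Callable

def agent_loop(llm: Callable[[list], dict], tools: dict, task: str, max_turns: int = 50) -> str:
    """The whole harness in one function.

    `llm` maps a message list to a reply dict that either answers outright or names a
    tool and its args; `tools` maps tool names to plain functions.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if "tool" not in reply:                        # no tool requested: the loop closes here
            return reply["content"]
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    return messages[-1]["content"]
```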
💡#13
@mstockton
https://x.com/mstockton/status/2048820005706334324
Hands-on observation on a new pattern he's evaluating: push the complexity of provider-specific tool invocation into a sub-agent so the main agent loop's context stays clean. Most useful when you have multiple providers with overlapping tools and no control over what context those tools inject. Calls out that nothing here is figured out — eval helps, but a lot of agent harness design is still more art than science.
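One hedged sketch of what that delegation could look like (the names are hypothetical): only a bounded summary from the sub-agent ever enters the main loop's context, no matter how much the provider's tools inject.

```python
from typing import Callable

def delegated_tool_call(
    sub_agent: Callable[[str], str],   # its own context window, loaded with provider-specific tool docs
    provider: str,
    request: str,
    max_summary_chars: int = 800,      # hard bound on what the main loop ever sees
) -> str:
    """Run a provider-specific tool invocation inside a sub-agent and return only a summary."""
    raw = sub_agent(
        f"Using the {provider} tools, do the following and report the outcome briefly:\n{request}"
    )
    return raw[:max_summary_chars]
```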
💡#14
@erans
https://x.com/erans/status/2048831110696796619
Built a tiny harness that runs every Claude agent turn through Anthropic's Batch API. Result: terrible for one agent (90-120s per turn kills interactivity) but potentially great for a fleet. Insight: Batch is not an agent loop pattern, it's a fleet optimization layer. The right place for it is the layer where you're running many slow agents in parallel rather than one fast interactive one. Posted writeup with code.
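A sketch of that fleet layer using the Anthropic Message Batches API (model name, prompts, and polling cadence are illustrative; see his writeup for the actual harness):

```python
import time
from anthropic import Anthropic  # pip install anthropic

client = Anthropic()

def fleet_turn(fleet_messages: dict[str, list[dict]]) -> dict[str, str]:
    """Submit one turn for a whole fleet of agents as a single batch, then collect the replies.

    fleet_messages maps an agent id to that agent's message history. Per-turn latency is
    batch-scale (minutes), which is why this suits many slow agents, not one interactive one.
    """
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": agent_id,
                "params": {
                    "model": "claude-sonnet-4-5",  # illustrative
                    "max_tokens": 1024,
                    "messages": messages,
                },
            }
            for agent_id, messages in fleet_messages.items()
        ]
    )
    while client.messages.batches.retrieve(batch.id).processing_status != "ended":
        time.sleep(30)  # poll until the whole batch resolves
    return {
        entry.custom_id: entry.result.message.content[0].text
        for entry in client.messages.batches.results(batch.id)
        if entry.result.type == "succeeded"
    }
```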
💡#15
@ariccio
https://x.com/ariccio/status/2048608882465861877
The thing most devs are finally figuring out: to get the full benefit of agentic tools you have to fully close the agentic loop. In his case that meant standing up fully-scripted end-to-end testing of the entire app stack with the real backend running locally and the UI being driven as a user. Combined with log following, hang/crash watchdogs, and other signals that neither Claude Code nor Codex/GPT models proactively check, you end up with something that operates almost fully autonomously because feedback is instant and direct.
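The watchdog piece of that setup is the part most harnesses skip. A sketch (command and thresholds are placeholders): run part of the stack, follow its log, and turn silence or a non-zero exit into an explicit signal the agent can read.

```python
import os
import subprocess
import time

def watch(cmd: list[str], log_path: str, hang_secs: float = 30.0) -> str:
    """Run a process, tail its log for liveness, and report 'ok', 'crash', or 'hang'."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        last_size, last_change = 0, time.time()
        while proc.poll() is None:
            size = os.path.getsize(log_path)
            if size != last_size:                         # log still moving: the stack is alive
                last_size, last_change = size, time.time()
            elif time.time() - last_change > hang_secs:   # log went silent: treat it as a hang
                proc.kill()
                return "hang"
            time.sleep(1)
    return "ok" if proc.returncode == 0 else "crash"
```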
💡#16
@yossry_i
https://x.com/yossry_i/status/2049121408278462692
Pushback on the skills.md craze: agentic coding is not writing skills.md by hand. The core of agent-oriented learning is learning from experience interacting with an environment to achieve a goal. You can bootstrap your agent, but skills.md and goals.md should be updated by the agent itself. The shift is from manual prompt engineering to the agent maintaining its own memory artifacts as a side effect of running the loop.
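A minimal sketch of that side effect, assuming you already have a model call to hand (the prompt and default file name are illustrative): after each run, the agent distills a lesson and appends it to its own skills.md.

```python
from typing import Callable

def update_skills(summarize: Callable[[str], str], transcript: str, path: str = "skills.md") -> None:
    """Let the agent maintain its own memory artifact instead of a hand-written skills.md."""
    lesson = summarize(
        "From the transcript below, extract one reusable lesson about this codebase or goal "
        "as a short markdown bullet. Reply with an empty string if there is nothing new.\n\n"
        + transcript
    )
    if lesson.strip():
        with open(path, "a", encoding="utf-8") as f:  # append-only: the loop accretes experience
            f.write(lesson.strip() + "\n")
```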
💡#17
@AymericRoucher
https://x.com/AymericRoucher/status/2049151467555373146
Poolside dropped its first public models on Hugging Face: 225B (23B active) and 33B (3B active) coder models with hybrid attention (3:1 global vs sliding window), KV cache quantized to FP8, near-SOTA results on par with Qwen-3.5, and an Apache 2.0 license. Also released pool, their CLI coding agent. Three things in one drop: a frontier-tier open coding model, a competitive coding-agent CLI, and a real open-weights license. Poolside just entered the orchestrator-vs-model conversation as a serious player.
📡 Eco Products Radar

| Tool | Mentions | Note |
| --- | --- | --- |
| autoresearch (Karpathy lineage) | 36+ | The umbrella term; recipe is leaking to mainstream devs |
| evo (alokbishoyi97) | 8+ | Plugin orchestrator over Claude Code/Codex/OpenClaw/Hermes |
| gepa-research / GEPA | 5+ | Pareto-frontier exploration alternative to greedy/tree |
| pi-mono / pi-agent | 5+ | Smallest-possible reference loop, highest cache hit rate |
| Hermes Agent | 7+ | /steer mid-loop guidance + 4 interaction modes |
| Codex | 12+ | Auto-research host of choice for token-rich users |
| Claude Code | 11+ | Often the harness, often hitting limits, often replaced |
| FutureAGI | 3+ | Open-source eval platform for self-improving agents |
| Poolside / pool CLI | 3+ | New open-weight coder + agent CLI just dropped |
| Blueprint (imbue_ai) | 3+ | Planner-as-pre-loop pattern |