April 22, 2026

Loop Daily: 2026-04-23

Quiet day on the autoresearch feed, but the cases that did surface are honest. The headline: a $200 stack beat a $180k FARS system on Stanford Agentic Review, and the same author then said human researchers are still safe. Two writers tried to use the loop for paper-grade output, and both reported what didn't work. The narrative is shifting from "look what autoresearch did overnight" to "here's where the loop actually lies to you." That's a healthier place to be.
💡#1
@xreviewer555
https://x.com/xreviewer555/status/2046682191938101676
Built an auto-research system for ~$200 that outperforms Analemma's FARS ($180k) on Stanford Agentic Review. Evaluated Claude Code (Opus 4.6), Codex (GPT 5.4), and Kimi Code (K2.5) across 13 CS domains using a four-stage pipeline (Ideation → Experiments → Paper → Review). Claude leads at 5.45, Codex at 4.93, Kimi at 4.24. CPU tasks outscore GPU tasks, because environment complexity hurts the agents. The honest part: Agentic Review overestimates quality, rewarding "honest negative results" even when the methods are flawed. AI scoring AI. Three failure modes named: minimal experimental setups paired with overclaiming, fake citations and references that scale with task complexity, and ideas that are reasonable but incremental. Bottom line: agents can already execute the research pipeline, but rigor and faithfulness are still missing.
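The four-stage pipeline named above can be sketched as a plain stage chain. Everything here is a hypothetical placeholder (including the rubric score), not the author's actual system; a real pipeline would drive an agent at each stage:

```python
# Sketch of an Ideation → Experiments → Paper → Review pipeline.
# Each stage takes the accumulated state and adds its output.

def ideation(topic):
    # In a real system: agent proposes a research hypothesis.
    return {"topic": topic, "idea": f"hypothesis about {topic}"}

def experiments(state):
    # In a real system: agent writes and runs experiment code.
    return {**state, "results": "negative result"}

def paper(state):
    # In a real system: agent drafts the write-up.
    return {**state, "draft": f"Paper: {state['idea']} -> {state['results']}"}

def review(state):
    # In a real system: an LLM judge scores against a rubric.
    return {**state, "score": 4.2}  # placeholder score

PIPELINE = [ideation, experiments, paper, review]

def run_pipeline(topic):
    state = topic
    for stage in PIPELINE:
        state = stage(state)
    return state
```

The point of the chain shape is that each stage only sees the state the previous stages produced, which is exactly where the "minimal setup plus overclaiming" failure mode creeps in: a weak Experiments stage still feeds a confident Paper stage.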
💡#2
@achenfinance
https://x.com/achenfinance/status/2046612004870164671
Revised an AI-generated paper titled "Hedging the Singularity" with the explicit goal of being the "human as Clockmaker": set up the agentic loop, let the AI generate something good enough to put his name on. He couldn't get there, and calls that both disappointing and relieving. The follow-up post is the takeaway: questions that can be answered by running an agentic loop will soon be uninteresting, and proving something is actually true will need both humans and AI. The rare case where a credentialed researcher publishes the negative result instead of the highlight reel.
💡#3
@uripomerantz
https://x.com/uripomerantz/status/2046659883332997462
A fintech CEO walks through how his team is moving credit underwriting models from Optuna-style automated hyperparameter tuning to Karpathy-style autoresearch. Optuna already cycles tens of thousands of model variants chasing AUC. The new mode lets the LLM-driven agent change feature definitions, feature counts, and basically anything a smart human researcher could think of — not just hyperparameters. His framing: any problem with an objective function and enough patience is now an autoresearch problem. Conversion rate optimization, ad campaigns, outbound sales — same loop. Worth bookmarking as the most lucid "why this generalizes beyond ML research" essay of the week.
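The "objective function plus patience" framing reduces to a greedy propose-and-evaluate loop. A minimal sketch, with a random jitter standing in for the LLM proposer and a toy objective standing in for AUC; all names and values here are illustrative, not the team's actual setup:

```python
import random

def propose_variant(config):
    """Stand-in for the LLM proposer. In real autoresearch the agent can
    rewrite feature definitions, code, anything; here we just jitter one knob."""
    variant = dict(config)
    variant["threshold"] = round(variant["threshold"] + random.uniform(-0.1, 0.1), 3)
    return variant

def objective(config):
    """Toy objective (think AUC): peaks at threshold = 0.5."""
    return 1.0 - abs(config["threshold"] - 0.5)

def autoresearch(config, steps=200):
    """Greedy loop: propose a variant, keep it only if the objective improves."""
    best, best_score = config, objective(config)
    for _ in range(steps):
        candidate = propose_variant(best)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

The difference between this and Optuna-style tuning is entirely inside `propose_variant`: when the proposer is an LLM agent with tool access, the search space becomes "anything a smart human researcher could think of" rather than a fixed hyperparameter grid.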
💡#4
@omarsar0
https://x.com/omarsar0/status/2046597807990001981
Observation from a credentialed AI scholar: Karpathy's autoresearch repo started a trend where agents now train AI models to build SoTA agentic systems. His warning is the part to keep: "ultimately, it boils down to good research questions or hypotheses. LLMs are not great at this (yet)." Compute is cheap, the bottleneck is the question. Same insight that shows up in @achenfinance's failed paper attempt.
💡#5
@karimov_elshad
https://x.com/karimov_elshad/status/2046666336194175138
Clean schema thread reframing Claude Code as an agent loop, not a chat app. Five steps: prompt → gather context → take action (edit files, run shell) → verify output → loop or stop. The point is that you can interrupt or steer at any step. Not new ground but useful as a teaching artifact for anyone first wrapping their head around what "the loop" actually means inside a coding agent.
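The five steps can be sketched as a single Python loop. `call_model` and `run_tool` are hypothetical stubs standing in for the real model and tool calls, not Claude Code's actual internals:

```python
def call_model(messages):
    """Stub for an LLM call that returns the next action.
    This sketch's model immediately decides it is done."""
    return {"type": "stop", "answer": "done"}

def run_tool(action):
    """Stub for executing a tool (edit a file, run shell, ...)."""
    return f"ran {action['type']}"

def agent_loop(prompt, max_steps=10):
    messages = [{"role": "user", "content": prompt}]   # 1. prompt
    for _ in range(max_steps):
        action = call_model(messages)                  # 2. gather context, decide
        if action["type"] == "stop":                   # 5. loop or stop
            return action["answer"]
        observation = run_tool(action)                 # 3. take action
        # 4. verify: the tool's output feeds the next model turn
        messages.append({"role": "tool", "content": observation})
    return "step budget exhausted"
```

The steerability point from the thread falls out of the structure: every iteration is a plain function boundary, so a human (or a rate limiter) can interrupt between any two steps.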
💡#6
@m13v_
https://x.com/m13v_/status/2046642127082516654
Field note on Claude Code's 5-hour rolling rate window: an agentic loop kicked off at 7am can quietly nuke your noon session, and you only find out mid-sentence. A tactical detail anyone running unattended loops needs to internalize: the cost of "set it and forget it" is silent capacity destruction, not just a billing surprise.
💡#7
@b04zy
https://x.com/b04zy/status/2046738528248197603
Reader response to the Claude Code Pro-plan removal: "Claude Cowork (like Code) is built on top of the agentic loop so it's strange to keep Cowork but pull Code from pro users." Hidden in the throwaway is a useful frame — Anthropic is signaling that the agent loop is the product and chat is the accessory, but they're inconsistent about which surfaces get which tier.
💡#8
@deepwatrcreatur
https://x.com/deepwatrcreatur/status/2046703889504706956
The 100-1000x productivity gain from agent loops is real but unevenly distributed. His standout trick: closing the agentic loop by pointing a webcam at the device under development so the model can see what's actually rendering. That kind of loop closure is the small move most users won't think of, and it's what separates the 1000x outliers from the 10x baseline.
💡#9
@handsomeblob
https://x.com/handsomeblob/status/2046689789060018445
With @aria_agi he's pushing past auto-research into autonomous execution — "one-click shipping flows that take ideas to revenue-ready outputs." Light on detail, but worth flagging because it's the explicit next horizon: not just optimize a metric, but ship the optimized thing into production. The loop is moving up the value chain.
💡#10
@karimov_elshad
https://x.com/karimov_elshad/status/2046666342162690162
Second version of the same author's loop schema, even tighter: prompt → context → action → verify → loop/stop. Useful as the canonical teaching diagram if you're explaining to a non-coder why agent loops are different from chatbots.
💡#11
@leostera
https://x.com/leostera/status/2046612309166973133
Progress update on building a tiny agentic loop with environment tools (class browser, playground, etc.) — needs to be wired to Codex to stop being insanely slow. Mostly a status post but interesting because it confirms the same pattern from the bigger players: tooling speed dominates, not model quality.
💡#12
@UrbanAstroFella
https://x.com/UrbanAstroFella/status/2046723137296085215
Real eval of ChatGPT-image-2's agentic loop: tasked it with generating a dimensionally accurate three-view of a 1920s Savoia Marchetti S.55x seaplane (double-hull flying wing, top-mounted dual engine, very unusual). It used multiple tool calls to find information and iteratively build the image. Got close but the design was unorthodox enough to throw it off. Honest failure mode: the loop's strength is iterative refinement, but training-distribution priors still anchor the output for genuinely novel concepts.
💡#13
@developerpranab
https://x.com/developerpranab/status/2046597295118991432
One-line approach he's using: an agentic loop with a simple discover_tools-style tool backed by deterministic matching. Tiny, but the right primitive: make tool discovery a first-class loop step instead of a fixed prompt list, and the loop scales to whatever's installed.
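A minimal sketch of that primitive, assuming a keyword-based registry. The tool names and matching rule are illustrative, not the author's implementation; the point is that discovery is a deterministic lookup the loop can call, with no model involvement and no fuzziness:

```python
# Hypothetical tool registry: each tool declares the keywords it answers to.
TOOLS = {
    "read_file": {"keywords": {"read", "open", "cat"},
                  "fn": lambda path: f"<contents of {path}>"},
    "run_shell": {"keywords": {"run", "exec", "shell"},
                  "fn": lambda cmd: f"<output of {cmd}>"},
}

def discover_tools(query):
    """Deterministic matching: return names of tools whose keyword set
    intersects the words in the query. Same query, same answer, always."""
    words = set(query.lower().split())
    return sorted(name for name, spec in TOOLS.items()
                  if spec["keywords"] & words)
```

Because the registry is just a dict, installing a new tool means adding one entry; the loop's prompt never has to change, which is exactly the "scales to whatever's installed" property.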
📡 Eco Products Radar

Karpathy's autoresearch repo: still the gravity well — referenced as the inspiration in 4+ posts today (xreviewer555, uripomerantz, omarsar0, handsomeblob).

Claude Code: 4+ mentions across the loop conversations — m13v_ on the 5-hour window, b04zy on the Pro-plan removal, karimov_elshad as the canonical agent loop example, xreviewer555 as the Opus 4.6 evaluator.

Codex (GPT 5.4): 2 mentions (xreviewer555 evaluation, leostera wiring) — below threshold but trending.

Kimi Code (K2.5): 1 mention as part of the xreviewer555 head-to-head.

ChatGPT-image-2: 1 mention with a real failure-mode test (UrbanAstroFella).
