
Loop Daily: 2026-04-20

Autoresearch has officially broken containment. The signal is no longer "can you train a better model overnight" — it's Shopify ripping 30 minutes out of CI, chemistry folks screening 4,300 materials in three days, a hockey dad burning $21 in tokens to produce browser automation for a whole org, and Chinese devs whose fresh researcher harness already beats their old skill-based setup by 5 points on terminal bench. Karpathy lit the match; the blast radius is now everyone's production code.
💡#1
@shobitfarcast
https://x.com/shobitfarcast/status/2045519722288996387
A Shopify engineer was sick of waiting 30 minutes for CI to fail, so he pointed Karpathy's AutoResearch — originally built for overnight ML experiments — at build time instead. Tobi Lütke noticed and merged a 32-commit pull request. Internal #autoresearch-wins channel results: unit tests 300x faster, React components mounting 20% faster, pnpm faster, Playwright faster. Tobi's own run on Shopify's 20-year-old Liquid template engine: 53% faster parse and render, 61% fewer memory allocations across 120 automated experiments. Money quote: "AutoResearch was built for model training. Nobody is using it for model training anymore."
💡#2
@advaith_sridhar
https://x.com/advaith_sridhar/status/2045575498701705553
Pointed Claude's auto-research loop at new materials instead of neural nets. The agent proposed candidate thermal conductors then ran phonon calculations on each to verify (1) dynamical stability and (2) thermal performance. Over three days it churned through 4,300 materials; 1,300 passed verification. More than half of the verified candidates have never been synthesized — understudied and potentially interesting. They're picking top suggestions and actually making them in a lab over the next few weeks.
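For readers who want the shape of that loop, a minimal sketch follows, assuming a propose, phonon-verify, keep cycle; every function name, formula, and threshold in it is illustrative rather than taken from the post.

```python
# Minimal sketch of the propose -> verify -> keep loop described above. The
# stub functions stand in for whatever the agent actually ran (an LLM proposal
# step and a phonon workflow); nothing below is from the original post.
from dataclasses import dataclass

@dataclass
class PhononResult:
    dynamically_stable: bool      # no imaginary phonon modes
    thermal_conductivity: float   # predicted lattice kappa, W/(m*K)

def propose_candidates(known: set[str]) -> list[str]:
    """Placeholder for the agent's proposal step (an LLM call in practice)."""
    return [f for f in ["BAs", "BP", "SiC"] if f not in known]

def run_phonon_calculation(formula: str) -> PhononResult:
    """Placeholder for the verification step (DFT + phonon calc in practice)."""
    return PhononResult(dynamically_stable=True, thermal_conductivity=100.0)

def screen(rounds: int, kappa_target: float) -> list[str]:
    verified: list[str] = []
    for _ in range(rounds):
        for formula in propose_candidates(set(verified)):
            result = run_phonon_calculation(formula)
            # Verification gate (1): dynamical stability.
            if not result.dynamically_stable:
                continue
            # Verification gate (2): thermal performance above target.
            if result.thermal_conductivity < kappa_target:
                continue
            verified.append(formula)
    return verified

print(screen(rounds=3, kappa_target=50.0))
```

The point is the two verification gates: a candidate only survives if the phonon run says it is dynamically stable and its predicted conductivity clears the target, which is how 4,300 proposals got cut down to 1,300 verified ones.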
💡#3
@Royal_Arse
https://x.com/Royal_Arse/status/2045518895532982783
Shooting hockey pucks in the basement with his kid while his work machine runs pi-autoresearch on browser automation scripts for the whole org. Running since 8:30am on Opus 4.7, total spend $21.49 so far. His framing: "It will create millions in value. AGI is here." The physical absence is the proof — AFK time is the new unit of the token economy, and by his own math this guy just bought millions in value for the cost of a decent lunch.
💡#4
@NielCansino
https://x.com/NielCansino/status/2045509793893454114
Running an RL experiment across two days of overnight trials — each 20–50k timesteps, agent working while he sleeps or takes meetings. Hit the classic wall: agent auto-compacts mid-experiment and starts hallucinating. Fix: once /context in Claude Code is past ~50% you're bloated — have the agent write HANDOFF.md before you /clear. Next session opens with "Read HANDOFF.md" and catches up in ten seconds. Best trick: a DO-NOT section. Positive instructions age badly ("next step is X" stops being true the second you do it) but negatives stay true forever ("don't relaunch D4c, it entropy-collapsed").
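A minimal sketch of what that handoff might look like, assuming the agent is simply asked to write the file before /clear. The section names and experiment IDs are invented for illustration (except D4c, which is from the thread); only the handoff file itself and the DO-NOT section come from the original tip.

```python
# Hypothetical HANDOFF.md writer. In practice the agent writes this file
# itself; the structure here just shows the kind of content that survives
# a /clear, with the DO-NOT section holding the negatives that stay true.
from pathlib import Path

def write_handoff(state: str, open_questions: list[str], do_not: list[str]) -> None:
    lines = [
        "# HANDOFF",
        "",
        "## Current state",
        state,
        "",
        "## Open questions",
        *[f"- {q}" for q in open_questions],
        "",
        "## DO NOT",  # negatives age well; positive "next step" notes go stale fast
        *[f"- {d}" for d in do_not],
        "",
    ]
    Path("HANDOFF.md").write_text("\n".join(lines))

write_handoff(
    state="Run D5a at 32k timesteps; reward still climbing at step 20k.",
    open_questions=["Does a lower entropy bonus stabilize the D5 series?"],
    do_not=[
        "Relaunch D4c, it entropy-collapsed.",
        "Change the eval seed mid-series; results are compared across runs.",
    ],
)
```

The next session then opens with "Read HANDOFF.md" and is back up to speed in seconds, with the DO-NOT list doing most of the work.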
💡#5
@JoelDeTeves
https://x.com/JoelDeTeves/status/2045633743545893318
"Not enough of you are using autoresearch in your OpenClaw / Hermes Agent setups." He has scheduled nightly autoresearch runs going on personal medical research, AI research, "all kinds of stuff." Takes a harder line than most: whatever your field, you should be running this nightly. The implicit claim is that overnight loops aren't a dev perk — they're a general-purpose research engine, and most people are leaving them off.
💡#6
@QuantumTransf
https://x.com/QuantumTransf/status/2045511853749793115
Tuning a new architecture with no skills at all — pure autoresearch path. Opus 4.6 hit 71.9% on terminal bench, already past the original skill-based version's 66.3%. The skill-equipped version sits at 79.8%. But he admits the researcher architecture didn't run at all at first; took a full night of prompt tweaking to reach this. Honest signal that the harness — not the model, not the skill — is where the real skill lives.
💡#7
@relizarov
https://x.com/relizarov/status/2045387315732697149
Auto-research-style loop on frontend perf. "Does not need to be fancy. Just repeatable measurement harness, clear goals, and instructions to keep research log." Result: 10x improvement in relayout+draw (20ms → 2ms per frame), 4x improvement in data update path (8ms → 2ms). Calls it wild — which is fair for a shave that normally takes a senior engineer a month of manual profiling.
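A sketch of how small that harness can be, assuming a benchmark command that prints one per-frame millisecond number. The script path, goal value, and log file name are invented for illustration, not taken from the post.

```python
# Minimal "repeatable measurement harness + research log" loop body.
import json
import statistics
import subprocess
import time

BENCH_CMD = ["node", "bench/relayout.js"]   # hypothetical benchmark script
GOAL_MS = 2.0                               # clear, numeric goal
LOG = "research_log.jsonl"

def measure(runs: int = 7) -> float:
    """Repeatable measurement: median of several runs, not a single sample."""
    samples = []
    for _ in range(runs):
        out = subprocess.run(BENCH_CMD, capture_output=True, text=True, check=True)
        samples.append(float(out.stdout.strip()))
    return statistics.median(samples)

def log_result(change: str, ms: float) -> None:
    """Append-only research log the agent (or a human) can re-read later."""
    entry = {"ts": time.time(), "change": change, "median_ms": ms, "goal_ms": GOAL_MS}
    with open(LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    ms = measure()
    log_result(change="baseline", ms=ms)
    print(f"median {ms:.2f} ms / frame (goal {GOAL_MS} ms)")
```

Median-of-N plus an append-only log is the whole recipe; the loop just edits, re-measures, and keeps whatever moves the number toward the goal.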
💡#8
@0xHenriksson
https://x.com/0xHenriksson/status/2045640585810415841
Set up 4 autoresearch agents to run while he chaperoned prom. "Claude dispatch while I'm afk." This is now officially a pattern — human attention goes to meatspace duties, autoresearch swarm works in the basement. The token bill keeps running; the value it produces no longer requires a human in the loop.
💡#9
@_ShantanuKul
https://x.com/_ShantanuKul/status/2045335743758008704
Running Claude auto-research on accounts before discovery calls — funding news, hiring signals, leadership changes compressed into a single brief. What used to take 30 minutes of manual Googling is now a 90-second pre-call context drop. Bigger unlock he flags: using the same loop to spot which accounts are actually in a buying moment rather than just matching the ICP on paper. Sales engineering just got a research harness.
💡#10
@JoelDeTeves
https://x.com/JoelDeTeves/status/2045634851987157448
Had his agent build a custom skill based on Karpathy's autoresearcher plus an arXiv paper on bilevel autoresearch. Runs as many passes as he configures. In a follow-up he notes you can run bilevel autoresearch on itself to improve the autoresearch skill — "autoresearch-ception." Recursive self-improvement actually showing up in user hands, not just in lab demos.
💡#11
@ziv_ravid
https://x.com/ziv_ravid/status/2045519100630475128
Ran Claude Opus 4.6 vs 4.7 head-to-head using Karpathy's autoresearch itself as the benchmark. His read: a good PR spin could frame it as "hitting the wall," but he's skeptical you can learn much from it. A rare case of using autoresearch as the eval harness for model comparison — same loop, different brain.
💡#12
@pau_nrda
https://x.com/pau_nrda/status/2045608437111824439
Built autoresearch for a stock-picker AI agent with Anton. Light on details, but another data point: DeFi and equities people keep showing up in loop dailies because the objective is obvious (P&L) and the loop will grind on it forever without losing interest.
📡 Eco Products Radar

pi-autoresearch / Karpathy's autoresearch — the origin meme, now running inside Shopify CI, materials labs, browser automation, sales prep.
Claude Code — the dispatch layer for most of today's overnight loops.
Claude Opus 4.7 — the default brain for long-running autoresearch, with 4.6 still turning up in head-to-heads.
Hermes Agent / OpenClaw — the agent harnesses users are plugging autoresearch into.