
Loop Daily: April 19, 2026

The autoresearch crowd is now openly bragging about overnight token burns and posting receipts. Karpathy's loop ran 700 experiments in 8 hours for $309 across 16 GPUs. A guy left pi-autoresearch on his M4 Max all night and woke up to qwen3.6 going from 80 to 180 tok/sec. Multi-agent autoresearch is no longer a thought experiment – it's something running in production with measurable speedups. Today's signal is the maturation of the recipe: clear inner/outer loops, commit/rollback gates with eval validation, and a quietly emerging consensus that the next agent leap comes from the harness, not the model.
💡#1
@breath_mirror
https://x.com/breath_mirror/status/2045922122078319093
Left pi-autoresearch on overnight, qwen3.6 went from ~80 tok/sec to ~180 tok/sec using @bstnxbt's dflash implementation. Running on M4 Max 128GB through oMLX. Had to port a bit because it wasn't working initially, then patched it to skip the 20-turn cap and added auto-compacting at 85%. When the agent got stuck stabilizing in loops, dropped in: "you seem to be stuck in 100 rounds of stabilizing, what's the next big jump/idea." Doubling local inference speed from a single overnight run is the kind of receipt this whole field has been waiting on.
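The two patches described here (skipping the turn cap, auto-compacting at 85% of context) can be sketched in a few lines. This is a hypothetical illustration, not pi-autoresearch's actual API: `run_turn`, `estimate_tokens`, the `"DONE"` sentinel, and the 128K context limit are all assumed names and values.

```python
CONTEXT_LIMIT = 128_000   # assumed context window, in tokens
COMPACT_AT = 0.85         # compact once the transcript is 85% full

def compact(history):
    """Stand-in for summarizing old turns into a short digest."""
    return [f"<digest of {len(history)} earlier turns>"]

def loop(run_turn, estimate_tokens, max_turns=None):
    """Agent loop with the turn cap disabled (max_turns=None) and
    auto-compaction when the transcript nears the context limit."""
    history = []
    turn = 0
    while max_turns is None or turn < max_turns:
        history.append(run_turn(history))
        if history[-1] == "DONE":       # agent signals completion
            break
        if estimate_tokens(history) > COMPACT_AT * CONTEXT_LIMIT:
            history = compact(history)  # replace old turns with a digest
        turn += 1
    return history
```

The "you seem to be stuck" nudge from the post would correspond to injecting an extra user turn into `history` when the agent loops on the same idea.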
💡#2
@RoundtableSpace
https://x.com/RoundtableSpace/status/2045876321872400773
Multi-agent autoresearch guide. 5 agents – researcher, planner, workers, reporter. Ran 4 hours, executed 32 GPU jobs, autonomously improved baseline. The "AI doing research while you sleep" archetype, but with a concrete agent split that maps cleanly to anyone trying to copy the setup. Posted alongside specifics other practitioners can replicate.
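A minimal sketch of that five-role split, with each role as a plain function standing in for its own agent. The candidate grid, the `worker` scoring, and the 32-job budget default are illustrative assumptions, not the guide's actual code.

```python
def researcher(baseline):
    # Propose candidate experiments around the current baseline.
    return [{"lr": baseline["lr"] * f} for f in (0.5, 1.0, 2.0)]

def planner(candidates, budget):
    # Keep only as many jobs as the GPU budget allows.
    return candidates[:budget]

def worker(job):
    # Stand-in for one GPU job; here the score happens to peak at lr == 0.02.
    return {"job": job, "score": -abs(job["lr"] - 0.02)}

def reporter(results):
    # Summarize the round: return the best-scoring configuration.
    return max(results, key=lambda r: r["score"])["job"]

def run_round(baseline, budget=32):
    results = [worker(j) for j in planner(researcher(baseline), budget)]
    return reporter(results)
```

In the real setup each function would be backed by its own agent/LLM call, with `worker` fanned out across the GPU pool.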
💡#3
@lakincoder
https://x.com/lakincoder/status/2045752917056188871
The Karpathy autoresearch receipt going around: 700 experiments in 8 hours for $309 using 16 GPUs, autonomously iterating code and finding 20 optimizations that reduced training time by 11 percent. This is the budget number practitioners keep referencing – not because $309 is huge, but because the unit economics map straight to ROI. 700 experiments at $0.44 each is below the cost of a single grad-student hour.
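The unit economics, spelled out with the numbers quoted in the post:

```python
# Figures from the post: total spend over experiment count.
experiments = 700
total_cost = 309.0        # USD for 8 hours on 16 GPUs
cost_per_experiment = total_cost / experiments   # ~0.44 USD each
```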
💡#4
@0xSero
https://x.com/0xSero/status/2045762761842397368
Practical /loop usage from someone running long jobs in production. "/loop 45min please go over checklist.md, find next incomplete task, complete it, run tests, update checklist." Also runs LLM compression cycles in 2-12 hour phases – observations, pruning, quantizations, benchmarking, publishing – with "/loop 30min continue compression process based on checklist.md, if it errors fix it from latest checkpoint." This is what production agentic loops actually look like, not the hype version.
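The quoted checklist pattern, as a plain-Python sketch: find the next unchecked item, do it, mark it done. `do_task` stands in for the agent's actual work-plus-tests step, and the `- [ ]` checkbox format is an assumption about what checklist.md looks like.

```python
def next_incomplete(lines):
    """Index of the first unchecked '- [ ]' item, or None if all done."""
    for i, line in enumerate(lines):
        if line.startswith("- [ ]"):
            return i
    return None

def run_once(lines, do_task):
    """One loop iteration: complete the next task and check it off.
    Returns (updated lines, whether there was anything to do)."""
    i = next_incomplete(lines)
    if i is None:
        return lines, False
    if do_task(lines[i][len("- [ ] "):]):   # only check off on success
        lines[i] = lines[i].replace("- [ ]", "- [x]", 1)
    return lines, True
```

The "/loop 45min" wrapper would just call `run_once` repeatedly until the time budget expires or it returns `False`.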
💡#5
@johnennis
https://x.com/johnennis/status/2045718437599547779
Runpod CLI integration making autoresearch trivially cheap. Drop $10-20 on the account, spin up GPUs from inside the coding agent, run autoresearch all night with @elves. The full loop – provision, run, tear down – fits inside the agent's hand. The infrastructure friction that used to gate autoresearch is gone for anyone who can describe what they want.
💡#6
@arm64le
https://x.com/arm64le/status/2045968430759657492
Trick that actually worked: name one agent Einstein, give him a "lobotomy" (don't let him see CoT of other models), let Gemma 4 autoresearch run overnight with the same model as critic. Keeps them from prompt-injecting each other. The kind of operational hack you can only learn by running these things long enough to discover the failure mode.
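One way the "lobotomy" could be implemented, assuming chain-of-thought is delimited with `<think>` tags (an assumption, not a detail from the post): strip those spans from the transcript before the critic sees it, so same-model agents can't steer or prompt-inject each other through their reasoning traces.

```python
import re

def lobotomize(transcript: str) -> str:
    """Remove chain-of-thought spans so the critic only sees final outputs."""
    return re.sub(r"<think>.*?</think>", "", transcript, flags=re.DOTALL).strip()
```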
💡#7
@adam_jesion
https://x.com/adam_jesion/status/2046012170622657012
Real signal on Anthropic's account-policy edge. The "suspicious" part of his autoresearch loop is using Claude Code in headless mode (claude -p) inside the loop. Anthropic's stance is that CC is for human use only. Production autoresearch users running CC headless are now living inside an explicit gray zone, which matters because it shapes what the next generation of "autoresearch on top of frontier models" will look like.
💡#8
@aislop4
https://x.com/aislop4/status/2045985319179456734
Drop-in autoresearch routine prompt. Inputs: a paper URL, a repo URL, a domain. Outputs: a runnable skill with SKILL.md, a python loop with bilevel structure, and at least 2 working task definitions. Encodes the paper's core insight – Level 1.5 parameter tuning yields almost nothing, Level 2 structural mechanism changes are where the gains come from. Prompt is long but it's the cleanest reusable template floating around for getting an autoresearch routine running on a new domain.
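The bilevel structure the prompt asks for, sketched under assumed names: an outer loop over structural mechanisms (Level 2) and an inner loop tuning each one's parameters (Level 1.5). `params_for` and `evaluate` are stand-ins for whatever the generated skill supplies.

```python
def bilevel_search(mechanisms, params_for, evaluate, inner_steps=3):
    """Outer loop: try structural mechanisms (Level 2).
    Inner loop: tune each mechanism's parameters (Level 1.5)."""
    best = (float("-inf"), None, None)
    for mech in mechanisms:
        for p in params_for(mech)[:inner_steps]:
            score = evaluate(mech, p)
            if score > best[0]:
                best = (score, mech, p)
    return best   # (best score, mechanism, parameter setting)
```

The paper's claim maps onto this shape directly: widening `inner_steps` buys little, while adding entries to `mechanisms` is where the gains come from.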
💡#9
@MGMurray1
https://x.com/MGMurray1/status/2045837567539413342
The autoresearch pattern applied to agent operations, not ML. 62 days, 37 daily tasks, 105+ deliverables. Every recurring task has an ideal trajectory. Every failure becomes a regression eval. System proposes improvements, tests against historical outputs, promotes winners. Git history is the research log. Agents running eval loops produced measurably better output by week 4 than week 1 – because the specs improved, not the model. The general pattern: any goal + any agent + outcome validation + keep-only-improvements equals an autoresearch loop.
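That closing formula (goal + agent + outcome validation + keep-only-improvements) reduces to a short loop. `propose` and `evaluate` here are stand-ins for the agent and the regression-eval suite; a real setup would replay historical tasks and commit winning specs to git.

```python
def improve(spec, propose, evaluate, rounds=10):
    """Keep-only-improvements loop: promote a candidate spec only if it
    scores higher on the eval suite, otherwise keep the current one."""
    best_score = evaluate(spec)
    for _ in range(rounds):
        candidate = propose(spec)
        score = evaluate(candidate)
        if score > best_score:            # promote the winner
            spec, best_score = candidate, score
        # else: implicit rollback, the old spec survives
    return spec, best_score
```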
💡#10
@leo_liuye
https://x.com/leo_liuye/status/2045971336934412550
Direct response to Karpathy: autoresearch is the right direction but the leap isn't single-thread experiments – it's agents building shared institutional memory. He runs 6 agents that remember every decision for 2 years. The 10th analysis catches what the 1st never could. Reframes autoresearch from "iterate fast" to "compound across time."
💡#11
@Saboo_Shubham_
https://x.com/Saboo_Shubham_/status/2045692123887050816
Ollama now supports Hermes AI Agents natively – fully local self-improving agent running 24/7 for free on your machine. Single command launch. The democratization of the autoresearch stack: $0 to spin up an agent that learns from every interaction and persists between sessions. Pairs with autoresearch loops that don't need any cloud APIs.
💡#12
@vesper402
https://x.com/vesper402/status/2045755178352087117
Vesper runs a continuous AI agent loop on Solana. Data via Helius LaserStream. Reasoning via LangGraph ReAct. Guardrails via Risk Guard. Execution via Jupiter, Kamino, Marinade, Jito, Streamflow, x402. Every decision autonomous, every action verified on-chain. This is the agentic loop pattern in production for real money β€” not just a research demo.
💡#13
@Sattyamjjain
https://x.com/Sattyamjjain/status/2045836734513209636
Claude Opus 4.7 task_budgets beta – the model now sees a running token countdown inside its own agentic loop. Native cost-awareness baked into the agent itself. Until now agents burned through budget blind; this lets them throttle behavior based on remaining tokens. Quietly significant for any long-horizon autonomous run.
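The post doesn't show the task_budgets API, so this is only a generic illustration of what native cost-awareness enables: an agent that throttles its behavior as the remaining token budget shrinks. The thresholds and strategy names are invented.

```python
def choose_strategy(tokens_left: int, tokens_total: int) -> str:
    """Pick a behavior based on remaining token budget (illustrative)."""
    frac = tokens_left / tokens_total
    if frac > 0.5:
        return "explore"      # plenty of budget: broad search
    if frac > 0.1:
        return "exploit"      # getting low: refine the best candidate
    return "wrap_up"          # nearly out: summarize and stop cleanly
```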
💡#14
@Shurtcurt
https://x.com/Shurtcurt/status/2045959496346882552
On the self-improving agents paper everyone is sharing today. The key detail buried at the bottom: "repeat with history, not amnesia." Most self-improving designs fail because each cycle starts cold, with no memory of what broke before. The commit/rollback pattern solves it, but only if the eval is tight. 89.04 on GAIA is the number that says the evals are actually catching regressions, not rubber-stamping every patch. Best brief read on why eval rigor is the bottleneck.
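A sketch of "repeat with history, not amnesia" combined with commit/rollback gating: the proposer sees the log of past rollbacks, and a patch commits only if the eval score improves. Component names here are assumed, not the paper's.

```python
def self_improve(state, propose, eval_suite, cycles=5):
    """Commit/rollback loop where history persists across cycles,
    so each proposal can account for what failed before."""
    history = []                           # survives every cycle
    score = eval_suite(state)
    for _ in range(cycles):
        patch = propose(state, history)    # proposer sees past failures
        new_state = state + [patch]
        new_score = eval_suite(new_state)
        if new_score > score:              # eval gate passed: commit
            state, score = new_state, new_score
            history.append(("commit", patch))
        else:                              # rollback, but remember why
            history.append(("rollback", patch))
    return state, history
```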
💡#15
@omarsar0
https://x.com/omarsar0/status/2045956901750399374
"Great paper on self-improving agents. Why? We need to think more deeply about AI agent system design. The protocol specifies a framework for proposing, assessing, and committing improvements with auditable lineage and rollback." His research agent generated the visualization. The paper itself is the academic backing for what practitioners like MGMurray1 are doing in the wild β€” formal protocol, not vibes.
πŸ’‘#16
@analytics_90590
https://x.com/analytics_90590/status/2045971266738221563
Added Project Context to Agent Analytics. "Now your AI agent using analytics can remember what activation and AA skill is self-improving: read context, analyze, save durable learnings, skip noise." Inspired by Nous Research Hermes. Self-improving analytics agent that learns what data to ignore over time. The autoresearch pattern applied to product analytics, not coding.
💡#17
@web3nomad
https://x.com/web3nomad/status/2045973455418609801
Sharp question to Zhengyao Jiang on autoresearch convergence speed: "classic HPO treats each run as independent, autoresearch can learn from the reasoning chain across runs. curious whether you found a sweet spot for context window budget per iteration." This is the question that actually matters for tuning these loops – how much context per iteration vs how many iterations. The frontier of practical autoresearch is right here.
💡#18
@nobulexlabs
https://x.com/nobulexlabs/status/2045938186938110441
Quietly important: "if the agent is rewriting its own behavior 24/7, you need a way to know what changed and why." There's no standard for an agent to commit to a set of rules and produce a verifiable record of whether it followed them. The hard part isn't self-improvement, it's proving the agent stayed within bounds while doing it. They're building exactly this primitive. As autoresearch agents get more autonomous, this audit layer becomes load-bearing.
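One plausible shape for that audit primitive (an assumption about the design space, not a description of what they're building): a hash-chained log where every behavior change is appended with a hash over the previous entry, making the record tamper-evident. The rule-checking itself is out of scope here.

```python
import hashlib
import json

def append_entry(log, change: dict):
    """Append a behavior change, chaining its hash to the previous entry."""
    prev = log[-1]["hash"] if log else "genesis"
    body = json.dumps(change, sort_keys=True)
    entry = {
        "prev": prev,
        "change": change,
        "hash": hashlib.sha256((prev + body).encode()).hexdigest(),
    }
    log.append(entry)
    return log

def verify(log) -> bool:
    """Recompute the chain; any altered or reordered entry breaks it."""
    prev = "genesis"
    for e in log:
        body = json.dumps(e["change"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True
```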
💡#19
@willccbb
https://x.com/willccbb/status/2045958417073029546
"The best harness ideas are those which didn't actually work yet, but which would be amazing in theory if you did the RL." Lists CoT, ReAct, parallel tools, claude code, compaction, sub-agents, RLMs as examples. Bearish on bolt-on memory because "not a clean enough rollout loop." A field-defining take on why harness design is upstream of model improvement, and which design patterns survive when the next round of RL gets applied.
📡 Eco Products Radar

pi-autoresearch: the workhorse autoresearch implementation getting overnight runs from practitioners. Used to push qwen3.6 from 80 to 180 tok/sec in a single night.

Karpathy's autoresearch loop / nanochat: the reference implementation. 700 experiments in 8 hours for $309 across 16 GPUs.

Hermes Agent: 100K stars in 53 days, persistent memory, self-improving skills. Native Ollama support landed today. Free local 24/7 autoresearch substrate.

Claude Code in headless mode (claude -p): the coding-agent inner loop for autoresearch experiments. Sits in an explicit policy gray zone – Anthropic says CC is for human use only.

Runpod CLI: making spin-up/spin-down of GPU rentals trivial inside an agent loop. The infrastructure layer that makes autoresearch unit economics work.

LangGraph ReAct: the reasoning core for production agentic loops like Vesper running on Solana.

evo (alokbishoyi97): Claude Code plugin for parallelizing autoresearch on any repo.

video-use: HTML-to-video autoresearch for content production loops, from the browser-use team.