Loop Daily: April 16, 2026
Autoresearch went from a Karpathy demo to a full-blown engineering discipline this week. The pattern is clear: give an agent a loss function, a compute budget, and permission to iterate. Then walk away. What comes back is often better than what a human would have tried. The most interesting shift is that people stopped asking whether autoresearch works and started arguing about how to harness it. Marketing, quant finance, golf forecasting, nanoGPT training. The loop eats everything.
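The pattern reduces to a few lines of control flow. A toy sketch in Python (the loss function and proposal step here are illustrative stand-ins, not any project's actual setup):

```python
import random

def autoresearch_loop(evaluate, propose, budget, seed_config):
    """Generic autoresearch loop: propose a change, keep it only if
    the measured loss improves, repeat until the compute budget runs
    out. `evaluate` and `propose` are problem-specific callables."""
    best_config = seed_config
    best_loss = evaluate(best_config)
    history = [best_loss]
    for _ in range(budget):
        candidate = propose(best_config)
        loss = evaluate(candidate)
        if loss < best_loss:            # keep only measurable wins
            best_config, best_loss = candidate, loss
        history.append(best_loss)
    return best_config, best_loss, history

# Toy stand-ins: minimize (x - 3)^2 by nudging x at random.
rng = random.Random(0)
result = autoresearch_loop(
    evaluate=lambda cfg: (cfg["x"] - 3.0) ** 2,
    propose=lambda cfg: {"x": cfg["x"] + rng.uniform(-0.5, 0.5)},
    budget=200,
    seed_config={"x": 0.0},
)
```

Everything interesting lives in `evaluate` and `propose`; the loop itself never changes, which is why it transfers across domains.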
#1
@shannholmberg
https://x.com/shannholmberg/status/2043983746094026984
A complete adversarial marketing optimization framework built on the autoresearch pattern. Blind judge panels evaluate campaign variants, the loop rewrites copy, retests, and converges. This is the first serious non-ML application of autoresearch that actually shipped metrics. 1015 likes says the marketing world noticed.
#2
@akshay_pachaar
https://x.com/akshay_pachaar/status/2044000393110474756
MiniMax M2.7 ran self-evolution at industrial scale. 100+ rounds of autoresearch across 22 ML competitions, pulling 9 gold medals. This is what happens when you give the loop real compute and real benchmarks instead of toy problems. The gap between weekend hobbyists and funded labs just got wider.
#3
@charles_irl
https://x.com/charles_irl/status/2044150322973815023
Modal plus autoresearch. Autoscaling meets auto-improvement. The insight is that autoresearch is compute-hungry in bursts, and Modal's serverless GPU model fits that profile perfectly. Spin up 50 GPUs for an hour, run your loop, shut everything down. No idle machines burning money.
#4
@FletchPh
https://x.com/FletchPh/status/2044048418906018293
OODA decision framework wired into autoresearch. The agent observes results, orients against baselines, decides what to try next, acts, and loops. 92% keep rate on proposed changes and measurably better validation loss. Military decision theory applied to ML optimization. Strange and effective.
#5
@rot13maxi
https://x.com/rot13maxi/status/2044066414047236603
Built a tiny GPT from scratch, then let autoresearch grind on it. Squeezed out a 4% improvement. Not flashy, but this is the right way to learn. You understand the architecture because you built it, then you watch the agent find things you missed. Educational autoresearch.
#6
@0x_Discover
https://x.com/0x_Discover/status/2044001503472419098
Chinese student running autoresearch 24/7 on three Mac machines for $4K total cost. Claude plus Obsidian orchestrating 12 microprocesses. This is the scrappy indie setup that keeps showing up. You do not need a cluster. You need patience and a good loss function.
#7
@AlchainHust
https://x.com/AlchainHust/status/2043878638475718981
Darwin.skill applies autoresearch to 60+ Claude Code skills inside the Nuwa framework, which has 9000+ GitHub stars. 38 commits from a single autoresearch run. The loop is not just optimizing models anymore. It is optimizing the tools that build models.
#8
@dphuang2
https://x.com/dphuang2/status/2043899731160773067
108 experiments and 100 git commits on Tinker for golf forecasting. Every experiment logged, every change tracked. This is what disciplined autoresearch looks like in practice. The git history alone is a dataset of what works and what does not in iterative ML.
#9
@tinkerapi
https://x.com/tinkerapi/status/2044093067372965897
Tinker built specifically for autoresearch workflows. The golf forecasting use case keeps coming up because it has clean data, fast feedback loops, and a well-defined loss function. Tinker provides the sandbox. The agent provides the ideas. 120 likes suggests real traction.
#10
@fjzzq2002
https://x.com/fjzzq2002/status/2044079073144492354
78 experiments over 25 hours for about $600 using Claude Code in an auto-research loop. Beat the baseline by roughly one percentage point. The cost-per-improvement math is starting to make sense for individuals, not just companies. $600 for a genuine research result is cheap.
#11
@DBuniatyan
https://x.com/DBuniatyan/status/2044162314870632656
Swarm autoresearch on Modal with shared memory across dozens of GPUs. Hit sub-0.975 validation bits-per-byte. This is the distributed version of the pattern. Multiple agents sharing state, splitting the search space, converging faster than any single loop could.
#12
@Vtrivedy10
https://x.com/Vtrivedy10/status/2044072428993696166
Harness permissions for auto-research. The key finding: explicit enforcement beats prompt-based enforcement every time. You cannot just ask the agent to be careful. You have to build guardrails into the infrastructure. 81 likes from people who learned this the hard way.
#13
@HerselmanI
https://x.com/HerselmanI/status/2043985047712051320
ClerkiQ let an agent loop debug its own prompts. 12x error reduction and 61% revenue increase. Then they turned it off. The bravest part is the off switch. Knowing when the loop has converged and further iteration adds risk, not value. Production autoresearch needs kill conditions.
#14
@himanshustwts
https://x.com/himanshustwts/status/2044035550001410288
Paradigm and Tensorqt podcast on building infrastructure for autonomous research. They call it Flywheel. The thesis is that autoresearch needs purpose-built infra the same way ML training needed purpose-built infra five years ago. History rhymes.
#15
@0xfishylosopher
https://x.com/0xfishylosopher/status/2043848388299587805
SurfAI won Paradigm's autoresearch hackathon. Hackathons are where patterns get stress-tested by people with no legacy code to protect. Winning means the approach survived contact with real constraints and real judges.
#16
@mathemagic1an
https://x.com/mathemagic1an/status/2044175006134088049
Calls autoresearch-style loops on verifiable problems the most important AI trend right now. Multi-agent systems tackling long-horizon tasks. The emphasis on verifiable is key. Autoresearch works when you can measure. It flails when you cannot.
#17
@_kevinlu
https://x.com/_kevinlu/status/2044121659142263192
Tinker as a sandbox for giving autoresearch access to RL training infrastructure. The idea is containment. Let the agent experiment freely inside a controlled environment where it cannot break production. Sandboxing is the unsexy prerequisite that makes everything else possible.
#18
@artemg314
https://x.com/artemg314/status/2044181007016853949
Open-source agentic framework for quantitative finance. Hypothesis testing plus walk-forward validation. Sharpe ratio of 0.86. Quant finance is a natural fit for autoresearch because the feedback signal is unambiguous. Money is the ultimate loss function.
#19
@whichmantech
https://x.com/whichmantech/status/2044123453817794791
Autonomous Claude Code agent loop optimizing another AI agent built with Vercel AI SDK. Agents optimizing agents. We are one layer of recursion away from losing track of what is optimizing what. But it works.
#20
@TheValueist
https://x.com/TheValueist/status/2043906140166406613
Running two OpenClaw instances on DigitalOcean. One for production, one for testing and autoresearch. Plus a Hermes agent. The two-instance pattern is smart. Never let the research loop touch the thing that makes money.
#21
@grim_nomad
https://x.com/grim_nomad/status/2044130317074891140
Three computers with Claude plus Obsidian running 24/7. Autoresearch with automatic citations while you sleep. The always-on research assistant is becoming a real category. Sleep is the new competitive advantage.
#22
@artificialguybr
https://x.com/artificialguybr/status/2044138849803415714
Forked autoresearch-agents into a three-agent structure. Orchestrator, Researcher, and Implementer. Role separation inside the loop. This mirrors how human research teams work. Specialization within the loop beats a single generalist agent.
#23
@nurijanian
https://x.com/nurijanian/status/2044098643121320236
Rule of Five PM skill built via autoresearch pipeline in Cursor. Project management methodology generated by an optimization loop. The loop is eating non-technical domains now.
#24
@realbarnakiss
https://x.com/realbarnakiss/status/2044089227403575438
Scaling from Sonnet to Opus did not improve zk-autoresearch results. Bigger model does not always mean better loop. Sometimes the bottleneck is the problem structure, not the reasoning capacity. An expensive lesson worth sharing.
#25
@AscentBio
https://x.com/AscentBio/status/2044108747392549341
Open-sourcing Faraday, an agentic loop for science. Wet lab meets dry loop. If autoresearch can plan experiments and interpret results, the iteration speed of biology research changes fundamentally.
#26
@JulianGoldieSEO
https://x.com/JulianGoldieSEO/status/2043906140166406613
NotebookLM clone built with Claude Code. Auto-research, quiz export, mind maps, infographics. The full knowledge worker toolkit generated by an agent. Not research optimization but research tooling, built by the loop.
Eco Products Radar
#27
Tinker keeps appearing as the default sandbox for autoresearch experimentation, especially in the golf forecasting community. Three separate teams used it this week. If autoresearch needs a gymnasium, Tinker is becoming it.
#28
Modal is the compute layer people reach for when autoresearch needs to scale past a single machine. Serverless GPU bursts fit the pattern perfectly. Two notable projects this week ran distributed autoresearch loops on Modal infrastructure.
#29
Claude Code is the execution engine underneath most of these loops. Whether people use it directly or wrap it in other tooling, it is the default agent runtime for autoresearch. The combination of code execution and reasoning in one agent makes it the natural fit.
#30
Obsidian shows up as the knowledge management layer for always-on autoresearch setups. Multiple indie researchers are running Claude plus Obsidian on dedicated machines around the clock, using Obsidian as both memory and citation store.
#31
Nuwa with 9000+ GitHub stars is the framework where darwin.skill lives. Autoresearch applied to Claude Code skills at scale. The meta-optimization use case, agents improving their own tools, is a strong signal for where the ecosystem is heading.
#32
Flywheel from Paradigm is positioning itself as purpose-built infrastructure for autonomous research. Still early, but the podcast appearance and hackathon suggest they are serious about owning this layer of the stack.
#33
OpenClaw earned a mention for the two-instance production pattern. One for serving, one for research. Simple operational hygiene that more teams should copy.
#34
Hermes runs alongside OpenClaw in at least one production setup. The agent-plus-agent deployment pattern suggests Hermes fills a coordination role that autoresearch loops need but do not provide natively.