Loop Daily: 2026-04-04
The autoresearch community just got its first real benchmark. Instead of vibes-based takes on whether agent loops beat traditional optimization, we now have side-by-side experiments with numbers attached. Meanwhile, people keep finding new surfaces to point these loops at: cold outbound sales, prediction markets, knowledge base curation, even divergence testing across frontier models. The pattern is the same everywhere: set up a measurable objective, let the loop run overnight, wake up to results that would have taken you a week of manual iteration.
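That closing pattern is small enough to sketch. Here is a toy propose-evaluate-keep loop; the `propose` and `evaluate` functions are invented for illustration (in a real run an LLM proposes code edits and a benchmark scores them):

```python
import random

def propose(params):
    """Stand-in for an LLM proposing a tweak: perturb the current best candidate."""
    return {k: v * random.uniform(0.8, 1.2) for k, v in params.items()}

def evaluate(params):
    """The measurable objective; here a toy loss to minimize, not a real benchmark."""
    return (params["lr"] - 0.01) ** 2

best = {"lr": 0.1}
best_score = evaluate(best)
for _ in range(500):                 # "overnight" is just many cheap iterations
    candidate = propose(best)
    score = evaluate(candidate)
    if score < best_score:           # keep only measured improvements
        best, best_score = candidate, score
```

The whole trick is that nothing here needs supervision: as long as `evaluate` is a trustworthy signal, the loop can run unattended for as many iterations as you can afford.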
#1
@zhengyaojiang
https://x.com/zhengyaojiang/status/2039742050518634534
Finally someone ran the experiment everyone was arguing about. Autoresearch vs Optuna head-to-head on NanoChat, three runs each. Autoresearch converges faster, costs less per improvement step, and the solutions it finds actually generalize better when you give them more training time. The kicker is that autoresearch searches directly in code space, not just a fixed parameter grid. Even when it stays within the same 16 parameters Optuna uses, the LLM prior picks values that transfer better to longer training runs.
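The fixed-grid vs code-space distinction is easy to picture. A hedged sketch (toy functions, not the actual experiment or the Optuna API): a fixed-space study can only assign values to knobs declared up front, while a code-space search proposes edits to the config itself, introducing knobs the grid never had. The `grad_clip` knob below is hypothetical.

```python
import random

# Fixed-space baseline (Optuna-style): the search assigns values only to
# knobs that were declared up front.
def sample_fixed_space():
    return {
        "lr": 10 ** random.uniform(-5, -1),   # log-uniform over a declared range
        "warmup_steps": random.randint(0, 1000),
    }

# Code-space search (autoresearch-style): a candidate is an edit to the
# config/code itself, so it can introduce knobs absent from the fixed space.
def propose_code_edit(config):
    patched = dict(config)
    patched["grad_clip"] = 1.0    # hypothetical new knob the grid never declared
    return patched

candidate = propose_code_edit(sample_fixed_space())
```

That extra degree of freedom is why code-space solutions can transfer differently to longer runs: the search output is a program change, not a point in a predeclared box.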
#2
@gkisokay
https://x.com/gkisokay/status/2039634985834131505
Built a "Subconscious agent" to solve the biggest complaint about self-improving multi-agent systems: you can't control the outputs. Inspired by Karpathy's autoresearch, it is a background LLM process that continuously hunts for useful problems. It contextualizes data, connects ideas, and stress-tests assumptions all day before anything reaches the main agent. Only tested-good ideas get promoted for further pressure-testing.
#3
@yujia_bao
https://x.com/yujia_bao/status/2039742189887226222
Points out the underappreciated bottleneck in auto-research: coding agents struggle to run LLM training jobs at scale. A small infrastructure mistake can cascade badly. Joined thinkymachines to work on tinkerapi, which is specifically built to give auto-research agents reliable compute infrastructure. The emerging stack of tinkerapi plus Cookbook plus Claude Code is starting to look like a real production pipeline for autonomous ML experimentation.
#4
@Zeras_24
https://x.com/Zeras_24/status/2039535272334114951
Took the autoresearch loop and pointed it somewhere nobody expected: divergence testing across frontier models. Ran 320-plus forced-choice binary questions on ethics and geopolitics through OpenGradient, collecting 1,277 TEE-sealed inferences. The finding: model consensus collapses once you remove hedging language. This is autoresearch applied not to optimization but to systematic probing of model behavior at scale.
#5
@cvssvrt
https://x.com/cvssvrt/status/2039688821810270422
Applied auto-research to cold outbound sales and woke up to 10k new leads scraped on autopilot. The agent continuously searches for new lead sources and creative scraping methods across the internet. Last night it found investor events and pulled their attendance lists. Easy to measure, easy to improve, exactly the kind of tight feedback loop where autoresearch shines outside of ML.
#6
@zectrillionaire
https://x.com/zectrillionaire/status/2039774228774547459
Let Claude Code figure out a Polymarket trading strategy entirely on its own. Five hours of autonomous iteration, account went from $294 to $362. No manual strategy design, no hand-holding. The bot built and refined its own approach through the autoresearch loop. People are underestimating how far you can get by just letting the loop run on problems with clear profit-loss signals.
#7
@0xJsum
https://x.com/0xJsum/status/2039823221038682520
Running self-improving knowledge bases on Obsidian is turning out to be a sleeper use case for autoresearch. Point a long-term agent at any markdown-based platform and let it maintain, connect, and expand domain knowledge autonomously. Claims it is the easiest way to run a long-term autoresearch agent so far, building agents with genuine domain expertise rather than starting from scratch each session.
#8
@brendanh0gan
https://x.com/brendanh0gan/status/2039774609348640947
Built AlphaLab as a parallel effort to Karpathy's auto-research. The key difference: a real research phase before touching a GPU, plus self-adaptation and massive parallel experimentation with synthesis. Says something changed with these models in December 2025, a phase change in agentic coding ability that makes this kind of autonomous research loop actually viable now.
#9
@sharat_sc
https://x.com/sharat_sc/status/2039756265799258176
Tried OpenClaw after a demo at the Boston meetup. Describes it as auto-research-management rather than auto-research-a-la-Karpathy. Good for finding related work and structuring your project, which is the organizational layer that pure optimization loops miss. A different slice of the same problem: not just running experiments, but knowing which experiments to run.
#10
@MartinSzerment
https://x.com/MartinSzerment/status/2039624608710598811
Shanghai AI Lab just beat Nano Banana 2 in image generation using a 6B model called GEMS. The trick is wrapping the model in an agent loop that iterates, remembers, and reloads domain knowledge. Intelligence emerging from architecture, not scale. Another data point that small-model-plus-loop can punch above its weight class.
#11
@azeem
https://x.com/azeem/status/2039829529120694489
Adapted Karpathy's autoresearch for knowledge work beyond code. Science is the most reliable knowledge-production method we have, and autonomous experimental loops can run at near-zero marginal cost. But the hard part is not the automation itself. It is measurement. If you cannot define a clear signal for what good looks like, the loop optimizes noise.
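The "loop optimizes noise" failure is easy to demonstrate with a toy simulation (not azeem's own example; averaging repeated measurements is just one simple mitigation):

```python
import random

random.seed(0)  # deterministic for illustration

def noisy_eval(quality):
    """Observed score = true quality plus heavy noise (a badly defined signal)."""
    return quality + random.gauss(0, 1.0)

def averaged_eval(quality, n=50):
    """Averaging repeated measurements recovers signal from the noise."""
    return sum(noisy_eval(quality) for _ in range(n)) / n

# Two candidates whose true quality differs by 0.2: a single noisy reading
# ranks them the wrong way round almost half the time; averaged readings rarely do.
flips_single = sum(noisy_eval(0.2) < noisy_eval(0.0) for _ in range(1000))
flips_averaged = sum(averaged_eval(0.2) < averaged_eval(0.0) for _ in range(1000))
```

A loop driven by the single-reading score will happily "improve" on coin flips; defining the signal well is the real work before any automation pays off.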
#12
@jorcagra
https://x.com/jorcagra/status/2039601361612890344
The /loop plus --agent combo in Claude Code is underrated because it spawns a specialized daemon with its own system prompt, not just base Claude. The missing piece is persistent memory across loop ticks: right now each tick starts cold. With memory, iterative self-improvement loops would actually compound across runs, making autoresearch a native primitive rather than a workaround.
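The missing persistence can be approximated today with a scratchpad file the loop reads on every tick. A minimal sketch that assumes nothing about Claude Code internals; the filename and schema are invented:

```python
import json
from pathlib import Path

MEMORY = Path("loop_memory.json")    # hypothetical scratchpad shared across ticks

def load_memory():
    if MEMORY.exists():
        return json.loads(MEMORY.read_text())
    return {"tried": [], "best": None}   # cold start only on the very first tick

def tick(idea, score):
    """One loop iteration: record the attempt so the next tick starts warm."""
    mem = load_memory()
    mem["tried"].append(idea)
    if mem["best"] is None or score > mem["best"][1]:
        mem["best"] = [idea, score]
    MEMORY.write_text(json.dumps(mem, indent=2))
    return mem

tick("baseline prompt", 0.4)
mem = tick("add retrieval step", 0.7)   # sees the first tick's history
```

Each tick appends to the same file, so later iterations inherit the full list of attempts instead of rediscovering them.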
#13
@chris_karani
https://x.com/chris_karani/status/2039685336796668032
Released an on-device memory engine with MCP and CLI tool support, built specifically for long coding sessions. Finds it invaluable when running auto-research loops for six-plus hours because the agent can persist context across tool calls without losing track of what it already tried. Solves the cold-restart problem that limits overnight autoresearch runs.
💡 Eco Products Radar
Claude Code: The dominant runtime for autoresearch. Shows up as the execution engine behind trading bots, sales scrapers, research loops, and knowledge base agents. Its agentic loop architecture is what makes long autonomous runs possible.
Karpathy's Autoresearch: The framework that started the wave. Now being forked and adapted for non-ML use cases including sales, knowledge work, and model divergence testing. The benchmark results against Optuna give it empirical legitimacy.
Optuna: The traditional Bayesian optimization baseline that autoresearch was tested against. Still a solid tool for hyperparameter tuning, but the head-to-head experiments show autoresearch pulling ahead on sample efficiency and generalization.