Loop Daily: June 24, 2026
The loop conversation split cleanly today between people doing the work and people hyping it. The signal lived with the builders: someone running nightly auto-research on a 27B model and paying for a frontier review only at the end, a fully automated research lab that replans the second a baseline dies at 3am, a self-improving framework honest enough to admit the lift is a tenth of a point, and an autonomous software factory that ships releases of its own plugins. Underneath the demos a real architecture argument is forming: that memory is the foundation self-improvement compounds on, that the agentic loop is teachable from first principles on a tiny toolkit, and that you assemble context by searching for it, not waiting for it. The Sakana/Fugu autoresearch milestone also got the skeptical reading it needed, with sharp breakdowns showing how much of the "orchestration beats one model" claim is really just routing, and what it costs. The takeaway holds from last week: the loop is the product, but a loop without verification and memory is just expensive guessing.
#1
@nasqret
https://x.com/nasqret/status/2069151257143718052
Auto-research loops have become his default way of working with agents, and he says that once you learn to interact with a problem this way it's hard to go back to normal prompting. He's blunt about the tradeoff: it's an expensive way to work the models, but when depth and discovery are the priority, loops win. It's the cleanest first-person statement of the day on why people are abandoning one-shot prompts: not because loops are trendy, but because the quality gap is real once the task rewards exploration.
#2
@matthias_meyer_
https://x.com/matthias_meyer_/status/2068847869419794739
Self-improving agents sound like snake oil, and he agrees: the hard part was never getting an agent to rewrite its own prompt, it's doing that in production without the agent quietly getting worse. So he built Darwin, an MIT-licensed TypeScript framework that ports GEPA (reflective prompt evolution) into a real production loop: a new prompt variant goes live only if it beats the current one in an always-valid sequential A/B test and survives an alignment guard; otherwise it rolls back. The newest piece is a drift canary that compares recent tool-call trajectories against a frozen baseline to catch behavior changes a score gate would miss. The honest kicker: the lift from evolution is real but small, about a tenth of a point on a ten-point scale, measured on his own agent fleet, not a benchmark demo.
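For readers who want the gate spelled out, here is a minimal Python sketch of the promote-or-rollback decision described above. It is not Darwin's code (Darwin is TypeScript and the post does not show its API): the function names, the thresholds, and the use of total variation distance for the canary are assumptions made for illustration, and a plain mean comparison stands in for the always-valid sequential test.

```python
from collections import Counter
from statistics import mean


def tool_drift(baseline_calls, recent_calls):
    """Total variation distance between two tool-call frequency distributions.

    0.0 means the agent uses tools in the same proportions as the frozen
    baseline; 1.0 means the two sets of calls do not overlap at all.
    """
    base, recent = Counter(baseline_calls), Counter(recent_calls)
    n_base, n_recent = sum(base.values()), sum(recent.values())
    if n_base == 0 or n_recent == 0:
        return 0.0
    tools = set(base) | set(recent)
    return 0.5 * sum(abs(base[t] / n_base - recent[t] / n_recent) for t in tools)


def decide(current_scores, candidate_scores, guard_ok, drift,
           min_lift=0.05, max_drift=0.3):
    """Return 'promote' only if every gate passes, otherwise 'rollback'.

    Assumption: a simple difference of means replaces the sequential A/B test,
    and min_lift / max_drift are made-up thresholds.
    """
    lift = mean(candidate_scores) - mean(current_scores)
    if lift >= min_lift and guard_ok and drift <= max_drift:
        return "promote"
    return "rollback"
```

The point of the canary in this sketch is that a candidate can clear the score gate and still be rolled back if its tool-call mix has moved too far from the baseline.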
#3
@MLCatttt
https://x.com/MLCatttt/status/2069163208641015854
Months before this wave, they built freephdlabor, a fully automated research lab where you hand it a rough idea and the agents take it end to end: form hypotheses, write and run experiments, replan the moment a baseline dies at 3am, and leave you a draft to tear apart. The key design choice is that you customize the whole lab for your own domain, with no fixed pipeline jamming an LLM into each box, which they argue is exactly where the current wave of auto-research systems is still stuck. It's open source and built on smolagents. A real, runnable example of the overnight autonomous research loop everyone is now talking about.
#4
@Skiipy88
https://x.com/Skiipy88/status/2069141486281597029
A concrete recipe for autonomous long work that doesn't need a frontier model: a session of nightly auto-research on any subject is incredibly effective on a 27B model. His safeguard for trust is simple: if you don't trust the output yet, ask your favorite SOTA model to review it, and you've still saved 95%+ of the usage. It's a practical answer to the cost panic everyone's feeling: run the expensive loop on a cheap local model overnight, and pay for the frontier only at the verification step.
#5
@techmeat
https://x.com/techmeat/status/2069023940660375596
Dark Factory is an autonomous software factory: drop an idea into a channel and a team of agents scopes it, builds it, reviews it, and ships it to a live URL while you sleep. It runs on Nous Research's Hermes Agent with two plugins he built on top, Hermes Workflows (your dev process packed into reusable workflows) and Open Second Brain (shared memory across every agent). The self-improving twist is that the same pipeline ships releases of the plugins themselves. It's one of the most complete end-to-end overnight-build loops shared this week, and notably it's built on Hermes rather than Claude Code.
#6
@lazyvibecoderx
https://x.com/lazyvibecoderx/status/2069029140423221699
He lays out a three-priority stack for a self-improving agent, and the ordering itself is the insight. First is agent memory, fully local and SOTA; second is a wiki "second brain" that goes well beyond the standard Karpathy-style Obsidian wrappers everyone is shipping right now; third is the harness proper, self-replicating and self-evolving per use case, improving over time. It's a more serious architecture than the typical "I built a loop" post, treating memory as the foundation the self-improvement actually compounds on.
#7
@a_g_e_n_c
https://x.com/a_g_e_n_c/status/2068925099634561438
The new AgenC core is almost ready, and he's careful to say what it isn't: not a marketplace, but an open-source, self-improving agent framework with a built-in code editor. It's the kind of project worth tracking precisely because it bundles the self-improvement loop with an editing surface, so the agent has somewhere to actually act on what it learns. Shipped under tetsuo-ai/agenc-core.
#8
@ximihoque
https://x.com/ximihoque/status/2069175767641272598
On day 13 of 30 shipping a tool called xysq, he introduced xysq-goal, an agentic loop that gathers all the relevant context for a goal. The interesting mechanism is that it proactively queries like an A* search across your vault's memories and across teams, rather than waiting for you to hand it context. It's a concrete, in-progress build of the "agent that assembles its own context" idea, framed as a search problem instead of a retrieval afterthought.
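To show what "context gathering as search" can look like, here is a small Python sketch of a best-first traversal over linked memories. It is an A*-flavoured illustration rather than a faithful A* and has nothing to do with xysq's real implementation: the graph shape, the `relevance` scoring function, and the `budget` parameter are all hypothetical.

```python
import heapq


def gather_context(graph, start, relevance, budget=3):
    """Expand the most promising memory first; return up to `budget` notes.

    Priority = depth from the goal (cost so far) minus relevance (a heuristic
    guess at usefulness). Lower priority values are expanded first.
    """
    frontier = [(0 - relevance(start), 0, start)]
    seen, picked = set(), []
    while frontier and len(picked) < budget:
        _, depth, note = heapq.heappop(frontier)
        if note in seen:
            continue
        seen.add(note)
        picked.append(note)
        for neighbor in graph.get(note, []):
            if neighbor not in seen:
                priority = depth + 1 - relevance(neighbor)
                heapq.heappush(frontier, (priority, depth + 1, neighbor))
    return picked


# Hypothetical vault: each note links to related notes.
vault = {"goal": ["a", "b"], "a": ["c"], "b": []}
scores = {"goal": 1.0, "a": 0.2, "b": 0.9, "c": 0.8}
context = gather_context(vault, "goal", scores.get)
```

The agent pulls in the highly relevant note before the weaker one, instead of waiting for the user to hand either over.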
#9
@wisnuanugrahp
https://x.com/wisnuanugrahp/status/2069013078163980679
A clean learning build: combining Instructions plus Memory plus Tools, he built an assistant that doesn't just reply but reasons and executes, calling it The Agentic Loop. The stack is refreshingly small: Python, uv for package management, and minsearch. It's a good reminder that the agentic loop is teachable from first principles on a minimal toolkit, not just something you buy as a frontier-lab feature.
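The Instructions plus Memory plus Tools shape fits in a few lines. The Python sketch below is an assumption-laden toy, not his code: `run_agent`, the action dictionary format, and the `scripted_model` stand-in for an LLM are invented for illustration, and it does not use minsearch.

```python
def run_agent(instructions, question, tools, model, max_turns=5):
    """Loop: ask the model what to do, run the tool it picks, remember the result."""
    memory = [("system", instructions), ("user", question)]
    for _ in range(max_turns):
        action = model(memory)
        if action["type"] == "answer":
            return action["text"], memory
        # The model asked for a tool call: execute it and append the result to memory.
        result = tools[action["tool"]](**action["args"])
        memory.append(("tool", f"{action['tool']} -> {result}"))
    return "no answer within max_turns", memory


def scripted_model(memory):
    """Hypothetical stand-in for an LLM: call `add` once, then answer with its result."""
    if memory[-1][0] != "tool":
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "answer", "text": memory[-1][1]}


answer, transcript = run_agent(
    "Use tools when needed.",
    "What is 2 + 3?",
    {"add": lambda a, b: a + b},
    scripted_model,
)
```

Swapping `scripted_model` for a real model call is the only change needed to turn the toy into a working loop; everything else is plain control flow.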
#10
@mdambock
https://x.com/mdambock/status/2069170930014658868
He finished the Agentic RAG module of a DataTalksClub LLM course with real implementation work, not just theory. He ported the workflow from OpenAI to the Google GenAI SDK on gemini-2.5-flash, implemented sliding-window chunking that cut token usage 3x, and used native function calling for the agentic loop. It's a grounded, reproducible example of building the retrieve-reason-act loop yourself and optimizing it for cost along the way.
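Sliding-window chunking itself is simple enough to sketch. The Python below is a generic version, not taken from his port: the character-based window, the `size` and `step` defaults, and the output dictionary shape are assumptions. The saving comes from sending the model only the retrieved chunks rather than whole documents.

```python
def sliding_window(text, size=2000, step=1000):
    """Split text into overlapping chunks of `size` characters, advancing by `step`."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start": start, "content": text[start:start + size]})
        # Stop once a window has reached the end, so no redundant tail chunk is emitted.
        if start + size >= len(text):
            break
    return chunks


chunks = sliding_window("abcdefghij", size=4, step=2)
```

With `step` smaller than `size`, neighbouring chunks overlap, so a sentence that straddles a boundary still appears whole in at least one chunk.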
#11
@JC_builds
https://x.com/JC_builds/status/2068934746395312623
He's sharp about what makes his on-device builder a real agentic loop instead of a one-shot: write, run on-device, read the error, make a targeted edit, run again. The detail that matters is the failure handling: if the build fails, it patches the broken line rather than rewriting the whole thing, and retries until it passes. It's a concrete look at loop mechanics where the iteration is grounded in actually running the code on a real device, not guessing.
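The write, run, read-error, patch, rerun cycle can be sketched in Python under some loud assumptions: Python's built-in `compile()` stands in for the on-device build, `propose_fix` is a hypothetical callback standing in for the model, and the `max_attempts` cap is added here for safety (the post only says it retries until the build passes).

```python
def build(source_lines):
    """Try to compile; return None on success or (line_number, message) on failure."""
    try:
        compile("\n".join(source_lines), "<agent>", "exec")
        return None
    except SyntaxError as err:
        return err.lineno, err.msg


def repair_loop(source_lines, propose_fix, max_attempts=5):
    """Patch only the failing line after each failed build, then build again."""
    lines = list(source_lines)
    for attempt in range(1, max_attempts + 1):
        error = build(lines)
        if error is None:
            return lines, attempt
        lineno, message = error
        # Targeted edit: replace the broken line, leave everything else untouched.
        lines[lineno - 1] = propose_fix(lines[lineno - 1], message)
    raise RuntimeError("build still failing after max_attempts")


fixed, attempts = repair_loop(
    ["x = 1 +", "print(x)"],
    lambda line, message: "x = 1 + 2",  # hypothetical model-suggested patch
)
```

Because only the failing line is touched, the second line of the program survives unchanged between attempts, which is the property he is highlighting.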
#12
@ItsMrLin
https://x.com/ItsMrLin/status/2069182583486005647
Presenting LILO at ICML 2026, he frames it as a principled auto-research-style harness built on Bayesian optimization. The division of labor is the idea: LLMs surface deep prior knowledge and capture language feedback, while Bayesian optimization turns that into uncertainty-aware search and exploration. It's a more rigorous take on the auto-research loop than the typical demo, grounding the "keep exploring until confident" instinct in an actual optimization framework.
#13
@whoisanku
https://x.com/whoisanku/status/2068963729719365958
A clear-eyed technical breakdown of Sakana's new Fugu and Fugu Ultra systems, via Elie Bakouch, that punctures some of the autoresearch hype. Standard Fugu is essentially a router/classifier picking a model per turn, reportedly scoring ten points lower than Claude 3 Opus on SWE-bench Pro; Fugu Ultra is an orchestrator using test-time compute scaling but limited to five steps because it must predict the whole workflow at t=0 rather than planning dynamically. He flags that Sakana's benchmarks omit output token counts and costs, and that the auto-research benchmark compares against anonymized "Model A, B, C" instead of named frontier models. It's the skeptical reading the autoresearch milestone needed.
#14
@MaaSonder
https://x.com/MaaSonder/status/2069045489195266474
A grounded primer on Sakana AI as a research lab betting on self-improvement and orchestration rather than the next chatbot. The standout is the Darwin Godel Machine, an AI that edits its own code to improve, reportedly jumping from 20% to 50% on SWE-bench, which is much closer to a self-improving agent loop than a normal LLM. He's honest about the limits: the idea-to-experiment-to-paper pipeline is fully automated but domain-constrained, and one output only reached a workshop, not a top-tier venue. A useful, non-hype framing of where automated research actually stands.
#15
@amihai
https://x.com/amihai/status/2069030815892123993
A clean dissection of Brilliant's agentic loop that's worth studying as a design pattern. At a high level the agents have a fixed list of tools they can call (export, lookup, execute_commands, load_knowledge and more), alongside dynamically available tools like generate_image and vectorize that appear when relevant. It's a concrete look at how a real product structures its loop around a tool registry plus context-dependent capabilities, rather than the usual hand-wave about "agents with tools".
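A registry with fixed plus context-dependent tools might be sketched in Python like this. The tool names come from the post; everything else is an assumption, including the `available_tools` function, the shape of the context dictionary, and the conditions under which each dynamic tool appears.

```python
# Tools the agent can always call (names from the post; the full list is longer).
FIXED_TOOLS = {"export", "lookup", "execute_commands", "load_knowledge"}

# Tools that appear only when a predicate on the current context is true.
# The predicates below are hypothetical.
DYNAMIC_TOOLS = {
    "generate_image": lambda ctx: "image" in ctx.get("topics", ()),
    "vectorize": lambda ctx: ctx.get("has_raster_asset", False),
}


def available_tools(ctx):
    """Return the sorted tool names to expose to the model for this turn."""
    dynamic = {name for name, is_relevant in DYNAMIC_TOOLS.items() if is_relevant(ctx)}
    return sorted(FIXED_TOOLS | dynamic)


plain_turn = available_tools({})
image_turn = available_tools({"topics": ["image"], "has_raster_asset": True})
```

Recomputing the list each turn keeps the prompt small on ordinary turns while still surfacing specialised tools the moment the context calls for them.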
Eco Products Radar
Sakana / Fugu: the week's autoresearch flashpoint; orchestration platform and the Darwin Godel Machine self-improvement line, repeatedly analyzed for both its 14-hour AutoResearch demo and its routing-vs-real-capability caveats.
Hermes Agent: Nous Research's harness increasingly chosen as the base for autonomous build loops (Dark Factory) over Claude Code, paired with custom workflow and shared-memory plugins.
Claude Code: still the default harness people wrap their loops around, with /loops and dynamic workflows the recurring reference point.
GEPA: reflective prompt-evolution method being ported into production self-improvement loops (Darwin) with A/B gates and rollback.
smolagents: the open-source agent framework under fully automated research labs like freephdlabor.