An Agent That Rewrites Both Its Code and Its Brain
There have been two camps in self-improving agents. One says keep the model frozen and let a meta-agent rewrite the harness, the tools, prompts, retry logic, search procedure. The other says keep the harness fixed and use RL to update the model's weights on task feedback. SIA, open-sourced by Hexo Labs, says why not both, in one loop. A Feedback-Agent updates the harness AND the weights of a task-specific agent, together.
The results are why this paper just rocketed to 775 upvotes on Hugging Face, by far the top paper of the day, even though it quietly dropped two weeks ago and is only catching fire now. SIA-W+H beats prior state of the art by 25.1% on Chinese legal charge classification, writes GPU kernels 12.4% faster than the previous best, 1,017 versus 1,161 microseconds, and improves single-cell RNA denoising by 20.4%. Three completely unrelated domains, law, systems, biology, one self-improvement framework, state of the art in all three.
Why it matters: this is the cleanest evidence yet that the harness-versus-weights debate was a false choice. The recursive-self-improvement thread keeps thickening. Anthropic reporting Claude writes 80% of its own code, MLEvolve out-evolving AlphaEvolve, and now SIA showing an agent can improve its scaffold and its parameters in the same loop and win across domains that share nothing.
The honest caveat is the same as always with a surging old paper, the work is two weeks old, the spike is new attention, not a new result. But the result holds up, and it's open. Repo at github.com/hexo-ai/sia, paper at arxiv.org/abs/2605.27276.
← Back to all articles
The results are why this paper just rocketed to 775 upvotes on Hugging Face, by far the top paper of the day, even though it quietly dropped two weeks ago and is only catching fire now. SIA-W+H beats prior state of the art by 25.1% on Chinese legal charge classification, writes GPU kernels 12.4% faster than the previous best, 1,017 versus 1,161 microseconds, and improves single-cell RNA denoising by 20.4%. Three completely unrelated domains, law, systems, biology, one self-improvement framework, state of the art in all three.
Why it matters: this is the cleanest evidence yet that the harness-versus-weights debate was a false choice. The recursive-self-improvement thread keeps thickening. Anthropic reporting Claude writes 80% of its own code, MLEvolve out-evolving AlphaEvolve, and now SIA showing an agent can improve its scaffold and its parameters in the same loop and win across domains that share nothing.
The honest caveat is the same as always with a surging old paper, the work is two weeks old, the spike is new attention, not a new result. But the result holds up, and it's open. Repo at github.com/hexo-ai/sia, paper at arxiv.org/abs/2605.27276.
Comments