ARIS: SJTU's Open-Source Answer to Autonomous ML Research
ARIS dropped on arXiv May 4 from Shuai Li's lab at Shanghai Jiao Tong University, then climbed to #1 on Hugging Face Daily Papers with 68 upvotes. It's an open-source harness for autonomous ML research, and the entry-point insight is the failure mode it targets: not visible breakdown, but plausible unsupported success. Long-running agents produce confident-sounding claims whose evidence is incomplete or silently inherited from how the executor framed the task. ARIS is built to catch exactly that.
The mechanism is cross-model adversarial collaboration. An executor model drives the work forward; a reviewer from a different model family is scheduled to critique intermediate artifacts and demand revisions. The system has three layers: execution (65+ Markdown skills, MCP integrations, a persistent research wiki, deterministic figure generation), orchestration (five end-to-end workflows with adjustable effort settings), and assurance (a three-stage verification pipeline). The reviewer sitting outside the executor's model family is the load-bearing piece: same-family critique drifts toward shared blind spots, while cross-family critique forces the disagreements to surface.
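To make the pattern concrete, here's a minimal sketch of the executor/reviewer loop. None of this is ARIS's actual API: the `ModelFn` stubs, the APPROVE token, and the round cap are illustrative assumptions, and the real system distributes reviews across its skills, workflows, and verification pipeline rather than a single function.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call; ARIS's real executor/reviewer
# interfaces are not documented here. Wire in your own clients.
ModelFn = Callable[[str], str]

@dataclass
class ReviewOutcome:
    artifact: str
    rounds: int
    approved: bool

def adversarial_review(task: str,
                       executor: ModelFn,   # model from one family
                       reviewer: ModelFn,   # model from a *different* family
                       max_rounds: int = 3) -> ReviewOutcome:
    """Cross-model adversarial collaboration: the executor drafts an
    artifact, the reviewer critiques it, and the executor revises until
    the reviewer signs off or the round budget runs out."""
    artifact = executor(f"Task: {task}\nProduce the research artifact.")
    for round_no in range(1, max_rounds + 1):
        critique = reviewer(
            "Critique this artifact. List every claim whose evidence is "
            "missing or inherited from the task framing. Reply APPROVE "
            f"only if none remain.\n\n{artifact}"
        )
        if critique.strip().startswith("APPROVE"):
            return ReviewOutcome(artifact, round_no, approved=True)
        artifact = executor(
            f"Revise the artifact to address this critique:\n{critique}\n\n"
            f"Current artifact:\n{artifact}"
        )
    return ReviewOutcome(artifact, max_rounds, approved=False)
```

In practice `executor` and `reviewer` would be clients from different model families; that separation is what the pattern relies on to break shared blind spots.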
Why this matters for the agent thesis: autonomous research is one of the cleanest long-horizon agent applications, and the rest of the field has been racing to build vertical research agents (Sakana, Anthropic's Claude Research, Stanford PaperBench teams). ARIS is the first open-source harness to center its design on the disagreement-as-truth-discovery pattern. It pairs cleanly with the Skills movement (Anthropic Skills, addyosmani/agent-skills): ARIS treats skills as first-class research tools, not chat prompts.
Project page: https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep — paper: https://arxiv.org/abs/2605.03042