May 10, 2026 · Research · Agents · Benchmark

DeepMind AI Co-Mathematician Hits 48% on FrontierMath Tier 4

Google DeepMind dropped a paper called "AI Co-Mathematician" on arXiv yesterday. The headline number: 48% on FrontierMath Tier 4. That is the hardest tier of FrontierMath, where the previous public best was around 30%. New SOTA among all AI systems evaluated.

Eighteen authors. Pushmeet Kohli and Alex Davies (the AlphaProof people). Fernanda Viégas and Martin Wattenberg from the Google AI visualization side. Daniel Roy on the theory side. This is not a single-team submission; this is the whole DeepMind agentic-research stack pointing at one target.

What it actually is: an interactive workbench for working mathematicians. Five capabilities woven together: ideation, literature search, computational exploration, theorem proving, theory building. It is an asynchronous workspace that manages uncertainty across long-running threads and produces native mathematical artifacts (LaTeX, Lean, plots) instead of plain text. The agentic loop is the product, not the answer.
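
No code has been released, but the shape of that loop is easy to sketch. Here is a minimal, hypothetical Python sketch of a workspace thread dispatching over the five capabilities and producing typed artifacts; every name in it is an assumption, not the paper's API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# All names below are hypothetical. The paper describes the five
# capabilities but publishes no code or API.

class Capability(Enum):
    IDEATION = auto()
    LITERATURE_SEARCH = auto()
    COMPUTATIONAL_EXPLORATION = auto()
    THEOREM_PROVING = auto()
    THEORY_BUILDING = auto()

@dataclass
class Artifact:
    """A native mathematical artifact: LaTeX, a Lean proof, or a plot."""
    kind: str               # "latex" | "lean" | "plot"
    content: str
    verified: bool = False  # set True only after a concrete check

@dataclass
class Thread:
    """One long-running line of inquiry inside the workspace."""
    goal: str
    artifacts: list[Artifact] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

def run_lean_check(source: str) -> bool:
    # Placeholder: a real system would invoke the Lean toolchain and
    # report whether the proof compiles against the kernel.
    return False

def step(thread: Thread, capability: Capability) -> Thread:
    """One turn of the agentic loop: apply a capability, record the
    resulting artifact or the uncertainty that remains."""
    if capability is Capability.THEOREM_PROVING:
        # A proof attempt yields a Lean artifact; verification is a
        # kernel check, not a model self-assessment.
        proof = Artifact(kind="lean", content="-- proof attempt")
        proof.verified = run_lean_check(proof.content)
        thread.artifacts.append(proof)
    else:
        thread.open_questions.append(f"explore via {capability.name}")
    return thread
```

The point of the sketch is the artifact typing: output is LaTeX, Lean, or a plot with a verification bit, never free-floating prose the human has to re-derive.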

The framing matters. Most agent-for-research demos are pitched as "AI does the research." This one is explicit: accelerate the human mathematician. Long-horizon tasks where the human stays in the loop, the AI handles the boring search/compute/proof-attempt work, and the verification step is concrete (Lean checks, computational results) rather than vibe-checked.
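
For the flavor of what "concrete verification" means here: a Lean proof either compiles against the kernel or it does not. A toy Lean 4 example (mine, not from the paper):

```lean
-- Toy example: the Lean kernel either accepts this proof or rejects it.
-- There is no partial credit and nothing to vibe-check.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```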

arxiv.org/abs/2605.06651, 22 pages. No code yet. Pairs with AlphaEvolve, AlphaProof, and the Auto Research with Specialist Agents paper from last week; DeepMind is now shipping research-agent systems on something like a monthly cadence. Frontier mathematicians using AI as a literal copilot is no longer aspirational; it is a 48% benchmark.