June 13, 2026 · Research · RL · Benchmark

MaxProof: An Army of Proofs Beats the Gold Medalists

MiniMax just published MaxProof, and the numbers are the story: 35 out of 42 on IMO 2025 and 36 out of 42 on USAMO 2026, both above the human gold-medal threshold. These are not multiple-choice answers but full competition-level mathematical proofs.

The method is what makes this interesting for anyone building agents. MiniMax trains one model, the M3 series, on three proof skills at once: writing proofs, verifying proofs, and repairing proofs based on a critique. The verifier is engineered defense-in-depth style for a low false-positive rate, because a verifier that rubber-stamps bad proofs poisons everything downstream. Then, at test time, the same model plays four roles (generator, verifier, refiner, and ranker), and MaxProof runs a whole population of candidate proofs through tournament selection until one survives.
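To make the loop concrete, here is a toy sketch of a generate, verify, refine, rank cycle over a population. It is not MiniMax's implementation: every name (`Candidate`, `generate`, `verify`, `refine`, `rank`, `tournament`, `population_search`) is hypothetical, the "model" is replaced by random numbers, and the `quality` score is a stand-in for proof correctness that a real system could never read directly. The only ideas taken from the description above are the four roles, a verifier that accepts only when every check passes, and tournament selection over candidates.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    quality: float  # toy stand-in for proof correctness, in [0, 1] (assumption)


def generate(rng: random.Random) -> Candidate:
    # Generator role: a first draft, usually flawed.
    return Candidate("draft", rng.uniform(0.0, 0.6))


def verify(c: Candidate, rng: random.Random, checks: int = 3,
           threshold: float = 0.9) -> bool:
    # Verifier role: several slightly noisy checks, and ALL must pass.
    # Unanimity trades some false negatives for fewer false positives.
    return all(c.quality + rng.uniform(-0.02, 0.02) >= threshold
               for _ in range(checks))


def refine(c: Candidate, rng: random.Random) -> Candidate:
    # Refiner role: repair the draft in response to an (implicit) critique.
    return Candidate(c.text + "+fix",
                     min(1.0, c.quality + rng.uniform(0.05, 0.25)))


def rank(a: Candidate, b: Candidate) -> Candidate:
    # Ranker role: pairwise comparison. Deterministic here; a real ranker
    # would be a model judgment, not a lookup of a hidden score.
    return a if a.quality >= b.quality else b


def tournament(pool: list, rng: random.Random) -> Candidate:
    # Single-elimination bracket: shuffle, pair off, keep the winners.
    pool = list(pool)
    while len(pool) > 1:
        rng.shuffle(pool)
        winners = [rank(pool[i], pool[i + 1])
                   for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:
            winners.append(pool[-1])  # odd one out gets a bye
        pool = winners
    return pool[0]


def population_search(pop_size: int = 8, rounds: int = 20, seed: int = 0):
    # Returns (best candidate, whether it passed verification).
    rng = random.Random(seed)
    population = [generate(rng) for _ in range(pop_size)]
    for _ in range(rounds):
        verified = [c for c in population if verify(c, rng)]
        if verified:
            return tournament(verified, rng), True
        population = [refine(c, rng) for c in population]
    return tournament(population, rng), False


best, accepted = population_search()
print(best.text, accepted)
```

In a real harness each of the four functions would be a call to the same model with a different prompt, and the expensive part is the verifier: the sketch only works because acceptance is hard to get by accident.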

Say it plainly: the capability jump did not come from a bigger model. It came from organizing one model into a population that argues with itself. That is harness thinking applied to mathematics, and it rhymes with everything we have covered this month about scaffolding beating raw scale, from Retrospective Harness Optimization to SIA. The frontier keeps moving to the orchestration layer.

Also worth noting is who shipped it. After Xiaomi's MiMo week, this is another Chinese lab posting frontier results, this time in the one domain where verification is unforgiving. The paper hit 123 points on Hacker News and ranked at the top of Hugging Face's daily papers.

Paper: https://arxiv.org/abs/2606.13473
