March 24, 2026 · Research · Open Source · RL

LongCat-Flash-Prover: Meituan's 560B Agentic Model Sets New Standard for Formal Reasoning

Meituan has open-sourced LongCat-Flash-Prover, a 560-billion-parameter Mixture-of-Experts (MoE) model that advances formal mathematical reasoning through agentic, tool-integrated reinforcement learning. It sets a new state of the art among open-weight models for both auto-formalization and theorem proving in Lean 4.

The model decomposes formal reasoning into three independent capabilities (auto-formalization, sketching, and proving) and uses a novel Hierarchical Importance Sampling Policy Optimization (HisPO) algorithm to stabilize MoE training on long-horizon tasks. A gradient-masking strategy accounts for policy staleness and for discrepancies between the training and inference engines, at both the sequence and token levels.
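The post does not spell out HisPO's equations, so the Python below is only a minimal sketch of the general idea of hierarchical gradient masking, not Meituan's implementation. It assumes one importance ratio per token between the current training policy's log-probabilities and those the inference engine recorded at rollout time (folding staleness and engine mismatch into a single ratio), takes the length-normalized geometric mean of those ratios as the sequence-level ratio, and uses hypothetical thresholds in `MaskConfig`. A sequence whose ratio leaves its band is masked entirely; otherwise only tokens with outlying ratios are masked.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskConfig:
    # Hypothetical bands; the actual HisPO thresholds are not given in the post.
    seq_low: float = 0.5
    seq_high: float = 2.0
    tok_low: float = 0.2
    tok_high: float = 5.0


def hierarchical_masks(train_logp, infer_logp, cfg=MaskConfig()):
    """Return (sequence_kept, token_mask, token_ratios) for one sampled sequence.

    train_logp: per-token log-probs under the current training policy.
    infer_logp: per-token log-probs recorded by the inference engine at rollout time.
    """
    log_ratios = [t - i for t, i in zip(train_logp, infer_logp)]
    token_ratios = [math.exp(r) for r in log_ratios]
    # Sequence level: geometric mean of the token ratios (length-normalized).
    seq_ratio = math.exp(sum(log_ratios) / len(log_ratios))
    seq_kept = cfg.seq_low <= seq_ratio <= cfg.seq_high
    # Token level: a token survives only if its sequence survived and its own ratio is in band.
    token_mask = [
        1.0 if seq_kept and cfg.tok_low <= r <= cfg.tok_high else 0.0
        for r in token_ratios
    ]
    return seq_kept, token_mask, token_ratios


def masked_surrogate_loss(train_logp, infer_logp, advantage, cfg=MaskConfig()):
    """Importance-weighted policy-gradient surrogate averaged over unmasked tokens.

    Returns a plain float here; a real trainer would compute this on tensors so
    gradients flow back through train_logp. Masked tokens contribute nothing.
    """
    _, mask, ratios = hierarchical_masks(train_logp, infer_logp, cfg)
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return -sum(m * r * advantage for m, r in zip(mask, ratios)) / kept
```

Checking the sequence band first drops a rollout that has drifted too far as a whole, while the token band trims isolated spikes, such as a single token on which the two engines disagree sharply.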

The system employs a Hybrid-Experts Iteration Framework to expand its pool of high-quality task trajectories, which take three forms: formal statements generated from informal problems, whole proofs produced directly, and lemma-style sketches. Theorem-consistency checks and legality detection are used to prevent reward hacking.
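As a rough illustration of what consistency and legality checks can look like, the Python sketch below applies two static tests to a candidate Lean proof: the proof must restate the target theorem unchanged (modulo whitespace and comments), and it must not contain `sorry`, `admit`, or `axiom`. The function names and the banned-token list are hypothetical; the post does not describe LongCat-Flash-Prover's actual detectors, and a real pipeline would also compile the proof with Lean.

```python
import re

# Hypothetical list of escape hatches that let Lean accept an incomplete proof.
BANNED = re.compile(r"\b(sorry|admit|axiom)\b")


def strip_comments(src: str) -> str:
    """Remove Lean block comments (non-nested only) and line comments."""
    src = re.sub(r"/-.*?-/", " ", src, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", src)


def normalize(src: str) -> str:
    """Collapse all whitespace runs to single spaces."""
    return " ".join(src.split())


def check_proof(target_statement: str, proof_src: str) -> tuple[bool, str]:
    code = normalize(strip_comments(proof_src))
    # Consistency: the proof must open with the target theorem, verbatim, then ":=".
    # This rejects weakened restatements and any preamble (e.g. new axioms) before it.
    if not code.startswith(normalize(target_statement) + " :="):
        return False, "statement changed"
    # Legality: no placeholders or newly declared axioms anywhere in the code.
    if BANNED.search(code):
        return False, "illegal construct"
    return True, "ok"
```

The same check shows why lemma-style sketches are an intermediate product: a sketch that leaves its `have` steps closed by `sorry` fails the legality test until every lemma has been proved.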

For builders of agentic systems, LongCat-Flash-Prover shows how agentic RL training can push specialized reasoning well beyond what standard fine-tuning achieves. The tool-integrated approach, in which the model learns during RL to call the Lean 4 proof assistant as an external tool, is a pattern that generalizes to any agent that must learn to use external tools effectively.
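The tool-integrated pattern can be sketched as a multi-turn rollout loop: the policy proposes a proof, a verifier tool checks it, and the tool's feedback is appended to the context for the next attempt, with a reward of 1 only when verification succeeds. Everything here is hypothetical: `rollout`, the scripted `toy_policy`, and the `toy_verifier` stub (which stands in for compiling with Lean 4 and does not call Lean) are illustrative names, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    attempt: str   # proof text proposed by the policy
    feedback: str  # message returned by the verifier tool


@dataclass
class Trajectory:
    turns: list[Turn] = field(default_factory=list)
    reward: float = 0.0


Policy = Callable[[str, list[Turn]], str]
Verifier = Callable[[str], tuple[bool, str]]


def rollout(policy: Policy, verifier: Verifier, problem: str, max_turns: int = 4) -> Trajectory:
    """Run one tool-integrated episode and return the trajectory used for RL."""
    traj = Trajectory()
    for _ in range(max_turns):
        attempt = policy(problem, traj.turns)       # model conditions on earlier tool feedback
        ok, feedback = verifier(attempt)            # external tool call
        traj.turns.append(Turn(attempt, feedback))  # feedback becomes context for the next turn
        if ok:
            traj.reward = 1.0                       # binary, verifiable reward
            break
    return traj


# Toy stand-ins (hypothetical): a scripted "policy" and a verifier stub.
def toy_policy(problem: str, history: list[Turn]) -> str:
    scripted = ["by decide", "by omega"]
    return scripted[min(len(history), len(scripted) - 1)]


def toy_verifier(proof: str) -> tuple[bool, str]:
    # Stub: accepts only "by omega"; a real system would compile the proof with Lean 4.
    if proof == "by omega":
        return True, "no goals"
    return False, "error: proof rejected"
```

In an RL trainer, the whole multi-turn trajectory, tool messages included, would be scored with `reward` and passed to the policy-gradient update; nothing in the loop is specific to proof assistants, which is why the pattern carries over to other tool-using agents.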

GitHub: https://github.com/meituan-longcat/LongCat-Flash-Prover
Paper: https://arxiv.org/abs/2603.21065
