OpenGame turns a prompt into a playable web game
CUHK MMLab just dropped OpenGame, an open-source agentic framework that takes a one-line prompt and builds a finished, playable web game. Not a wireframe, not a demo. Six fully playable games shipped with the April 21 release: a Marvel platformer, a Harry Potter card-battle, a KOF fighter, a cat tower defense, a Star Wars shooter, and a Squid Game adaptation.
The whole thing runs on GameCoder-27B, a code model they trained with continual pre-training, supervised fine-tuning, and execution-grounded RL on real game engines. Around it sits a Game Skill layer: a Template Skill that grows a library of project skeletons from past projects, plus a Debug Skill that maintains a running protocol of verified fixes. You drop a prompt in, the agent picks the closest skeleton, writes the game, runs it headless in a browser, and only counts it done when the build is healthy and a VLM judges that the visuals match the intent.
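The loop above is easy to picture in code. Here is a minimal sketch, with the caveat that every name in it (`pick_skeleton`, `run_headless`, `vlm_judge`, and the stub bodies) is an illustrative stand-in, not OpenGame's actual API:

```python
from dataclasses import dataclass

@dataclass
class Build:
    healthy: bool
    errors: list

# --- All helpers below are hypothetical stubs for illustration only. ---

def pick_skeleton(prompt, skeletons):
    # Pick the template whose name shares the most words with the prompt.
    return max(skeletons,
               key=lambda s: len(set(prompt.lower().split()) & set(s.lower().split())))

def write_game(prompt, skeleton):
    # Stand-in for GameCoder-27B drafting the project from a skeleton.
    return f"// {skeleton} project for: {prompt}"

def run_headless(code):
    # Stand-in for launching the build in a headless browser.
    errors = ["FIXME"] if "FIXME" in code else []
    return Build(healthy=not errors, errors=errors)

def vlm_judge(code, prompt):
    # Stand-in for the VLM checking a screenshot against the prompt intent.
    return True

def debug(code, errors):
    # Stand-in for the Debug Skill applying verified fixes.
    return code.replace("FIXME", "")

def generate_game(prompt, skeletons, max_iters=5):
    skeleton = pick_skeleton(prompt, skeletons)       # closest project skeleton
    code = write_game(prompt, skeleton)               # first draft
    for _ in range(max_iters):
        build = run_headless(code)
        # Done only when the build is healthy AND the visuals match intent.
        if build.healthy and vlm_judge(code, prompt):
            return code
        code = debug(code, build.errors)              # otherwise, repair and retry
    return None
```

The key structural point is the double gate at the end of each iteration: a healthy build alone is not enough, the VLM check has to pass too.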
This is the part that matters. They built OpenGame-Bench, an evaluation pipeline that scores agentic game generation on Build Health, Visual Usability, and Intent Alignment. Across 150 diverse game prompts they claim a new SOTA. A bench plus a code model plus a skill library plus an agent loop: that is a more complete recipe than ninety percent of the agent papers shipping right now.
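The post doesn't say how the three axes combine into one number, so treat this as a guess: the simplest aggregation is an equal-weight average over normalized scores, something like:

```python
def bench_score(build_health, visual_usability, intent_alignment,
                weights=(1/3, 1/3, 1/3)):
    # Each metric is assumed normalized to [0, 1].
    # Equal weighting is an assumption, not OpenGame-Bench's actual formula.
    scores = (build_health, visual_usability, intent_alignment)
    return sum(w * s for w, s in zip(weights, scores))
```

Whatever the real weighting is, the point stands: a prompt set of 150 games scored on these three axes is a reproducible yardstick, which is more than most agent demos offer.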
GitHub at github.com/leigest519/OpenGame, paper at arxiv.org/abs/2604.18394. Apache-2.0. If you want to see what end-to-end agentic creation actually looks like outside of "write me a TODO app," this is the cleanest example so far.