GPT-5.2 Is the First Model That Beats Human Experts — And Codex Makes It Code
OpenAI just dropped GPT-5.2, and this one actually matters.
The headline number: GPT-5.2 Thinking is the first model to perform at or above human expert level on GDPval, beating or tying top industry professionals on 70.9% of knowledge work comparisons. Not academic benchmarks. Not cherry-picked evals. Actual knowledge work tasks that real professionals do every day.
But for the agent ecosystem, the real story is GPT-5.2-Codex. This is OpenAI's dedicated agentic coding model, optimized specifically for long-horizon work: large refactors, full code migrations, multi-file feature builds, the kind of stuff where previous models would lose the thread halfway through. GPT-5.2-Codex posts state-of-the-art results on SWE-Bench Pro and Terminal-Bench 2.0, and for the first time it works reliably in native Windows environments. It ships with context compaction, meaning it can stay coherent across much longer coding sessions without the context window becoming a bottleneck.
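To make "context compaction" concrete, here is a minimal sketch of the general idea: once a session's history outgrows a token budget, older turns get folded into a short summary while the most recent turns stay verbatim. This is not OpenAI's implementation, which the announcement does not detail. The names `Message`, `estimate_tokens`, `summarize`, and `compact` are all hypothetical, the four-characters-per-token estimate is a rough assumption, and a real agent would ask a model to write the summary instead of truncating text.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption: ~4 characters per token).
    return max(1, len(text) // 4)


def summarize(messages: list[Message]) -> str:
    # Placeholder summary: keeps the first 40 characters of each old message.
    # A real agent would ask a model to write this summary.
    return " | ".join(m.text[:40] for m in messages)


def compact(history: list[Message], budget: int, keep_recent: int = 4) -> list[Message]:
    """Fold older messages into one summary when the history exceeds the budget."""
    total = sum(estimate_tokens(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # Nothing to compact yet.

    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    summary = Message("system", "Summary of earlier work: " + summarize(old))
    return [summary] + recent
```

Calling `compact(history, budget=500)` on a long session returns one summary message followed by the last four turns. The agent keeps its most recent working state intact and carries only a compressed record of everything before it.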
The vision upgrades matter too. Codex can now take design mocks and translate them into functional prototypes, reading screenshots, technical diagrams, and UI surfaces during coding sessions. The cybersecurity capabilities are significantly stronger — this thing can audit code while it writes it.
Three model tiers are rolling out: GPT-5.2 Instant for fast responses, Thinking for deep reasoning, and Pro for maximum capability. All three are available in the API immediately, and ChatGPT paid plans are getting them now.
The gap between frontier models and the rest of the field just got wider. If you're building agents that write code, GPT-5.2-Codex is the new bar to clear.
https://openai.com/index/introducing-gpt-5-2/
https://openai.com/index/introducing-gpt-5-2-codex/