Qwen3.6-Max-Preview tops six coding benchmarks
Alibaba dropped Qwen3.6-Max-Preview yesterday, and it now ranks second out of 201 models on the Artificial Analysis Intelligence Index, scoring 52 against a category median of 14. The bigger story is what it does on coding: top scores across SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode.
This is a reasoning model with extended thinking, a 256K context window, and text-only input. The verbosity is real: it generated 74M tokens during evaluation versus a 26M median for comparable models, so you pay in tokens for the smarts. It is available now on Alibaba Cloud Bailian and Qwen Studio.
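That token overhead compounds directly into cost. A quick back-of-the-envelope using the 74M-versus-26M figures above; note the per-token price below is a hypothetical placeholder, since Max-Preview output pricing is not given here:

```python
# Back-of-the-envelope cost of Max-Preview's verbosity.
# Token counts are the evaluation figures quoted above; the
# per-million-token price is a HYPOTHETICAL placeholder, not published pricing.
max_preview_tokens = 74_000_000   # tokens Max-Preview generated during evaluation
median_tokens = 26_000_000        # median for comparable models

verbosity_multiplier = max_preview_tokens / median_tokens
print(f"verbosity multiplier: {verbosity_multiplier:.2f}x")  # ≈ 2.85x

price_per_million = 3.00  # hypothetical $/1M output tokens
extra_cost = (max_preview_tokens - median_tokens) / 1_000_000 * price_per_million
print(f"extra output spend vs a median model: ${extra_cost:.2f}")
```

At any plausible price point, a ~2.85x token multiplier means the benchmark wins are not free; heavy agentic workloads will feel it on the bill.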
What is interesting is the segmentation Alibaba is running. Qwen3.6-Plus already shipped earlier this month as the workhorse, Qwen3.6-35B-A3B is the open-weight option, and Max-Preview is the closed flagship for the most demanding tasks. That is three SKUs in three weeks, all in the same Qwen3.6 family. Compare it to Anthropic's Opus, Sonnet, and Haiku tiers and you see Alibaba mirroring the same playbook.
The agent programming claim is the part that should pull editors in. Max-Preview specifically pitches stronger instruction-following and stronger agent programming as the headline improvements over Plus. They are betting that the next round of leaderboard wins comes from agentic coding, not raw IQ tests.
Link artificialanalysis.ai/models/qwen3-6-max