Qwen3.6-Max-Preview tops six coding benchmarks
Alibaba dropped Qwen3.6-Max-Preview yesterday, and it now ranks second out of 201 models on the Artificial Analysis Intelligence Index, scoring 52 against a category median of 14. The bigger story is what it does on coding: top scores across SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode.
This is a reasoning model with extended thinking, a 256K context window, and text-only input. The verbosity is real: it generated 74M tokens during evaluation versus a 26M median for comparable models, so you pay in tokens for the smarts. It is available now on Alibaba Cloud Bailian and Qwen Studio.
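That token overhead compounds directly into cost. A quick back-of-the-envelope using the 74M-versus-26M figures above; note the per-token price below is a hypothetical placeholder, since Max-Preview output pricing is not given here:

```python
# Back-of-the-envelope cost of Max-Preview's verbosity.
# Token counts are the evaluation figures quoted above; the
# per-million-token price is a HYPOTHETICAL placeholder, not published pricing.
max_preview_tokens = 74_000_000   # tokens Max-Preview generated during evaluation
median_tokens = 26_000_000        # median for comparable models

verbosity_multiplier = max_preview_tokens / median_tokens
print(f"verbosity multiplier: {verbosity_multiplier:.2f}x")  # ≈ 2.85x

price_per_million = 3.00  # hypothetical $/1M output tokens
extra_cost = (max_preview_tokens - median_tokens) / 1_000_000 * price_per_million
print(f"extra output spend vs a median model: ${extra_cost:.2f}")
```

At any plausible price point, a ~2.85x token multiplier means the benchmark wins are not free; heavy agentic workloads will feel it on the bill.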
What is interesting is the segmentation Alibaba is running. Qwen3.6-Plus already shipped earlier this month as the workhorse, Qwen3.6-35B-A3B is the open-weight option, and Max-Preview is the closed flagship for the most demanding tasks. That is three SKUs in three weeks, all in the same Qwen3.6 family. Compare it to Anthropic's Opus, Sonnet, and Haiku tiers and you see Alibaba mirroring the same playbook.
The agent programming claim is the part that should pull editors in. Max-Preview specifically pitches stronger instruction-following and stronger agent programming as the headline improvements over Plus. They are betting that the next round of leaderboard wins comes from agentic coding, not raw IQ tests.
Link artificialanalysis.ai/models/qwen3-6-max