May 30, 2026 · Open Source · Agents · Framework

Stepfun Opens Step 3.7 Flash for Agents

Chinese lab Stepfun dropped Step 3.7 Flash on May 28: a 198B-parameter MoE with ~11B parameters active per token, a native vision encoder (1.8B ViT), 256K context, three selectable reasoning levels (high/medium/low), and a quoted 400 tokens/sec on real hardware. Apache 2.0. Weights on Hugging Face, code on GitHub, already routed by OpenRouter, and already the subject of an NVIDIA enterprise deployment blog post.

The interesting word in the launch is FLASH. This is not Stepfun trying to chase GPT-5.5 on raw smarts. It is engineered as the model an agent calls fifty times in a row — vision in, tool calls out, fast enough that you can put it in a loop without crying about the meter. They explicitly target Claude Code, OpenClaw, and Kilo Code as drop-in deployment targets.
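To put "fifty times in a row" in perspective, here is a back-of-envelope sketch using only the launch figures quoted above (198B total parameters, ~11B active, 400 tokens/sec). The 300-token response length per call is an assumption chosen for illustration, and the estimate covers decode time only: it ignores prompt processing, network latency, and however long the tools themselves take to run.

```python
# Launch figures quoted in the article.
TOTAL_PARAMS_B = 198      # total parameters, in billions
ACTIVE_PARAMS_B = 11      # approximate parameters active per token, in billions
TOKENS_PER_SEC = 400      # quoted generation speed

# Assumptions for illustration only (not from the launch material).
CALLS_IN_LOOP = 50        # "fifty times in a row"
TOKENS_PER_CALL = 300     # hypothetical average output length per call


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def loop_decode_seconds(calls: int, tokens_per_call: int,
                        tokens_per_sec: float = TOKENS_PER_SEC) -> float:
    """Total time spent generating output tokens across an agent loop."""
    return calls * tokens_per_call / tokens_per_sec


print(f"active share: {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
# -> active share: 5.6%
print(f"decode time: {loop_decode_seconds(CALLS_IN_LOOP, TOKENS_PER_CALL)} s")
# -> decode time: 37.5 s
```

Under those assumptions, a fifty-step loop spends well under a minute generating tokens, which is the kind of budget the "put it in a loop" pitch depends on. Real runs will vary with prompt size and hardware.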

Reported numbers: leading open-model results on ClawEval-1.1, SimpleVQA-with-Search, and SWE-bench Pro at launch. Take the benchmarks with the usual grain of salt, but the shape is clear — multimodal in, agent loops out, coding-aware, search-aware, open weights.

What this means for the picture: the cheap-and-fast tier of agent models is no longer the exclusive territory of OpenAI and Anthropic. DeepSeek owns the cost lane on the closed-API side; now Stepfun is putting an open-weight version of the same idea on the table, designed from the ground up around what agents actually do (look at a screen, call a tool, write some code) instead of what chatbots do (have a conversation).

Model: https://huggingface.co/stepfun-ai/Step-3.7-Flash