Moonshot ships KVV: prove your inference vendor is not lying
Alongside K2.6, Moonshot quietly open-sourced something more important than most people will notice. Kimi Vendor Verifier, KVV for short. It is a benchmark suite that lets you check whether the K2.6 endpoint your provider is serving you actually behaves like K2.6.
The trigger was real pain. After K2 Thinking shipped, Moonshot saw widespread benchmark anomalies that turned out to be infrastructure bugs, not model bugs. Open weights mean anyone can serve the model. They cannot all serve it correctly. KVV runs six checks: pre-flight parameter constraints, OCRBench smoke test, MMMU Pro for vision input preprocessing, AIME 2025 for long-output stress, K2VV ToolCall for tool consistency and JSON accuracy, and SWE-Bench for agentic coding evaluation.
The license is MIT, and supported vendors include Kimi's own API plus open-source deployments via vLLM, SGLang, and KTransformers. Anyone can run it against any endpoint claiming to be K2.6 and find out whether tool calls return malformed JSON, whether long outputs get truncated, whether thinking mode actually thinks.
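KVV's own code isn't reproduced here, but the JSON-accuracy part of its ToolCall check boils down to something like this sketch. The function name and structure are hypothetical illustrations, not KVV's actual interface:

```python
import json

def tool_call_arguments_valid(arguments_str: str) -> bool:
    """Hypothetical check: does a tool-call 'arguments' string from the
    endpoint parse as well-formed JSON? A broken inference stack often
    fails exactly here, emitting truncated or malformed argument blobs."""
    try:
        json.loads(arguments_str)
        return True
    except json.JSONDecodeError:
        return False

# A correctly served model returns parseable arguments:
print(tool_call_arguments_valid('{"city": "Beijing", "unit": "celsius"}'))  # True
# A misconfigured vendor might truncate mid-object:
print(tool_call_arguments_valid('{"city": "Beij'))  # False
```

Run against enough sampled tool calls, a pass rate well below the reference deployment's is the kind of signal KVV is built to surface.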
This matters because the open-weight ecosystem hits a trust wall once models become dependable enough for production. Moonshot is taking the trust problem seriously and giving the community the tooling to keep their providers honest. Anthropic and OpenAI never have to publish vendor verifiers because they own the inference. Open labs do.
Link github.com/MoonshotAI/Kimi-Vendor-Verifier