March 21, 2026 · Open Source · Infrastructure · Research

Mamba-3: Open-Source State Space Model That Beats Transformers at 7x Speed

Together AI, Carnegie Mellon, Princeton, and Cartesia AI have released Mamba-3, a next-generation state space model (SSM) that outperforms Transformers by nearly 4% on language modeling benchmarks while running up to 7x faster at inference.

Mamba-3 introduces three key architectural advances over Mamba-2: an exponential-trapezoidal discretization scheme for more expressive recurrence, complex-valued state tracking for richer representations, and a MIMO (multi-input, multi-output) architecture that boosts accuracy without increasing decode latency. The model achieves comparable perplexity to Mamba-2 while using only half the state size.
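To make the discretization idea more concrete, here is a minimal NumPy sketch of a toy diagonal linear recurrence whose update mixes the current and previous input terms, in the spirit of a trapezoidal rule. It is an illustration under stated assumptions, not the released Mamba-3 kernels: the function name `trapezoid_style_scan`, the mixing weight `lam`, and the exact coefficients are hypothetical choices made for this example, and the state is real-valued and single-input, single-output (SISO) for simplicity.

```python
import numpy as np

def trapezoid_style_scan(x, delta, a, b, c, lam):
    """Run a toy diagonal linear recurrence over a 1-D input sequence.

    Illustrative only; coefficients are assumptions, not the paper's exact form.

    x:     (T,) input sequence
    delta: (T,) positive step sizes
    a:     (N,) negative real decay rates (diagonal state matrix)
    b, c:  (T, N) input and output projections
    lam:   (T,) mixing weights in [0, 1]; lam = 1 uses only the current input
    """
    T, N = b.shape
    h = np.zeros(N)            # fixed-size state: N values regardless of T
    prev_in = np.zeros(N)      # b[t-1] * x[t-1]; zero before the sequence starts
    y = np.empty(T)
    for t in range(T):
        alpha = np.exp(delta[t] * a)   # exponential decay of the state over this step
        cur_in = b[t] * x[t]
        # Decayed state, plus a weighted mix of the previous and current inputs.
        h = (alpha * h
             + (1.0 - lam[t]) * delta[t] * alpha * prev_in
             + lam[t] * delta[t] * cur_in)
        y[t] = c[t] @ h
        prev_in = cur_in
    return y
```

With `lam` fixed at 1 the previous-input term vanishes and the update depends only on the current input; smaller values blend in the previous step. The state `h` keeps the same size at every step, so per-token decode cost in this toy recurrence does not grow with sequence length.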

At the 1.5B parameter scale, Mamba-3 SISO achieves the fastest prefill + decode latency across all sequence lengths, beating Mamba-2, Gated DeltaNet, and even Llama-3.2-1B (Transformer) when served with vLLM. The paper was accepted at ICLR 2026.

For the agentic ecosystem, Mamba-3 matters because inference efficiency translates directly into cheaper and faster agent operations. Agents that process long contexts (tool calls, multi-turn conversations, code analysis) benefit from models that maintain quality while cutting latency.

The full paper, code, and optimized Triton/TileLang/CuTe kernels are open-sourced at https://github.com/state-spaces/mamba. Blog post: https://www.together.ai/blog/mamba-3
