Nemotron 3 Ultra: NVIDIA's 550B Open-Weight Model Was Built for Agents, Not Chat
Jensen Huang announced it at Computex on June 1. NVIDIA's Nemotron 3 Ultra has 550 billion parameters and goes fully open-weight on June 4, available through Hugging Face, ModelScope, OpenRouter, build.nvidia.com, and major cloud partners. The Nemotron 3 family spans Nano, Super, and Ultra sizes, so developers can match model to compute. Ultra is the one designed specifically for long-running agentic tasks: planning, tool calling, file inspection, sustained code generation, and maintaining state across long chains of operations.
On Artificial Analysis' Intelligence Index, Ultra scores 48, making it the top American open-weight model by a wide margin, though models from China still lead among open models globally. Pre-release testing on DeepInfra showed over 300 tokens per second, which matters for agents that need to iterate fast.
The strategic logic is straightforward: NVIDIA makes its money on silicon, and the more developers build agents with open Nemotron models, the more NVIDIA chips get bought for inference. Open weights for agent infrastructure is how NVIDIA stays relevant even after the training compute wave peaks. Full announcement: https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models