June 2, 2026 · Open Source · Infrastructure

Nemotron 3 Ultra: NVIDIA's 550B Open-Weight Model Was Built for Agents, Not Chat

Jensen Huang announced Nemotron 3 Ultra at Computex on June 1. The model has 550 billion parameters and goes fully open-weight on June 4, available through Hugging Face, ModelScope, OpenRouter, build.nvidia.com, and major cloud partners. The Nemotron 3 family spans Nano, Super, and Ultra sizes, so developers can match model to compute. Ultra is the one designed specifically for long-running agentic tasks: planning, tool calling, file inspection, sustained code generation, and maintaining state across long chains of operations.

On Artificial Analysis' Intelligence Index, Ultra scores 48, making it the top American open-weight model by a wide margin, though Chinese models still lead among open models globally. Pre-release testing on DeepInfra showed over 300 tokens per second, which matters for agents that need to iterate fast: an agent generates tokens at every step, so generation time accumulates across the whole run.
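To see why throughput matters for agents, here is a rough back-of-the-envelope sketch. The 300 tokens per second figure comes from the DeepInfra testing mentioned above; the step count, the tokens per step, and the 50 tokens per second comparison point are hypothetical numbers chosen only for illustration. The estimate counts generation time alone and ignores prompt processing, tool execution, and network latency.

```python
def generation_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Time spent purely on token generation across an agent run."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload (assumption, not from the announcement):
# 40 agent steps, each generating about 600 tokens.
STEPS = 40
TOKENS_PER_STEP = 600

# 300 tok/s is the reported pre-release figure; 50 tok/s is an
# arbitrary slower baseline used only for comparison.
for tps in (50.0, 300.0):
    secs = generation_seconds(STEPS, TOKENS_PER_STEP, tps)
    print(f"{tps:.0f} tok/s: {secs:.0f} s of generation")
```

Under these assumptions the same run spends 80 seconds generating at 300 tokens per second versus 480 seconds at 50, a gap that widens with every additional step in the chain.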

The strategic logic is straightforward: NVIDIA makes its money on silicon, and the more developers build agents with open Nemotron models, the more NVIDIA chips get bought for inference. Open weights for agent infrastructure is how NVIDIA stays relevant even after the training compute wave peaks. Full announcement: https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models