TL;DR: NVIDIA has launched Nemotron 3 Ultra, a 550B-parameter open model with a hybrid Mamba-Transformer MoE architecture, optimized to run agentic reasoning workflows up to 5x faster.
Summary: NVIDIA has released Nemotron 3 Ultra, a 550B-parameter open Mixture-of-Experts (MoE) model designed for long-running autonomous agents. Its hybrid Mamba-Transformer architecture lets agents fit more reasoning cycles into the same time budget. NVIDIA claims up to 5x faster inference and 30% lower costs on complex tasks such as coding, deep research, and enterprise orchestration.
Why it matters: This open-weights model lowers the barrier to deploying highly capable, low-latency local or hosted orchestrators for complex agentic workflows. Developers should evaluate Nemotron 3 Ultra on Fireworks AI to test its planning and failure-recovery performance.
Source: @NVIDIAAI