Nemotron 3 Super

Type: Model / NIM microservice Tags: NVIDIA, Nemotron, LLM, reasoning, agentic AI, coding, long context, MoE, Mamba, NVFP4, NIM, NeMo, training recipes Related: Nemotron, Nemotron-Training-Recipes, Nemotron-3-Nano, Nemotron-3-Nano-Omni, NVIDIA-NIM, NIM-for-Large-Language-Models, NVIDIA-NeMo, NeMo-Megatron-Bridge, NeMo-AutoModel, NeMo-RL, NeMo-Gym, NeMo-Run, NeMo-Data-Designer, NeMo-Evaluator, NVIDIA-Agent-Intelligence-Toolkit, NVIDIA-NemoClaw, NVIDIA-OpenShell, TensorRT-LLM, vLLM, Megatron-LM, NVIDIA-AI-Q-Blueprint, NVIDIA-Data-Flywheel-Blueprint Sources: https://build.nvidia.com/nvidia/nemotron-3-super-120b-a12b/modelcard; https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/; https://docs.nvidia.com/nemotron/latest/nemotron/super3/README.html; https://docs.nvidia.com/nemotron/latest/nemotron/super3/pretrain.html; https://docs.nvidia.com/nemotron/latest/nemotron/super3/sft.html; https://docs.nvidia.com/nemotron/latest/nemotron/super3/rl/index.html; https://docs.nvidia.com/nemo/gym/latest/index.html; https://docs.nvidia.com/nemotron/latest/usage-cookbook/Nemotron-3-Super/OpenScaffoldingResources/README.html; https://docs.nvidia.com/nemo/megatron-bridge/latest/models/llm/nemotron3-super.html; https://developer.nvidia.com/nemotron Last Updated: 2026-04-29

Summary

Nemotron 3 Super is NVIDIA’s open 120B-total, 12B-active reasoning LLM for complex agentic workloads, long-context reasoning, coding, planning, tool use, and collaborative multi-agent systems. NVIDIA positions it as the higher-capacity member of the Nemotron 3 family, complementary to Nemotron-3-Nano for targeted execution and Nemotron-3-Nano-Omni for omnimodal perception.

Detail

Purpose

Nemotron 3 Super targets the high-capacity side of agentic AI: tasks where the model must sustain long context, reason over code or retrieved documents, plan across tool calls, and avoid goal drift over multi-step workflows. NVIDIA’s launch material frames Super as a way to reduce the “thinking tax” in multi-agent systems by using an efficient MoE architecture rather than sending every task to a dense frontier-scale model.

Model profile

  • Total parameters: 120B.
  • Active parameters: 12B.
  • Context window: up to 1M tokens in NVIDIA’s public model material and Nemotron usage docs.
  • Architecture: hybrid Mamba-Transformer mixture-of-experts (MoE) design, combining Mamba-2 layers, MoE layers, selective attention layers, and Multi-Token Prediction (MTP).
  • NVIDIA reports open weights, datasets, training recipes, deployment cookbooks, fine-tuning cookbooks, and evaluation recipes around the model.
  • The model is distributed through build.nvidia.com and Hugging Face, and NVIDIA packages it as an NVIDIA-NIM microservice.
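NIM microservices expose an OpenAI-compatible API, so a call to Super looks like any other chat-completions request. The sketch below only builds the request payload; the base URL and model id are assumptions modeled on the build.nvidia.com naming in the sources, so check the model card for the actual values.

```python
import json

# Hypothetical NIM-style OpenAI-compatible endpoint and model id; these are
# assumptions based on the model-card naming, not confirmed values.
BASE_URL = "https://integrate.api.nvidia.com/v1"
MODEL_ID = "nvidia/nemotron-3-super-120b-a12b"

def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible /chat/completions payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }

payload = build_chat_request("Plan a three-step refactor for this repo.")
print(json.dumps(payload, indent=2))
# To send it, POST to f"{BASE_URL}/chat/completions" with an
# "Authorization: Bearer <API key>" header (e.g. via the requests library).
```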

Training and tooling

NVIDIA’s technical blog says Super was pretrained on 25T tokens, underwent supervised fine-tuning over a larger post-training corpus, and was post-trained with multi-environment reinforcement learning using NeMo-RL and NeMo-Gym. The same material points to NeMo-Data-Designer, NeMo-Evaluator, and open recipes for reproducibility.

NeMo-Megatron-Bridge has a current Nemotron 3 Super page for Hugging Face/Megatron conversion, pretraining examples, full-parameter fine-tuning, LoRA, PTQ/QAT quantization, and exporting quantized checkpoints back to Hugging Face. Current Bridge docs call out B200/NVFP4-oriented training paths and expert-parallel MoE settings for large-scale work.

Nemotron-Training-Recipes adds the public cookbook layer: Stage 0 pretraining with diversity, quality, and long-context phases; Stage 1 SFT with a two-stage output-loss procedure and continued MTP training; Stage 2 RL with multi-environment RLVR, SWE-RL, and RLHF; plus evaluation and FP8/NVFP4 quantization surfaces.
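The continued-MTP training mentioned above can be illustrated with a toy formulation: instead of a single next-token cross-entropy, the loss averages cross-entropies over several future positions predicted from the same state. This is an illustrative sketch only, not NVIDIA's implementation.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target token under a probability vector."""
    return -math.log(probs[target])

def mtp_loss(head_probs, targets):
    """Toy Multi-Token Prediction loss: average next-token cross-entropy
    over k future positions, each predicted by its own head.

    head_probs: list of k probability vectors (one per prediction head)
    targets:    list of k ground-truth token ids
    """
    losses = [cross_entropy(p, t) for p, t in zip(head_probs, targets)]
    return sum(losses) / len(losses)

# Two prediction heads over a 4-token vocabulary.
head_probs = [
    [0.7, 0.1, 0.1, 0.1],      # head 1: distribution over the next token
    [0.25, 0.25, 0.25, 0.25],  # head 2: distribution over the token after that
]
print(round(mtp_loss(head_probs, targets=[0, 2]), 4))  # → 0.8715
```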

Agent and coding workflows

The Nemotron usage cookbook includes a current “Nemotron 3 Super with Agentic Coding Tools” page, which demonstrates Super driving agentic coding tools such as OpenCode, OpenClaw, Kilo Code CLI, and OpenHands via OpenRouter- or build.nvidia.com-style OpenAI-compatible access. The page uses model references such as nvidia/nemotron-3-super and nvidia/nemotron-3-super-120b-a12b, with a 1M context and large output-token limits in its examples.
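When wiring a coding agent to a 1M-token window, a rough pre-flight size check is useful before sending a large repository context. The 4-characters-per-token ratio below is a common rule of thumb for English text, not a property of the Nemotron tokenizer, and the reserved output budget is an illustrative assumption.

```python
# Rough pre-flight check against the 1M-token context window cited in the
# cookbook examples. CHARS_PER_TOKEN is a heuristic, not tokenizer-accurate.
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4

def fits_in_context(prompt_chars: int, reserved_output_tokens: int = 32_768) -> bool:
    """Return True if the estimated prompt leaves room for the output budget."""
    est_prompt_tokens = prompt_chars // CHARS_PER_TOKEN
    return est_prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW

print(fits_in_context(2_000_000))  # ~500k estimated tokens → True
print(fits_in_context(4_000_000))  # ~1M estimated tokens + output budget → False
```

A real agent scaffold would use the model's tokenizer for an exact count; this heuristic only guards against grossly oversized prompts.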

Use this page for the Super reasoning model. Use Nemotron-3-Nano for the smaller text-only Nemotron 3 reasoning model, Nemotron-3-Nano-Omni for the omnimodal VLM/audio/video model, Nemotron for the full family, and NIM-for-Large-Language-Models for the generic LLM NIM serving surface.

Connections

Source Excerpts

  • “120B total, 12B active-parameter model”
  • “native 1M-token context window”
  • “Nemotron 3 Super with Agentic Coding Tools”

Resources