NVIDIA DGX SuperPOD
Type: Platform Tags: NVIDIA, DGX SuperPOD, AI supercomputer, cluster, data center, InfiniBand, Spectrum-X, DGX B200, DGX B300, GB200, AI factory Related: NVIDIA-DGX, NVIDIA-DGX-BasePOD, NVIDIA-DGX-BasePOD-B200-H200-H100-RA, NVIDIA-DGX-Enterprise-Support, NVIDIA-DGX-B200, NVIDIA-DGX-SuperPOD-B200-RA, NVIDIA-GB200-NVL72, NVIDIA-DGX-SuperPOD-GB200-RA, NVIDIA-DGX-B300, NVIDIA-DGX-SuperPOD-B300-Spectrum-4-Ethernet-RA, NVIDIA-DGX-SuperPOD-B300-Quantum-X800-InfiniBand-RA, NVIDIA-GB300-NVL72, NVIDIA-Vera-Rubin, NVIDIA-Vera-Rubin-POD, NVIDIA-DGX-Cloud, NVIDIA-Enterprise-AI-Factory, NVIDIA-Mission-Control, NVIDIA-Certified-Storage, NVIDIA-AI-Data-Platform, NVIDIA-Base-Command-Manager, NVIDIA-BaseOS, NVIDIA-ConnectX-InfiniBand, NVIDIA-ConnectX-9, NVIDIA-Quantum-InfiniBand, NVIDIA-Quantum-X800-InfiniBand, NVIDIA-Spectrum-X, NVIDIA-Spectrum-X-Validated-Solution-Stack Sources: https://docs.nvidia.com/dgx-superpod/index.html, https://www.nvidia.com/en-us/data-center/dgx-superpod/, https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-b200/latest/index.html, https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-gb200/latest/index.html, https://docs.nvidia.com/dgx-superpod/reference-architecture/scalable-infrastructure-b300/latest/index.html, https://docs.nvidia.com/dgx-superpod/reference-architecture/scalable-infrastructure-b300-xdr/latest/index.html, https://www.nvidia.com/en-us/data-center/dgx-b200/, https://www.nvidia.com/en-us/data-center/gb300-nvl72/, https://www.nvidia.com/en-us/data-center/technologies/rubin/, https://docs.nvidia.com/dgx-basepod/index.html, https://www.nvidia.com/en-us/data-center/dgx-support/ Last Updated: 2026-05-09
Summary
NVIDIA DGX SuperPOD is NVIDIA’s reference AI supercomputing platform for large-scale training and AI factory deployments. It combines DGX compute systems, high-performance networking, storage, validated software, and operational guidance into a scalable cluster architecture. Wiki coverage now separates the B200 node-based, GB200 rack-scale, and B300 Blackwell Ultra reference-architecture variants so customer conversations can distinguish between node-scale and rack-scale NVLink designs, and between Ethernet and InfiniBand fabric choices.
Detail
Purpose
Frontier AI training and high-throughput enterprise AI workloads require more than individual GPU servers. DGX SuperPOD addresses this by packaging compute, networking, storage, software, and operational design into a single system-level architecture.
Key capabilities
- Scalable DGX-based AI supercomputing clusters.
- Integration with NVIDIA networking such as InfiniBand and Spectrum-X class infrastructure.
- Validated system software, OS, and management components.
- Target platform for large LLM training, model customization, simulation, and AI factory workloads.
Current reference architectures
- NVIDIA-DGX-SuperPOD-B200-RA covers the DGX B200 SuperPOD document with 32 DGX B200 systems per scalable unit and NDR400 InfiniBand compute fabric.
- NVIDIA-DGX-SuperPOD-GB200-RA covers the rack-scale DGX GB200 SuperPOD document with eight GB200 NVL72 racks per scalable unit, NVLink 5, NDR400 InfiniBand, and Spectrum-4/Spectrum-X Ethernet.
- NVIDIA-DGX-SuperPOD-B300-Spectrum-4-Ethernet-RA covers the DGX B300 SuperPOD document focused on Spectrum-4/Spectrum-X Ethernet and DC busbar power.
- NVIDIA-DGX-SuperPOD-B300-Quantum-X800-InfiniBand-RA covers the DGX B300 SuperPOD document focused on Quantum-X800/XDR InfiniBand switching and AC power.
- Both B300 RAs position NVIDIA-Mission-Control as the standard operations stack for DGX SuperPOD with DGX B300, with NVIDIA-AI-Enterprise, NGC, Run:ai/Base Command Manager functionality, and NVIDIA Enterprise Support in the software/support path.
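The scalable-unit (SU) counts above translate directly into GPU counts for sizing conversations. A minimal back-of-envelope sketch, assuming 8 GPUs per DGX B200 system and 72 GPUs per GB200 NVL72 rack (these per-unit GPU counts are assumptions, not stated in the RA excerpts above):

```python
# Back-of-envelope GPU sizing for DGX SuperPOD scalable units (SUs).
# From the RAs above: 32 DGX B200 systems per SU; 8 GB200 NVL72 racks per SU.
# Assumed (not in the excerpts): 8 GPUs per DGX B200, 72 GPUs per NVL72 rack.
SU_CONFIG = {
    "b200":  {"units_per_su": 32, "gpus_per_unit": 8},   # node-based SU
    "gb200": {"units_per_su": 8,  "gpus_per_unit": 72},  # rack-scale SU
}

def total_gpus(variant: str, num_sus: int) -> int:
    """Total GPU count across num_sus scalable units of the given variant."""
    cfg = SU_CONFIG[variant]
    return num_sus * cfg["units_per_su"] * cfg["gpus_per_unit"]

print(total_gpus("b200", 1))   # 256 GPUs in one B200 SU
print(total_gpus("gb200", 1))  # 576 GPUs in one GB200 SU
print(total_gpus("b200", 4))   # 1024 GPUs across four B200 SUs
```

Under these assumptions, a single B200 SU and a single GB200 SU land at 256 and 576 GPUs respectively, which is why the node-scale and rack-scale RAs are kept as separate wiki pages.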
NVIDIA context
DGX SuperPOD is a major anchor for many wiki topics: NCCL, NVIDIA-ConnectX-InfiniBand, NVLink, NVIDIA-BaseOS, NVIDIA-DCGM, NVIDIA-Base-Command-Manager, and NVIDIA-Mission-Control. Current AI factory guidance also makes storage and enterprise data access first-class design concerns, linking SuperPOD-scale compute to NVIDIA-Certified-Storage, NVIDIA-AI-Data-Platform, and agentic AI infrastructure. NVIDIA-DGX-BasePOD is the smaller prescriptive enterprise reference architecture that often precedes or complements SuperPOD-scale deployments, while NVIDIA-DGX-Enterprise-Support covers support, onboarding, administration, and infrastructure services for SuperPOD operations.
Connections
- NVIDIA-DGX - DGX systems are the building blocks.
- NVIDIA-DGX-BasePOD and NVIDIA-DGX-BasePOD-B200-H200-H100-RA - BasePOD is the enterprise reference architecture below SuperPOD scale.
- NVIDIA-DGX-Enterprise-Support - support/services layer for SuperPOD deployment and administration.
- NVIDIA-DGX-B200 - Blackwell DGX system that precedes the DGX B300 generation.
- NVIDIA-DGX-SuperPOD-B200-RA - B200 SuperPOD RA with 32-system scalable units and NDR400 InfiniBand.
- NVIDIA-GB200-NVL72 and NVIDIA-DGX-SuperPOD-GB200-RA - rack-scale Grace Blackwell SuperPOD design with NVLink 5.
- NVIDIA-DGX-B300 - Blackwell Ultra DGX generation for current AI factory deployments.
- NVIDIA-DGX-SuperPOD-B300-Spectrum-4-Ethernet-RA - B300 SuperPOD RA with Spectrum-4/Spectrum-X and DC busbar focus.
- NVIDIA-DGX-SuperPOD-B300-Quantum-X800-InfiniBand-RA - B300 SuperPOD RA with Quantum-X800 InfiniBand and AC power focus.
- NVIDIA-GB300-NVL72 - rack-scale Blackwell Ultra NVL72 system in the DGX/SuperPOD direction.
- NVIDIA-Vera-Rubin and NVIDIA-Vera-Rubin-POD - next-generation platform and POD-scale architecture beyond Blackwell.
- NVIDIA-Enterprise-AI-Factory - enterprise AI factory guidance generalizes DGX/SuperPOD patterns into validated full-stack designs.
- NVIDIA-Certified-Storage - validated storage is needed to feed large SuperPOD-scale workloads.
- NVIDIA-AI-Data-Platform - data platform reference designs connect SuperPOD-scale compute to retrieval and agent data workflows.
- NVIDIA-ConnectX-InfiniBand, NVIDIA-Quantum-InfiniBand, and NVIDIA-Quantum-X800-InfiniBand - high-performance InfiniBand networking for scale-out training.
- NVIDIA-ConnectX-9 - next-generation 1.6 Tb/s-class adapter for future AI factory fabrics.
- NVIDIA-Spectrum-X-Validated-Solution-Stack - validated software/firmware stack for Spectrum-X AI factory fabrics.
- NCCL - multi-node collectives are central to training at SuperPOD scale.
- NVIDIA-BaseOS - validated OS layer for DGX infrastructure.
- NVIDIA-Mission-Control - AI factory operations layer for large NVIDIA deployments.
Source Excerpts
- NVIDIA DGX SuperPOD docs provide the documentation entry point for NVIDIA’s scalable AI supercomputing platform.
- B200 and GB200 SuperPOD reference architectures are separate docs and are represented as separate wiki pages.
- Current DGX B300 SuperPOD reference architectures split into Spectrum-4/DC-busbar and Quantum-X800/AC-power designs.