NVIDIA DGX Systems

Type: Platform
Tags: NVIDIA, hardware, HPC, AI supercomputer, DGX, data center, training, infrastructure
Related: NVIDIA-Base-Command, NVIDIA-Base-Command-Manager, NVIDIA-Bright-Cluster-Manager, NVIDIA-BaseOS, NVIDIA-DGX-Cloud, NVIDIA-DGX-SuperPOD, NVIDIA-DGX-BasePOD, NVIDIA-DGX-BasePOD-B200-H200-H100-RA, NVIDIA-DGX-B200, NVIDIA-DGX-SuperPOD-B200-RA, NVIDIA-GB200-NVL72, NVIDIA-DGX-SuperPOD-GB200-RA, NVIDIA-DGX-B300, NVIDIA-DGX-SuperPOD-B300-Spectrum-4-Ethernet-RA, NVIDIA-DGX-SuperPOD-B300-Quantum-X800-InfiniBand-RA, NVIDIA-DGX-Spark, NVIDIA-DGX-Station, NVIDIA-DGX-Quantum, NVIDIA-DGX-Enterprise-Support, NVIDIA-GB300-NVL72, NVIDIA-Certified-Systems, NVIDIA-Data-Center-CPUs, NVIDIA-Cloud-Accelerator-NCX, NVIDIA-Blackwell-Architecture, NVIDIA-Vera-Rubin, NVIDIA-Vera-Rubin-POD, NVIDIA-Hopper-Architecture, NVLink, NCCL, NVIDIA-MIG, NVIDIA-GPU-Operator, NVIDIA-Optimized-Frameworks, NVIDIA-Resiliency-Extension, NVIDIA-AI-Enterprise, NVIDIA-Enterprise-Licensing-Guide
Sources: NVIDIA official documentation (live fetch attempted 2026-04-10; updated from https://www.nvidia.com/en-us/data-center/dgx-b200/, https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-b200/latest/index.html, https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-gb200/latest/index.html, https://www.nvidia.com/en-us/data-center/dgx-b300/, https://www.nvidia.com/en-us/data-center/gb300-nvl72/, https://docs.nvidia.com/dgx-superpod/reference-architecture/scalable-infrastructure-b300/latest/index.html, https://docs.nvidia.com/dgx-superpod/reference-architecture/scalable-infrastructure-b300-xdr/latest/index.html, https://www.nvidia.com/en-us/products/workstations/dgx-spark/, https://www.nvidia.com/en-us/products/workstations/dgx-station/, https://docs.nvidia.com/dgx-basepod/index.html, https://www.nvidia.com/en-us/data-center/dgx-support/, https://docs.nvidia.com/deeplearning/frameworks/index.html)
Last Updated: 2026-05-09

Summary

NVIDIA DGX systems are purpose-built AI supercomputers and infrastructure platforms integrating NVIDIA GPUs, NVLink interconnects, high-bandwidth memory, networking, DGX OS/BaseOS, and NVIDIA AI software into validated systems for AI training, inference, and development. The DGX family now spans personal AI systems such as NVIDIA-DGX-Spark, deskside systems such as NVIDIA-DGX-Station, data center systems such as NVIDIA-DGX-B200 and NVIDIA-DGX-B300, rack-scale systems such as NVIDIA-GB200-NVL72 and NVIDIA-GB300-NVL72, enterprise reference architectures such as NVIDIA-DGX-BasePOD and NVIDIA-DGX-SuperPOD, and cloud delivery through NVIDIA-DGX-Cloud.

Detail

Purpose

Training large foundation models (LLMs, multi-modal models, scientific AI) at scale requires not just powerful GPUs, but tightly integrated GPU-to-GPU communication fabric, validated software stacks, and production-grade reliability. Assembling these components independently is complex and time-consuming. DGX systems provide a validated, out-of-the-box AI computing platform where all components (GPUs, NVLink, NVSwitch, InfiniBand, storage, software) are integrated, tested, and supported by NVIDIA — reducing time-to-training and operational risk.

Key Features

Current DGX Systems and platforms (as of 2026):

  • DGX Spark: compact GB10 Grace Blackwell desktop AI computer for local model development, fine-tuning, inference, data science, edge prototyping, and local agent work
  • DGX Station: GB300 Grace Blackwell Ultra deskside AI supercomputer with 784 GB coherent memory, NVLink-C2C, ConnectX-8 networking, MIG partitioning, and optional RTX PRO GPU support
  • DGX H100: 8× H100 SXM5 (80 GB HBM3) GPUs; 640 GB total GPU memory; 4th-gen NVLink + NVSwitch for all-to-all 900 GB/s per-GPU bandwidth; 8× ConnectX-7 400 Gb/s InfiniBand for multi-node scaling; 10 kW power
  • DGX H200: 8× H200 SXM5 (141 GB HBM3e) GPUs; 1.1 TB total GPU memory — optimized for LLM inference and large-model training that benefits from bigger memory footprint
  • DGX B200: 8× Blackwell GPUs; 1,440 GB total HBM3e memory; 14.4 TB/s aggregate NVLink bandwidth; ConnectX-7 networking and BlueField-3 DPUs; Blackwell DGX platform for AI factory develop-to-deploy pipelines
  • DGX B300: current Blackwell Ultra DGX generation; rack-scale guidance is covered under NVIDIA-GB300-NVL72 and the B300 SuperPOD reference architectures
  • GB200 NVL72: rack-scale system with 72 Blackwell GPUs and 36 Grace CPUs connected via NVLink 5; designed as a single, liquid-cooled AI supercomputer unit with 130 TB/s rack-scale NVLink bandwidth
  • DGX Station A100 (previous generation): workstation-class system for small-team or on-premises development
  • DGX SuperPOD: multi-rack clusters of DGX nodes connected via NVIDIA Quantum InfiniBand (NDR, or Quantum-X800 for B300) or Spectrum-4 Ethernet fabrics; scales from ~20 nodes to 1000s; used for pre-training frontier models; an “AI data center in a box”
  • DGX BasePOD: prescriptive DGX reference architecture for enterprise AI infrastructure below SuperPOD scale; current RA covers DGX B200, H200, and H100 with NDR400 InfiniBand
  • DGX Quantum: DGX-branded quantum-classical computing architecture; current en-US navigation redirects to NVIDIA-NVQLink
  • DGX Cloud: NVIDIA-managed DGX infrastructure on Oracle Cloud, Azure, GCP, and AWS; per-node/per-hour rental of full DGX pods; includes NVIDIA AI Enterprise software
  • DGX Enterprise Support: support, infrastructure services, and training layer for DGX systems, BasePOD, and SuperPOD

Key System Capabilities:

  • NVLink/NVSwitch Fabric: All 8 GPUs in a DGX node are fully connected via NVSwitch, enabling any-to-any GPU communication at line rate — critical for tensor parallelism in LLM training (see the NCCL sketch after this list)
  • NVIDIA AI Enterprise Bundle: DGX systems ship with Base Command Manager (cluster management), NGC access, and AI Enterprise software as standard; current NVIDIA-Enterprise-Licensing-Guide guidance distinguishes Hopper DGX bundle inclusion from Blackwell DGX systems that require separate AI Enterprise licenses
  • Validated Storage Integration: Certified with VAST Data, WekaFS, DDN EXAScaler, and NetApp for high-throughput model checkpoint storage
  • Validated AI factory ecosystem: DGX deployments connect to NVIDIA-Certified-Systems, NVIDIA-Bright-Cluster-Manager, NVIDIA-Data-Center-CPUs, and NVIDIA-Cloud-Accelerator-NCX guidance for broader data center infrastructure.
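
A minimal sketch of exercising that NVLink/NVSwitch fabric with an NCCL all-reduce, assuming PyTorch with the NCCL backend (PyTorch is one common choice among the NVIDIA-Optimized-Frameworks images, not something DGX mandates; the script name below is hypothetical):

  import os

  import torch
  import torch.distributed as dist

  def main():
      # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
      dist.init_process_group(backend="nccl")
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)

      # Each GPU contributes one tensor; NCCL carries the reduction over the
      # intra-node NVLink/NVSwitch fabric (and InfiniBand across nodes).
      x = torch.ones(1024, device="cuda") * dist.get_rank()
      dist.all_reduce(x, op=dist.ReduceOp.SUM)
      print(f"rank {dist.get_rank()}: sum = {x[0].item()}")

      dist.destroy_process_group()

  if __name__ == "__main__":
      main()

Launched with torchrun --nproc_per_node=8 allreduce_check.py on a single node, every rank should print 28.0 (the sum of ranks 0 through 7), confirming all eight GPUs can reach one another.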

Use Cases

  • Pre-training LLMs and multimodal foundation models at scale (GPT-4 class, Llama family, Nemotron)
  • Large-scale scientific AI: climate modeling, molecular dynamics, drug discovery simulation
  • High-throughput LLM inference serving at enterprise scale using DGX H200 or GB200 NVL72
  • AI research labs requiring dense GPU compute without public cloud cost/latency concerns
  • Enterprise “AI factory” deployment: dedicated on-premises AI infrastructure under DGX SuperPOD architecture
  • Deskside-to-data-center AI development: DGX Station for local development, DGX SuperPOD for production training

Hardware Requirements / Compatibility

  • DGX H100: 2× Intel Xeon Platinum CPUs; 2 TB DDR5 RAM; 30 TB NVMe SSD; Ubuntu 22.04 + DGX OS
  • DGX B200: 2× Intel Xeon Platinum CPUs; HBM3e GPU memory; NVLink 5 + NVSwitch 4
  • Power: 10–14.3 kW per DGX node; requires 3-phase power; DGX B200 is air-cooled, while GB200/GB300 NVL72 rack-scale systems require liquid cooling
  • Networking: 8× ConnectX-7 (400 Gb/s InfiniBand NDR or 400GbE) network cards per node for inter-node scaling
  • OS: DGX OS (Ubuntu-based, customized); Base Command Manager provisions and manages SuperPOD clusters (Slurm or Kubernetes workloads); see the NVML sketch below for a quick node check
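
A minimal sketch for sanity-checking a node against these specs via NVML, assuming the nvidia-ml-py (pynvml) bindings are installed (pip install nvidia-ml-py; the bindings talk to the driver that DGX OS ships, but the package itself is an assumption, not part of the DGX image):

  import pynvml

  pynvml.nvmlInit()
  for i in range(pynvml.nvmlDeviceGetCount()):
      handle = pynvml.nvmlDeviceGetHandleByIndex(i)
      name = pynvml.nvmlDeviceGetName(handle)  # str on recent bindings, bytes on older ones
      mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
      print(f"GPU {i}: {name}, {mem.total / 1e9:.0f} GB")
      # Probe per-link NVLink state; 18 covers the max link count on recent GPUs.
      for link in range(18):
          try:
              state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
          except pynvml.NVMLError:
              break  # link index out of range, or GPU has no NVLink
          print(f"  NVLink {link}: {'up' if state else 'down'}")
  pynvml.nvmlShutdown()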

Language Bindings / APIs

  • DGX is a hardware platform; software APIs are those of the installed frameworks:
    • CUDA, cuDNN, NCCL — GPU programming and communication
    • NGC CLI — container and model management, including NVIDIA-Optimized-Frameworks images
    • Base Command CLI (ngc bc) — job scheduling and cluster management
    • DCGM — GPU health and telemetry; commonly exposed as Prometheus metrics over HTTP by dcgm-exporter (see the sketch below)
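
A minimal sketch of pulling that telemetry over HTTP, assuming a standard dcgm-exporter deployment on its default port (9400) and default metric set; the host and metric names below reflect the exporter's defaults, not a DGX-specific guarantee:

  import urllib.request

  # Default dcgm-exporter Prometheus endpoint (assumed standard deployment).
  URL = "http://localhost:9400/metrics"

  with urllib.request.urlopen(URL, timeout=5) as resp:
      for line in resp.read().decode().splitlines():
          # Prometheus text format: '#' lines are comments; keep the
          # GPU-utilization and temperature gauges.
          if line.startswith(("DCGM_FI_DEV_GPU_UTIL", "DCGM_FI_DEV_GPU_TEMP")):
              print(line)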

Connections

Resources