NVIDIA GB200 NVL72

Type: Platform Tags: NVIDIA, GPU, Hardware, NVLink, Blackwell, Data Center, Rack-Scale, HPC, AI Related: NVIDIA-Blackwell-Architecture, NVIDIA-DGX-B200, NVIDIA-DGX-SuperPOD-GB200-RA, NVIDIA-GB300-NVL72, NVIDIA-DGX-B300, NVIDIA-Accelerated-Quantum-Center, NVIDIA-DGX-Quantum, NVLink, NVIDIA-DGX, NVIDIA-DGX-SuperPOD, NVIDIA-Mission-Control, NVIDIA-Quantum-InfiniBand, NVIDIA-Spectrum-X, NVIDIA-Certified-Storage, NCCL, NVIDIA-Grace-CPU, NVIDIA-Hopper-Architecture, TensorRT-LLM, Megatron-LM Sources: NVIDIA official product page (live fetch 2026-04-10; updated from https://www.nvidia.com/en-us/data-center/gb200-nvl72/, https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-gb200/latest/index.html, https://www.nvidia.com/en-us/data-center/gb300-nvl72/) Last Updated: 2026-05-09

Summary

The NVIDIA GB200 NVL72 is a rack-scale, liquid-cooled AI supercomputer housing 72 Blackwell GPUs and 36 Grace CPUs connected via fifth-generation NVLink. It delivers up to 1,440 PFLOPS of sparse FP4 AI compute and is designed for trillion-parameter LLM training and real-time inference at exascale. NVIDIA claims 30x faster LLM inference and 25x better energy efficiency than the H100 generation.

Detail

Purpose

Provides rack-scale, all-to-all GPU connectivity for workloads that exceed what a single-node multi-GPU server can handle — primarily trillion-parameter model training, large-scale inference, and HPC simulations.
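
In software terms, the all-to-all fabric is what lets collective libraries such as NCCL treat the whole rack as one high-bandwidth domain. The sketch below is a minimal illustration, assuming PyTorch's NCCL backend and a torchrun launch; the framework, tensor size, and launch command are illustrative choices, not part of the product spec.

```python
# Minimal NCCL all-reduce sketch (assumes PyTorch + CUDA GPUs).
# Launch: torchrun --nproc_per_node=<num_gpus> allreduce_sketch.py
# On a GB200 NVL72, NCCL routes this collective over the NVLink
# Switch fabric, which spans all 72 GPUs in the rack.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")   # torchrun supplies rank/world env
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    rank, world = dist.get_rank(), dist.get_world_size()
    # Each rank contributes a tensor filled with (rank + 1).
    x = torch.full((1 << 20,), float(rank + 1), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # sum across every GPU
    if rank == 0:
        print(f"world={world}, element={x[0].item()}, "
              f"expected={world * (world + 1) / 2}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```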

Key Specifications

  • GPU count: 72 NVIDIA Blackwell Tensor Core GPUs
  • CPU count: 36 NVIDIA Grace CPUs (Arm Neoverse V2)
  • GPU memory: up to 13.4 TB HBM3e total (372 GB per Grace Blackwell Superchip)
  • GPU memory bandwidth: 576 TB/s aggregate
  • CPU memory: up to 17 TB LPDDR5X @ up to 18.4 TB/s (cross-checked in the sketch after this list)
  • NVLink bandwidth: 130 TB/s (5th-gen NVLink)
  • Form factor: Rack-scale, liquid-cooled
  • Interconnect: NVLink-C2C (Grace CPU ↔ Blackwell GPU), NVLink Switch fabric (all 72 GPUs)
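
The rack totals above can be sanity-checked from per-unit figures. The sketch below does the arithmetic; the per-GPU HBM capacity and bandwidth, per-GPU NVLink bandwidth, and per-Grace LPDDR5X numbers are assumptions drawn from published Blackwell and Grace specs, not from this page.

```python
# Hedged cross-check of rack totals from assumed per-unit figures.
GPUS, CPUS = 72, 36
hbm_per_gpu_tb = 0.186        # ~186 GB HBM3e per Blackwell GPU (assumption)
hbm_bw_per_gpu_tbs = 8.0      # ~8 TB/s HBM3e per GPU (assumption)
nvlink_per_gpu_tbs = 1.8      # 5th-gen NVLink per GPU (assumption)
lpddr_per_cpu_tb = 0.480      # up to 480 GB LPDDR5X per Grace (assumption)
lpddr_bw_per_cpu_tbs = 0.512  # up to 512 GB/s per Grace (assumption)

print(f"GPU memory:  {GPUS * hbm_per_gpu_tb:5.1f} TB   (spec: 13.4 TB)")
print(f"GPU mem BW:  {GPUS * hbm_bw_per_gpu_tbs:5.0f} TB/s (spec: 576 TB/s)")
print(f"NVLink BW:   {GPUS * nvlink_per_gpu_tbs:5.0f} TB/s (spec: 130 TB/s)")
print(f"CPU memory:  {CPUS * lpddr_per_cpu_tb:5.1f} TB   (spec: ~17 TB)")
print(f"CPU mem BW:  {CPUS * lpddr_bw_per_cpu_tbs:5.1f} TB/s (spec: ~18.4 TB/s)")
```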

Performance (Sparse / Dense)

Precision      Performance
NVFP4          1,440 PFLOPS / 720 PFLOPS
FP8 / FP6      720 PFLOPS / 360 PFLOPS
INT8           720 POPS / 360 POPS
FP16 / BF16    360 PFLOPS / 180 PFLOPS
FP32           5,760 TFLOPS
FP64           2,880 TFLOPS
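
Dividing the rack totals by 72 gives per-GPU figures, and the table's internal pattern (dense is half of sparse, and each wider format halves again) falls out of the same arithmetic. A back-of-envelope sketch:

```python
# Per-GPU back-of-envelope from the rack totals in the table above.
GPUS = 72
# INT8 is quoted in POPS rather than PFLOPS; the magnitudes follow
# the same halving pattern, so it is listed alongside the others.
rack_sparse = {"NVFP4": 1440, "FP8 / FP6": 720, "INT8": 720, "FP16 / BF16": 360}
for fmt, pflops in rack_sparse.items():
    print(f"{fmt:12s} sparse {pflops / GPUS:5.1f} PFLOPS/GPU, "
          f"dense {pflops / 2 / GPUS:5.1f} PFLOPS/GPU")
# FP32/FP64 are quoted as single figures (no sparse variant):
print(f"FP32 {5760 / GPUS:.0f} TFLOPS/GPU, FP64 {2880 / GPUS:.0f} TFLOPS/GPU")
```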

Key Features

  • Second-generation Transformer Engine with FP4 precision support (illustrated in the sketch after this list)
  • NVLink-C2C interconnect: 900 GB/s CPU-to-GPU bandwidth per Superchip
  • 130 TB/s low-latency all-to-all GPU communication fabric
  • Dedicated decompression engines for database and data processing workloads
  • Liquid cooling for extreme power density
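
To make the FP4 item concrete, here is an illustrative sketch, assuming the 16-value E2M1 grid (±{0, 0.5, 1, 1.5, 2, 3, 4, 6}) with a simple per-block float scale. The block size and scale format are simplifications; NVFP4's actual microscaling pairs small blocks with shared higher-precision scale factors, and nothing below is Transformer Engine API.

```python
import numpy as np

# Illustrative FP4 (E2M1) quantize/dequantize with a per-block scale.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[::-1], E2M1])  # all representable FP4 values

def quantize_fp4(x, block=16):
    xb = x.reshape(-1, block)
    # Scale each block so its largest magnitude maps to E2M1's max (6.0).
    scale = np.abs(xb).max(axis=1, keepdims=True) / E2M1.max()
    scale[scale == 0] = 1.0
    # Snap each scaled value to the nearest representable FP4 value.
    idx = np.abs(xb[..., None] / scale[..., None] - GRID).argmin(axis=-1)
    return (GRID[idx] * scale).reshape(x.shape)  # dequantized reconstruction

w = np.random.randn(4, 16).astype(np.float32)
w_q = quantize_fp4(w)
print("max abs error:", np.abs(w - w_q).max())
```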

Performance Claims vs. H100

  • 30x faster real-time trillion-parameter LLM inference (contextualized in the roofline sketch after this list)
  • 4x faster LLM training at scale
  • 25x better energy efficiency vs. H100 air-cooled infrastructure
  • 18x faster database queries vs. CPU
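
One way to contextualize the inference claim is a crude memory-bandwidth roofline: decoding one token requires streaming the active weights through HBM at least once, so aggregate bandwidth bounds single-sequence decode speed. The 1.8T parameter count is an assumption (GPT-MoE-1.8T-class), and the sketch ignores MoE sparsity, KV-cache traffic, communication, and batching.

```python
# Crude roofline: time to stream all FP4 weights once per decoded token.
params = 1.8e12            # assumed trillion-parameter-class model
bytes_per_param = 0.5      # FP4 = 4 bits per weight
hbm_bw = 576e12            # aggregate HBM3e bandwidth, bytes/s (from specs)

weight_bytes = params * bytes_per_param
t_token = weight_bytes / hbm_bw
print(f"weights: {weight_bytes / 1e12:.2f} TB")
print(f"per-token lower bound: {t_token * 1e3:.2f} ms "
      f"(~{1 / t_token:.0f} tokens/s ceiling for one sequence)")
```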

Use Cases

  • Trillion-parameter LLM training (GPT-4 scale and beyond)
  • Real-time trillion-parameter LLM inference
  • Mixture-of-Experts (MoE) model training and serving
  • High-performance computing (HPC) and simulation
  • Large-scale data analytics

Target Customers

Cloud service providers, hyperscalers, national labs, and enterprises running frontier AI models.

Hardware Requirements

  • Requires liquid cooling infrastructure
  • Rack-scale deployment (full rack)
  • NVIDIA NVLink Switch System

Reference Architecture Placement

NVIDIA-DGX-SuperPOD-GB200-RA is the canonical wiki page for the DGX SuperPOD reference architecture built from GB200 NVL72 racks. That RA specifies eight DGX GB200 systems per scalable unit, an NDR 400 Gb/s InfiniBand compute fabric, Spectrum-4/Spectrum-X Ethernet for storage and in-band management, NVIDIA-Mission-Control, NVIDIA-AI-Enterprise, and certified high-performance storage.
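
A hypothetical sizing helper under the RA numbers quoted above (eight NVL72 systems per scalable unit, 72 GPUs per system; NVLink spans one rack, while the InfiniBand fabric links scalable units):

```python
# Hypothetical SuperPOD sizing from the RA figures above.
GPUS_PER_SYSTEM = 72   # one DGX GB200 system = one NVL72 rack
SYSTEMS_PER_SU = 8     # systems per scalable unit (SU), per the RA

def superpod_size(num_sus: int) -> dict:
    systems = num_sus * SYSTEMS_PER_SU
    return {
        "systems": systems,
        "gpus": systems * GPUS_PER_SYSTEM,
        "nvlink_domains": systems,  # NVLink is rack-local; IB links racks
    }

for sus in (1, 2, 4):
    print(f"{sus} SU -> {superpod_size(sus)}")
```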
