NVIDIA Spectrum-X

Type: Technology Tags: NVIDIA, Spectrum-X, Ethernet, AI Networking, RoCE, Lossless Ethernet, HPC Networking Related: NVIDIA-Spectrum-X-Validated-Solution-Stack, NVIDIA-Cumulus-Linux, NVIDIA-Network-Operator, NVIDIA-DOCA, NVIDIA-DOCA-OFED, DOCA-Flow, DOCA-PCC, DOCA-Telemetry-Service, OVS-DOCA, NVIDIA-MLNX-EN, NVIDIA-HPC-X, NVIDIA-NetQ, NVIDIA-DSX-Air, NVIDIA-STX, NVIDIA-CMX, NVIDIA-AI-Data-Platform, NVIDIA-Enterprise-Reference-Architectures, NVIDIA-RTX-PRO-AI-Factory, NVIDIA-HGX-AI-Factory, NVIDIA-NVL72-AI-Factory, NVIDIA-DGX-SuperPOD-B200-RA, NVIDIA-DGX-SuperPOD-GB200-RA, NVIDIA-DGX-SuperPOD-B300-Spectrum-4-Ethernet-RA, NVIDIA-Spectrum-6-SPX, NVIDIA-Silicon-Photonics, NVIDIA-Quantum-InfiniBand, NVIDIA-Quantum-X800-InfiniBand, NVIDIA-ConnectX-InfiniBand, NVIDIA-ConnectX-9, NVIDIA-BlueField-DPU, NVIDIA-BlueField-4, NCCL, NVIDIA-DGX Sources: NVIDIA official documentation, https://docs.nvidia.com/networking/software/spectrumx-solution-stack/index.html, https://docs.nvidia.com/doca/sdk/index.html, https://docs.nvidia.com/networking-ethernet-software/cumulus-linux/Whats-New/, https://docs.nvidia.com/networking/display/kubernetes2610/nic-conf-operator/spectrum-x-configuration.html, https://docs.nvidia.com/networking/display/hpcxv226, https://www.nvidia.com/en-us/data-center/ai-data-platform/, https://www.nvidia.com/en-us/data-center/ai-storage/stx/, https://www.nvidia.com/en-us/data-center/ai-storage/cmx/, https://docs.nvidia.com/dgx-superpod/reference-architecture/scalable-infrastructure-b300/latest/index.html, https://developer.nvidia.com/blog/nvidia-vera-rubin-pod-seven-chips-five-rack-scale-systems-one-ai-supercomputer/, https://www.nvidia.com/en-us/networking/silicon-photonics/ Last Updated: 2026-05-09

Summary

NVIDIA Spectrum-X is a networking platform designed to deliver InfiniBand-class AI computing performance over an Ethernet fabric, addressing the challenge of running RDMA (Remote Direct Memory Access) collectives over traditionally lossy Ethernet infrastructure. Combining the Spectrum-4 400GbE switch ASIC with ConnectX-7-class NICs and adaptive routing, Spectrum-X achieves up to 1.6x higher effective bandwidth for AI workloads than standard Ethernet, making it the preferred Ethernet-based AI networking solution for hyperscale cloud and enterprise AI clusters.

Detail

Purpose

Many cloud providers and enterprises have standardized on Ethernet infrastructure but need InfiniBand-class performance for AI training. Spectrum-X bridges this gap with a purpose-built Ethernet AI networking system: PFC (Priority Flow Control) keeps the fabric lossless, ECN-based congestion control throttles senders before queues overflow, and NVIDIA's adaptive routing spreads traffic across the fabric, so RoCEv2-based NCCL collectives run efficiently over standard 400GbE.
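The ECN-driven congestion loop above can be sketched as a simplified, DCQCN-style rate controller. This is a toy model, not NVIDIA's implementation: the class name, constants, and update rules are illustrative only, showing the multiplicative-decrease-on-congestion / additive-increase-on-recovery shape of RoCE congestion control.

```python
# Toy DCQCN-style rate controller: ECN-marked packets trigger CNPs
# (Congestion Notification Packets), which cut the sender's rate;
# quiet periods recover it additively. Illustrative constants only.

class ToyRateController:
    def __init__(self, line_rate_gbps=400.0):
        self.line_rate = line_rate_gbps
        self.rate = line_rate_gbps   # current send rate (Gb/s)
        self.alpha = 0.0             # congestion estimate in [0, 1]
        self.g = 0.0625              # EWMA gain for alpha (1/16)

    def on_cnp(self):
        """CNP received: raise congestion estimate, cut rate multiplicatively."""
        self.alpha = (1 - self.g) * self.alpha + self.g * 1.0
        self.rate = max(1.0, self.rate * (1 - self.alpha / 2))

    def on_quiet_interval(self, increase_gbps=5.0):
        """No CNPs this interval: decay alpha, recover rate additively."""
        self.alpha = (1 - self.g) * self.alpha
        self.rate = min(self.line_rate, self.rate + increase_gbps)

ctl = ToyRateController()
for _ in range(3):
    ctl.on_cnp()                # sustained congestion cuts the rate
congested = ctl.rate
for _ in range(100):
    ctl.on_quiet_interval()     # congestion clears; rate climbs back
print(round(congested, 1), round(ctl.rate, 1))
```

The key property mirrored here is that the decrease is proportional to the congestion estimate (so persistent marking bites harder), while recovery is gradual, which is what keeps lossless PFC pauses rare in a well-tuned fabric.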

Key Features

  • Spectrum-4 switch ASIC: 51.2 Tb/s non-blocking switching capacity, up to 128 ports of 400GbE
  • Adaptive Routing: dynamically balances traffic across equal-cost paths to prevent hotspots
  • RoCEv2 acceleration: hardware-optimized for GPU-to-GPU RDMA over Ethernet
  • NVIDIA-DSX-Air simulation workflows for validating Cumulus Linux and Ethernet fabric designs before deployment
  • Lossless fabric: eliminates packet drops that stall NCCL collectives
  • SHARP over Ethernet: in-network collective offload extending SHARP to Ethernet
  • Co-designed with ConnectX-7/8 NICs for end-to-end Ethernet AI acceleration
  • Compatible with standard 400GbE infrastructure and optics
  • Current NVIDIA docs tie Spectrum-X reference architecture 2.1 to NVIDIA-DOCA 3.3.0, NVIDIA-Cumulus-Linux 5.16, NVIDIA-HPC-X 2.26, and Network Operator Spectrum-X NIC configuration guidance
  • Host-side Spectrum-X Ethernet/RoCE configuration builds on the current NVIDIA-DOCA-OFED driver stack and legacy NVIDIA-MLNX-EN concepts
  • DOCA-Flow and DOCA-PCC are adjacent DOCA programming concepts for packet steering and programmable congestion control
  • Current AI data/storage pages tie Spectrum-X to NVIDIA-AI-Data-Platform, NVIDIA-STX, and NVIDIA-CMX as the Ethernet fabric for accelerated enterprise storage and context-memory access
  • Current NVIDIA-Enterprise-Reference-Architectures use Spectrum-X across RTX PRO, HGX, and NVL72 AI factory designs
  • NVIDIA-Spectrum-X-Validated-Solution-Stack tracks the current validated component versions for GB300, B300, and H200 Spectrum-X deployments
  • DGX SuperPOD reference architectures use Spectrum-X/Spectrum-4 as the storage, in-band management, or compute Ethernet fabric, depending on the generation and design variant
  • NVIDIA-Spectrum-6-SPX extends the Spectrum-X direction into Vera Rubin POD networking racks, with Spectrum-X Ethernet or Quantum-X800 InfiniBand options
  • Current silicon photonics material connects Spectrum-X-class Ethernet to optical networking for future AI factory fabrics
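The adaptive-routing feature above addresses a weakness of static ECMP: hashing a few large RDMA "elephant" flows onto few paths can overload one link while others sit idle. The toy comparison below (illustrative only; real Spectrum-X adaptive routing is congestion-aware in switch hardware, with the NIC reordering sprayed packets) shows the hotspot that flow hashing can create versus per-packet spraying:

```python
# Toy comparison: static ECMP flow hashing vs per-packet spraying.
# Illustrative model only -- real adaptive routing reacts to live
# congestion telemetry; this just shows why pinning a few large
# flows to hashed paths can create hotspots.

PATHS = 4
PACKETS_PER_FLOW = 1000
flow_ids = [12, 8, 5, 3]   # example flow hashes; 12 and 8 collide mod 4

# Static hashing: every packet of a flow pins to one path (flow_id % PATHS).
hashed = [0] * PATHS
for fid in flow_ids:
    hashed[fid % PATHS] += PACKETS_PER_FLOW

# Per-packet spraying: packets round-robin across all paths; the
# receiving NIC handles any resulting reordering.
sprayed = [0] * PATHS
pkt = 0
for fid in flow_ids:
    for _ in range(PACKETS_PER_FLOW):
        sprayed[pkt % PATHS] += 1
        pkt += 1

print("hashed max path load: ", max(hashed))   # one path carries 2 flows
print("sprayed max path load:", max(sprayed))  # perfectly balanced
```

With these example flow IDs, hashing loads one path with 2000 packets while another carries none; spraying puts exactly 1000 on each path, which is the effective-bandwidth gain adaptive routing aims at.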

Use Cases

  • AI factory Ethernet backbone for LLM training clusters
  • Cloud provider AI/ML training infrastructure (alternative to InfiniBand)
  • Hyperscale data center AI workloads
  • Multi-tenant AI clusters with Ethernet-native management
  • HPC clusters preferring Ethernet over InfiniBand operational models

Hardware Requirements / Compatibility

  • Spectrum-4 (SN5000 series) switches: 400GbE, 51.2Tb/s
  • Spectrum-3 (SN4000 series): 400GbE, previous generation
  • ConnectX-7/ConnectX-8-class adapters depending on platform generation and validated stack target
  • Fully compatible with standard 400GbE transceivers and cables
  • UFM and NVIDIA-NetQ provide management and observability integration across fabric operations

Language Bindings / APIs

  • NCCL (uses RoCEv2 over Spectrum-X for collective operations)
  • UCX over RoCEv2
  • Standard Linux RDMA/InfiniBand APIs (ibverbs)
  • OpenMPI over RoCEv2
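Since NCCL selects its RoCEv2 transport from environment variables, a host-side setup can be sketched as below. The variable names are real NCCL knobs, but every value shown is a deployment-specific placeholder (GID index, traffic class, and interface names must match the fabric's actual GID table, DSCP/TC policy, and device naming):

```python
# Illustrative NCCL environment for RoCEv2 on an Ethernet AI fabric.
# Variable names are genuine NCCL settings; the VALUES are placeholders
# that must be taken from the deployment's validated configuration.

import os

nccl_env = {
    "NCCL_IB_HCA": "mlx5",         # select NVIDIA/Mellanox RDMA devices
    "NCCL_IB_GID_INDEX": "3",      # GID entry for RoCEv2 (deployment-specific)
    "NCCL_IB_TC": "106",           # traffic class carrying the RoCE DSCP marking
    "NCCL_SOCKET_IFNAME": "eth0",  # bootstrap/control interface (placeholder)
    "NCCL_DEBUG": "INFO",          # surface transport selection in the logs
}

os.environ.update(nccl_env)        # must be set before NCCL initializes
for key, value in nccl_env.items():
    print(f"{key}={value}")
```

Setting `NCCL_DEBUG=INFO` is a convenient sanity check: the NCCL startup log then reports which NET transport and devices were chosen, confirming whether collectives are actually running over RoCE rather than falling back to TCP sockets.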

Connections

Resources