GPUDirect Storage

Type: Technology
Tags: CUDA, NVIDIA, GPU, Storage, I/O, HPC, AI Training, Direct I/O
Related: cuFile-API, nvComp, cuDF, NVIDIA-DALI, cuBLAS, DOCA-SNAP, DOCA-Device-Emulation, NVIDIA-DOCA, NVIDIA-BlueField-DPU, NVIDIA-Certified-Storage, NVIDIA-AI-Data-Platform, NVIDIA-DGX-SuperPOD
Sources: NVIDIA official documentation, https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html, https://www.nvidia.com/en-us/data-center/products/certified-storage/
Last Updated: 2026-04-29

Summary

GPUDirect Storage (GDS) is an NVIDIA technology that enables a direct data path between GPU memory and storage (local NVMe SSDs or network-attached storage), bypassing the bounce buffer in CPU system memory. This reduces latency and increases storage I/O bandwidth for GPU-accelerated workloads by letting data flow directly between storage and GPU memory, without CPU-mediated copies through host memory.

Detail

Purpose

In a traditional GPU workload, data travels storage → CPU (system) memory → GPU memory, incurring an extra copy through a host-memory bounce buffer. GDS creates a direct path, storage → GPU memory, eliminating that intermediate copy. This is critical for AI training on large datasets, HPC checkpointing, and real-time data analytics, where storage I/O is the bottleneck.
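
To make the data path concrete, the following is a minimal sketch of a GDS read using the cuFile API. The file path and transfer size are illustrative, error handling is omitted, and the file is assumed to reside on a GDS-supported filesystem; the program links against -lcufile and -lcudart.

```cpp
// Minimal GDS read sketch: move file contents straight into GPU memory via cuFile.
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t size = 1 << 20;                       // 1 MiB transfer, illustrative
    int fd = open("/mnt/nvme/dataset.bin", O_RDONLY | O_DIRECT);  // hypothetical path

    cuFileDriverOpen();                                 // initialize the GDS driver

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);              // make the file known to cuFile

    void *dev_buf = nullptr;
    cudaMalloc(&dev_buf, size);                         // destination is GPU memory

    // Data moves storage -> GPU memory; no bounce buffer in host memory.
    ssize_t n = cuFileRead(handle, dev_buf, size, /*file_offset=*/0, /*dev_offset=*/0);
    std::printf("read %zd bytes into GPU memory\n", n);

    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```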

Key Features

  • Direct data path from NVMe/NFS/GPFS storage to GPU memory
  • Removes the host-memory bounce buffer (and the associated CPU copy) from storage-to-GPU transfers
  • Reduces latency and frees CPU resources for other work
  • Compatible with NVMe-oF (NVMe over Fabrics) and distributed file systems
  • POSIX-style cuFile API for application integration (see the streaming sketch after this list)
  • Integration with RAPIDS cuDF, DALI, and HPC frameworks
  • Works with GPUDirect RDMA for network-attached storage
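
As referenced in the cuFile bullet above, a common pattern for large datasets is to register one GPU buffer and stream the file through it in chunks. The function name, chunk size, and the assumption that cuFileDriverOpen() has already been called at startup are illustrative, not taken from an official sample; error handling is omitted.

```cpp
// Stream a large file into GPU memory in fixed-size chunks through one registered buffer.
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

void stream_file_to_gpu(const char *path, size_t chunk) {   // chunk size, e.g. 4 MiB
    int fd = open(path, O_RDONLY | O_DIRECT);
    struct stat st;
    fstat(fd, &st);                                      // total file size

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void *dev_buf = nullptr;
    cudaMalloc(&dev_buf, chunk);
    cuFileBufRegister(dev_buf, chunk, 0);                // register once, reuse for every read

    for (off_t off = 0; off < st.st_size; off += static_cast<off_t>(chunk)) {
        ssize_t n = cuFileRead(handle, dev_buf, chunk, off, 0);
        if (n <= 0) break;                               // EOF or error
        // ... launch kernels that consume the first n bytes of dev_buf ...
    }

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
}
```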

Use Cases

  • AI/ML training from large image, video, or tabular datasets
  • HPC checkpoint/restart acceleration (see the cuFileWrite sketch after this list)
  • Genomics and scientific data processing pipelines
  • Real-time video analytics
  • Database query acceleration on GPU
  • Large model weight loading for inference
  • AI factory storage validation, where NVIDIA-Certified-Storage partner systems and direct GPU data access keep storage from becoming the bottleneck
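
For the checkpoint/restart use case above, the write direction uses the same cuFile handle-plus-device-pointer pattern. The helper name and checkpoint path below are hypothetical, d_state is assumed to be a device pointer holding the state to persist, and error handling and O_DIRECT alignment details are glossed over.

```cpp
// Checkpoint GPU-resident state directly to an NVMe-backed file with cuFileWrite.
#include <cufile.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t checkpoint_to_file(const char *path, const void *d_state, size_t state_bytes) {
    // d_state points to GPU memory (e.g., from cudaMalloc) holding model/simulation state.
    int fd = open(path, O_CREAT | O_WRONLY | O_DIRECT, 0644);

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    // DMA moves the data from GPU memory to the drive without staging in host memory.
    ssize_t written = cuFileWrite(handle, d_state, state_bytes,
                                  /*file_offset=*/0, /*dev_offset=*/0);

    cuFileHandleDeregister(handle);
    close(fd);
    return written;
}
```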

Hardware Requirements

  • NVIDIA GPU, Volta (V100) or newer
  • NVMe SSD on PCIe Gen3 or Gen4 (Gen4 preferred for maximum bandwidth)
  • Linux with the nvidia-fs kernel module and a compatible NVMe driver stack
  • Compatible with network-attached storage via GPUDirect RDMA

Language Bindings

  • C/C++ (cuFile API)
  • Python (via RAPIDS KvikIO and framework wrappers)

Connections

  • nvComp — nvComp compressed data can flow via GDS directly from storage to GPU for decompression
  • cuFile-API — cuFile is the direct API reference surface for GPUDirect Storage programming.
  • cuDF — cuDF supports GDS-backed I/O for reading large datasets directly into GPU DataFrames
  • NVIDIA-DALI — DALI can use GDS for loading training images directly to GPU without CPU copies
  • cuBLAS — HPC workflows using cuBLAS benefit from GDS for loading matrix data from disk
  • DOCA-SNAP — BlueField storage virtualization can present networked storage to the host as local NVMe devices, which can then serve as endpoints for direct GPU data paths.
  • DOCA-Device-Emulation — host-facing device emulation is the lower-level mechanism behind some BlueField storage services.
  • NVIDIA-DOCA — DOCA provides the BlueField software layer for storage, networking, and infrastructure offload.
  • NVIDIA-BlueField-DPU — BlueField accelerates storage and networking paths that can complement GDS.
  • NVIDIA-Certified-Storage — certified storage programs validate storage performance needed to keep GPUs fed.
  • NVIDIA-AI-Data-Platform — AI Data Platform depends on high-throughput data access for extraction, retrieval, and context workflows.
  • NVIDIA-DGX-SuperPOD — SuperPOD-scale clusters require storage designs that avoid starving accelerators.

Resources