GPUDirect Storage

Type: Technology
Tags: CUDA, NVIDIA, GPU, Storage, I/O, HPC, AI Training, Direct I/O
Related: cuFile-API, nvComp, cuDF, NVIDIA-DALI, cuBLAS, DOCA-SNAP, DOCA-Device-Emulation, NVIDIA-DOCA, NVIDIA-BlueField-DPU, NVIDIA-Certified-Storage, NVIDIA-AI-Data-Platform, NVIDIA-DGX-SuperPOD
Sources: NVIDIA official documentation, https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html, https://www.nvidia.com/en-us/data-center/products/certified-storage/
Last Updated: 2026-04-29

Summary

GPUDirect Storage (GDS) is an NVIDIA technology that enables a direct data path between GPU memory and storage (local NVMe SSDs or network-attached storage), bypassing the bounce buffer in CPU system memory. This reduces latency and increases storage I/O bandwidth for GPU-accelerated workloads by letting data flow directly between storage and GPU memory, without CPU-mediated copies through host memory.

Detail

Purpose

In a traditional GPU workload, data travels storage → CPU (system) memory → GPU memory, incurring an extra copy through a host-memory bounce buffer. GDS creates a direct path, storage → GPU memory, eliminating that intermediate copy. This is critical for AI training on large datasets, HPC checkpointing, and real-time data analytics, where storage I/O is the bottleneck.
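
To make the data path concrete, the following is a minimal sketch of a GDS read using the cuFile API. The file path and transfer size are illustrative, error handling is omitted, and the file is assumed to reside on a GDS-supported filesystem; the program links against -lcufile and -lcudart.

```cpp
// Minimal GDS read sketch: move file contents straight into GPU memory via cuFile.
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t size = 1 << 20;                       // 1 MiB transfer, illustrative
    int fd = open("/mnt/nvme/dataset.bin", O_RDONLY | O_DIRECT);  // hypothetical path

    cuFileDriverOpen();                                 // initialize the GDS driver

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);              // make the file known to cuFile

    void *dev_buf = nullptr;
    cudaMalloc(&dev_buf, size);                         // destination is GPU memory

    // Data moves storage -> GPU memory; no bounce buffer in host memory.
    ssize_t n = cuFileRead(handle, dev_buf, size, /*file_offset=*/0, /*dev_offset=*/0);
    std::printf("read %zd bytes into GPU memory\n", n);

    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```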

Key Features

  • Direct data path from NVMe/NFS/GPFS storage to GPU memory
  • Removes the host-memory bounce buffer (and the associated CPU copy) from storage-to-GPU transfers
  • Reduces latency and frees CPU resources for other work
  • Compatible with NVMe-oF (NVMe over Fabrics) and distributed file systems
  • POSIX-style cuFile API for application integration (see the streaming sketch after this list)
  • Integration with RAPIDS cuDF, DALI, and HPC frameworks
  • Works with GPUDirect RDMA for network-attached storage
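
As referenced in the cuFile bullet above, a common pattern for large datasets is to register one GPU buffer and stream the file through it in chunks. The function name, chunk size, and the assumption that cuFileDriverOpen() has already been called at startup are illustrative, not taken from an official sample; error handling is omitted.

```cpp
// Stream a large file into GPU memory in fixed-size chunks through one registered buffer.
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

void stream_file_to_gpu(const char *path, size_t chunk) {   // chunk size, e.g. 4 MiB
    int fd = open(path, O_RDONLY | O_DIRECT);
    struct stat st;
    fstat(fd, &st);                                      // total file size

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void *dev_buf = nullptr;
    cudaMalloc(&dev_buf, chunk);
    cuFileBufRegister(dev_buf, chunk, 0);                // register once, reuse for every read

    for (off_t off = 0; off < st.st_size; off += static_cast<off_t>(chunk)) {
        ssize_t n = cuFileRead(handle, dev_buf, chunk, off, 0);
        if (n <= 0) break;                               // EOF or error
        // ... launch kernels that consume the first n bytes of dev_buf ...
    }

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
}
```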

Use Cases

  • AI/ML training from large image, video, or tabular datasets
  • HPC checkpoint/restart acceleration (see the cuFileWrite sketch after this list)
  • Genomics and scientific data processing pipelines
  • Real-time video analytics
  • Database query acceleration on GPU
  • Large model weight loading for inference
  • AI factory storage validation, where NVIDIA-Certified-Storage partner systems and direct GPU data access keep storage from becoming the bottleneck
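
For the checkpoint/restart use case above, the write direction uses the same cuFile handle-plus-device-pointer pattern. The helper name and checkpoint path below are hypothetical, d_state is assumed to be a device pointer holding the state to persist, and error handling and O_DIRECT alignment details are glossed over.

```cpp
// Checkpoint GPU-resident state directly to an NVMe-backed file with cuFileWrite.
#include <cufile.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t checkpoint_to_file(const char *path, const void *d_state, size_t state_bytes) {
    // d_state points to GPU memory (e.g., from cudaMalloc) holding model/simulation state.
    int fd = open(path, O_CREAT | O_WRONLY | O_DIRECT, 0644);

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    // DMA moves the data from GPU memory to the drive without staging in host memory.
    ssize_t written = cuFileWrite(handle, d_state, state_bytes,
                                  /*file_offset=*/0, /*dev_offset=*/0);

    cuFileHandleDeregister(handle);
    close(fd);
    return written;
}
```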

Hardware Requirements

  • NVIDIA GPU, Volta (V100) or newer
  • NVMe SSD on PCIe Gen3 or Gen4 (Gen4 preferred for maximum bandwidth)
  • Linux with the nvidia-fs kernel module and a compatible NVMe driver stack
  • Compatible with network-attached storage via GPUDirect RDMA

Language Bindings

  • C/C++ (cuFile API)
  • Python (via RAPIDS KvikIO and framework wrappers)

Connections

  • nvComp — nvComp compressed data can flow via GDS directly from storage to GPU for decompression
  • cuFile-API — cuFile is the direct API reference surface for GPUDirect Storage programming.
  • cuDF — cuDF supports GDS-backed I/O for reading large datasets directly into GPU DataFrames
  • NVIDIA-DALI — DALI can use GDS for loading training images directly to GPU without CPU copies
  • cuBLAS — HPC workflows using cuBLAS benefit from GDS for loading matrix data from disk
  • DOCA-SNAP — BlueField storage virtualization can present networked storage to the host as local NVMe devices, which can then serve as endpoints for direct GPU data paths.
  • DOCA-Device-Emulation — host-facing device emulation is the lower-level mechanism behind some BlueField storage services.
  • NVIDIA-DOCA — DOCA provides the BlueField software layer for storage, networking, and infrastructure offload.
  • NVIDIA-BlueField-DPU — BlueField accelerates storage and networking paths that can complement GDS.
  • NVIDIA-Certified-Storage — certified storage programs validate storage performance needed to keep GPUs fed.
  • NVIDIA-AI-Data-Platform — AI Data Platform depends on high-throughput data access for extraction, retrieval, and context workflows.
  • NVIDIA-DGX-SuperPOD — SuperPOD-scale clusters require storage designs that avoid starving accelerators.

Resources