NVIDIA DALI

Type: Technology Tags: CUDA, NVIDIA, GPU, Data Loading, Image Processing, Deep Learning, Open Source, Training Related: cuDNN, TensorRT, nvComp, Megatron-Energon, CV-CUDA, nvImageCodec Sources: NVIDIA official documentation Last Updated: 2026-04-09

Summary

NVIDIA DALI (Data Loading Library) is an open-source, portable library for GPU-accelerated decoding and augmentation of images, videos, and audio, designed to eliminate the data preprocessing bottleneck in deep learning training pipelines. As a drop-in replacement for built-in data loaders in PyTorch, TensorFlow, and MXNet, DALI overlaps data preprocessing with model training to maximize GPU utilization.

Detail

Purpose

Deep learning training pipelines often stall waiting for preprocessed batches from CPU-bound data loaders (decode, crop, resize, normalize). DALI moves these preprocessing operations to the GPU, overlapping data preparation with model computation so the GPU never sits idle waiting for data.

Key Features

GPU-accelerated image, video, 3D volume, and audio preprocessing
Drop-in replacement for PyTorch DataLoader, TensorFlow data pipeline, and MXNet data iterator
Python APIs for rapid pipeline construction with transparent multi-GPU scaling
Flexible graph-based pipeline architecture supporting custom operators
Extensive format support: JPEG, PNG, TIFF, JPEG2K, BMP, WebP, LMDB, TFRecord, COCO, WAV, FLAC, OGG, H.264, HEVC
Optical flow computation via NVIDIA Optical Flow SDK integration
Super-resolution and video-specific operations
Compatible with TensorRT, Triton Inference Server, TAO Toolkit, NVDEC

Use Cases

Accelerating image classification, detection, and segmentation training
Video model training (action recognition, object tracking)
Audio model training (speech recognition, sound classification)
Medical imaging data pipelines (DICOM, TIFF volumes)
Multi-modal training pipelines combining image, video, and text

Hardware Requirements

NVIDIA GPU with CUDA support
NVDEC-capable GPU for hardware video decode (Volta+ preferred)
Optical flow requires Turing, Ampere, or Ada architecture
Available on Linux (primary) and Windows

Language Bindings

Python (primary API)
C++ (underlying pipeline implementation)

Connections

CV-CUDA — CV-CUDA handles post-processing for inference; DALI handles pre-processing for training
nvComp — DALI can decompress compressed image/video data via nvComp integration
Megatron-Energon - complementary multimodal data loader for Megatron-scale WebDataset/JSONL data pipelines.
nvImageCodec — nvImageCodec provides codec plugins that DALI uses for image decode
TensorRT — DALI preprocessed data feeds into TensorRT inference pipelines
cuDNN — cuDNN consumes DALI-preprocessed tensors for model execution

AIPS BOOM

Explorer

NVIDIA-DALI

NVIDIA DALI

Summary

Detail

Purpose

Key Features

Use Cases

Hardware Requirements

Language Bindings

Connections

Resources

Graph View

Table of Contents

Backlinks