cuDF

Type: Technology Tags: CUDA, NVIDIA, GPU, DataFrame, Data Processing, RAPIDS, Python, Open Source Related: NVIDIA-RAPIDS, RAPIDS-Accelerator-for-Apache-Spark, cuML, cuGraph, cuVS, cuCIM, Dask, Thrust Sources: NVIDIA official documentation (RAPIDS), https://docs.nvidia.com/rapids/index.html, https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries, https://docs.rapids.ai/api/cudf/ Last Updated: 2026-04-30

Summary

cuDF is a GPU-accelerated DataFrame library that provides a drop-in replacement for pandas, delivering up to 50x faster data processing on NVIDIA GPUs. Part of NVIDIA-RAPIDS, it offers zero-code-change accelerators for pandas, Polars, and Apache Spark DataFrames, enabling data scientists to accelerate ETL, feature engineering, and data analysis pipelines without rewriting code.

Detail

Purpose

Pandas DataFrames are the standard tool for tabular data manipulation in Python, but they run on CPU and become bottlenecks at scale. cuDF moves DataFrame operations (joins, groupby, sort, filter, string ops) to the GPU, enabling the same pandas-style code to run dramatically faster on large datasets without code changes.

Key Features

GPU-accelerated DataFrame operations (50x faster than pandas)
Drop-in accelerator for pandas — zero code change required
Drop-in accelerator for Polars and Apache Spark DataFrames
cuDF.pandas mode: transparently accelerates existing pandas code
Full support for standard DataFrame operations: merge, groupby, sort, filter, aggregation
String and datetime column support
Integration with Arrow columnar format for zero-copy data exchange
Dask-cuDF for distributed multi-GPU DataFrame processing
Python and C++ APIs

Use Cases

Large-scale ETL (Extract, Transform, Load) pipelines
Feature engineering for machine learning
Log analysis and analytics at scale
Data cleaning and preparation for AI pipelines
Financial data processing
Accelerating Apache Spark workloads on GPU clusters

Hardware Requirements

NVIDIA GPU, Pascal (GP100) or newer (Volta+ recommended)
CUDA 11.x or 12.x
Linux (primary supported OS)
Part of RAPIDS ecosystem

Language Bindings

Python (primary API, pandas-compatible)
C++ (libcudf — underlying library)

Connections

NVIDIA-RAPIDS — cuDF is the central DataFrame library in NVIDIA’s CUDA-X data science stack
RAPIDS-Accelerator-for-Apache-Spark — Spark RAPIDS uses cuDF as the GPU execution engine for supported Spark SQL/DataFrame operations
cuML — cuML uses cuDF DataFrames as its input/output format
cuGraph — cuGraph operates on cuDF edge and node DataFrames
cuVS — cuVS integrates with cuDF for vector search pipelines
Dask — dask-cuDF scales cuDF DataFrames across multi-GPU and multi-node workers
Thrust — cuDF internals use Thrust for parallel primitives
cuCIM — cuCIM and cuDF both part of RAPIDS for GPU-accelerated data science

AIPS BOOM

Explorer

cuDF

cuDF

Summary

Detail

Purpose

Key Features

Use Cases

Hardware Requirements

Language Bindings

Connections

Resources

Graph View

Table of Contents

Backlinks