cuML
Type: Technology Tags: CUDA, NVIDIA, GPU, Machine Learning, RAPIDS, scikit-learn, Python, Open Source Related: NVIDIA-RAPIDS, cuDF, cuGraph, cuVS, Dask, NVIDIA-Merlin, cuDNN, Thrust Sources: NVIDIA official documentation (RAPIDS), https://docs.nvidia.com/rapids/index.html, https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries, https://docs.rapids.ai/api/cuml/ Last Updated: 2026-04-30
Summary
cuML is a GPU-accelerated machine learning library providing drop-in replacements for scikit-learn, UMAP, and HDBSCAN algorithms, delivering up to 50x speedups on NVIDIA GPUs. Part of NVIDIA-RAPIDS, it enables data scientists to run classical ML algorithms — clustering, regression, dimensionality reduction, and more — on GPU without rewriting their scikit-learn code.
Detail
Purpose
Scikit-learn is the standard Python ML library for classical algorithms, but it is CPU-bound and slow on large datasets. cuML accelerates these same algorithms on the GPU, enabling orders-of-magnitude speedups for training and prediction, making it practical to run ML at scale in data pipelines that previously required distributed CPU clusters.
Key Features
- 50x faster scikit-learn with zero-code-change accelerator
- GPU-accelerated algorithms: regression (linear, ridge, lasso), classification (SVM, random forest, KNN), clustering (K-Means, DBSCAN, HDBSCAN), dimensionality reduction (PCA, UMAP, t-SNE)
- Drop-in replacement for UMAP and HDBSCAN specifically
- cuML.accel: transparent scikit-learn acceleration mode
- cuDF integration — operates natively on GPU DataFrames
- Dask-cuML for multi-GPU and distributed ML
- Python and C++ APIs
Use Cases
- Large-scale classical ML training (clustering, regression, classification)
- Dimensionality reduction for visualization (UMAP, t-SNE on millions of points)
- Anomaly detection at scale
- Feature engineering and preprocessing in GPU pipelines
- Accelerating AutoML and hyperparameter search
- NLP feature extraction (TF-IDF at GPU speed)
Hardware Requirements
- NVIDIA GPU, Pascal or newer (Volta+ recommended)
- CUDA 11.x or 12.x
- Linux (primary supported OS)
- Part of RAPIDS ecosystem
Language Bindings
- Python (primary API, scikit-learn compatible)
- C++ (underlying libml implementation)
Connections
- NVIDIA-RAPIDS — cuML is the classical machine learning library in NVIDIA’s CUDA-X data science stack
- cuDF — cuML takes cuDF DataFrames as input/output for seamless GPU pipeline integration
- cuGraph — cuML and cuGraph share graph-based clustering algorithms
- cuVS — cuVS provides GPU-accelerated nearest neighbor search used by cuML KNN
- Dask — dask-cuML scales cuML estimators across distributed GPU workers
- NVIDIA-Merlin — recommender workflows can pair RAPIDS preprocessing and ML with Merlin-specific recommendation components
- cuDNN — cuML complements cuDNN (which targets deep learning); cuML handles classical ML
- Thrust — cuML uses Thrust for underlying parallel primitives