cuda.compute
Type: Technology
Tags: NVIDIA, CUDA, Python, CCCL, Parallel Algorithms, CUB, Thrust, GPU Programming
Related: CUDA-Python, CUDA-Cpp-Standard-Library, Thrust, CUB, cuda-coop, NVRTC, nvJitLink, CuPy, PyTorch
Sources: https://nvidia.github.io/cccl/python/compute, https://nvidia.github.io/cuda-python/latest/
Last Updated: 2026-04-29
Summary
cuda.compute is NVIDIA’s current CUDA Python / CCCL module for host-callable GPU parallel algorithms such as reductions, scans, sorts, transforms, and related composable primitives. It supersedes the older cuda.parallel naming used in earlier CCCL Python material while keeping the same broad goal: expose high-performance CUB/Thrust-style algorithms to Python users without requiring them to write CUDA C++ kernels directly.
Detail
Purpose
cuda.compute gives Python applications access to CUDA Core Compute Libraries algorithms over device arrays and iterators. It is useful when a workflow needs GPU reductions, scans, sorting, transforms, or operation fusion from Python but does not need a custom handwritten kernel for every operation.
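The algorithm families named above can be pinned down with a small CPU-only reference model. The sketch below uses only the Python standard library and does not call cuda.compute; the helper names (reduce_ref, exclusive_scan_ref, inclusive_scan_ref, transform_ref) are hypothetical and exist only to show what each primitive computes and what "fusion" means in this context.

```python
from functools import reduce
from itertools import accumulate
import operator

# CPU-only reference semantics. These helpers are hypothetical
# illustrations and are NOT part of the cuda.compute API.


def reduce_ref(values, op, init):
    """Fold all values into a single result, starting from init."""
    return reduce(op, values, init)


def exclusive_scan_ref(values, op, init):
    """Output i combines init with values[0..i-1] (values[i] excluded)."""
    # accumulate(..., initial=init) yields len(values) + 1 results;
    # dropping the last one gives the exclusive form.
    return list(accumulate(values, op, initial=init))[:-1]


def inclusive_scan_ref(values, op):
    """Output i combines values[0..i] (values[i] included)."""
    return list(accumulate(values, op))


def transform_ref(values, fn):
    """Apply fn element-wise."""
    return [fn(v) for v in values]


data = [3, 1, 4, 1, 5]

total = reduce_ref(data, operator.add, 0)          # 14
excl = exclusive_scan_ref(data, operator.add, 0)   # [0, 3, 4, 8, 9]
incl = inclusive_scan_ref(data, operator.add)      # [3, 4, 8, 9, 14]
squares = transform_ref(data, lambda x: x * x)     # [9, 1, 16, 1, 25]
ordered = sorted(data)                             # [1, 1, 3, 4, 5]

# Fusion: feed a lazily transformed sequence straight into the reduction,
# so the intermediate list of squares is never materialized.
fused = reduce_ref((x * x for x in data), operator.add, 0)  # 52
```

On a GPU the same idea applies at larger scale: passing a lazy iterator instead of an intermediate array avoids an extra pass over device memory.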
Key capabilities
- Host-callable parallel algorithms over arrays and iterators.
- Reductions, scans, sorts, transforms, and other building blocks derived from CCCL/CUB algorithm families.
- User-defined operations compiled into device code with Numba CUDA semantics.
- Interop-oriented naming conventions where arguments prefixed with d_ indicate device memory and arguments prefixed with h_ indicate host memory.
- Device array interoperability with Python GPU ecosystems such as CuPy and PyTorch when compatible buffers are supplied.
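The d_/h_ naming convention and the role of a user-defined operation can be sketched with a CPU stand-in. The function reduce_into_model below is hypothetical: it only mimics the shape of a host-callable reduction that writes into a preallocated output buffer, using plain Python lists in place of device arrays. Actual cuda.compute signatures should be taken from the official reference, and in the real library the operation would be compiled to device code rather than called from the interpreter.

```python
# Hypothetical CPU stand-in, NOT the cuda.compute API.
# Plain lists play the role of device buffers (d_*) and host values (h_*).


def reduce_into_model(d_in, d_out, op, num_items, h_init):
    """Reduce the first num_items of d_in with op, writing to d_out[0].

    d_in   : input buffer (device memory in a real GPU setting)
    d_out  : preallocated one-element output buffer (device memory)
    op     : binary, associative user-defined operation
    h_init : one-element initial value living on the host
    """
    acc = h_init[0]
    for i in range(num_items):
        acc = op(acc, d_in[i])
    d_out[0] = acc


def max_op(a, b):
    """User-defined operation: keep the larger of two values."""
    return a if a > b else b


d_input = [7, -2, 19, 4]
d_output = [None]        # caller allocates the output up front
h_init = [-(2 ** 31)]    # smallest int32 value, neutral for max

reduce_into_model(d_input, d_output, max_op, len(d_input), h_init)
# d_output is now [19]
```

Keeping the initial value on the host and the result in a caller-owned device buffer means the call itself returns nothing, and the result can stay on the device for the next algorithm in a pipeline.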
NVIDIA context
cuda.compute belongs with Thrust, CUB, and CUDA-Cpp-Standard-Library because it is the Python-facing path into CCCL algorithmic building blocks. It is distinct from cuda-coop, which provides block-wide and warp-wide cooperative primitives for use inside Numba CUDA kernels.
Connections
- CUDA-Python - CUDA Python umbrella that lists cuda.compute as a current component.
- CUDA-Cpp-Standard-Library - CCCL family context for CUDA C++ and Python library components.
- Thrust - C++ parallel algorithms that share the CCCL family.
- CUB - lower-level CUB primitives behind many CCCL algorithm patterns.
- cuda-coop - complementary CCCL Python module for device-side cooperative algorithms inside kernels.
- NVRTC and nvJitLink - relevant to runtime generation/linking patterns in Python CUDA algorithm stacks.
- CuPy and PyTorch - common Python GPU array ecosystems that can supply device memory.
Source Excerpts
- NVIDIA describes cuda.compute as a library of composable GPU parallel primitives that can build reductions, scans, sorts, transforms, and more.
- The current CUDA Python hub lists cuda.compute as the Python module for CCCL’s customizable parallel algorithms callable on the host.