nvmath-python
Type: Technology Tags: CUDA, NVIDIA, GPU, Python, Math, Linear Algebra, FFT, Random Numbers, Open Source Related: CUDA-Python, cuda-core, cuda-bindings, cuBLAS, cuFFT, cuFFTMp, cuRAND, cuDSS, cuTENSOR Sources: NVIDIA official documentation, https://nvidia.github.io/cuda-python/latest/, https://docs.nvidia.com/cuda/nvmath-python/ Last Updated: 2026-04-29
Summary
nvmath-python is an open-source Python library that bridges the Python scientific ecosystem to NVIDIA’s CUDA-X math libraries (cuBLAS, cuFFT, cuRAND, cuDSS, and more) through intuitive, Pythonic APIs. It supports host and device API modes, multi-GPU/multi-node scaling, and deep interoperability with NumPy, CuPy, PyTorch, RAPIDS, SciPy, and scikit-learn — delivering native CUDA-X performance without leaving Python.
Detail
Purpose
The Python scientific community relies on libraries like NumPy and SciPy, but these don’t expose the full performance of NVIDIA’s CUDA-X math libraries. nvmath-python closes this gap by providing first-class Python interfaces to GPU-native math primitives, enabling scientists and ML practitioners to scale workflows with minimal code changes and no C++ required. The CUDA-Python hub currently lists nvmath-python as the Pythonic access layer for NVIDIA CPU and GPU math libraries.
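A minimal sketch of what "minimal code changes" looks like in practice. It assumes the function-form matrix-multiplication host API is available as `nvmath.linalg.advanced.matmul` (per the nvmath-python docs); since that call needs an NVIDIA GPU and the nvmath-python package, the sketch falls back to plain NumPy so it runs anywhere:

```python
import numpy as np

# Hedged sketch: nvmath.linalg.advanced.matmul is the function-form
# host API per the nvmath-python docs; availability depends on having
# the package installed and a CUDA-capable GPU.
try:
    import nvmath
    HAVE_NVMATH = True
except ImportError:
    HAVE_NVMATH = False

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.arange(12, dtype=np.float32).reshape(3, 4)

if HAVE_NVMATH:
    # NumPy arrays in, NumPy array out; the GEMM is offloaded to cuBLAS.
    c = nvmath.linalg.advanced.matmul(a, b)
else:
    c = a @ b  # CPU reference so the sketch runs without a GPU

print(c.shape)  # (2, 4)
```

The point of the pattern is that the call site stays NumPy-shaped: existing array-producing code feeds the GPU path unchanged.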
Key Features
- Intuitive Pythonic APIs with customization options for advanced users
- Host APIs (generic and specialized) for standard GPU-offloaded computation
- Device APIs for embedding math operations inside custom CUDA kernels
- Callback support for fusing custom Python code with library operations
- Stateful class-form APIs with distinct specification, planning, autotuning, and execution phases
- Python logging integration for observability
- Multi-GPU and multi-node scaling without extensive recoding
- Interoperability: works alongside NumPy, CuPy, PyTorch, RAPIDS, SciPy, scikit-learn
- CPU fallback: NVIDIA Performance Libraries (NVPL) for Grace CPU; MKL for x86 hosts
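The stateful class-form phases and the logging integration listed above can be sketched together. This assumes the stateful matmul class is `nvmath.linalg.advanced.Matmul` with `plan()`, `autotune()`, and `execute()` methods, as the docs describe; the NumPy fallback keeps the sketch runnable without a GPU:

```python
import logging
import numpy as np

# nvmath-python reports through the standard Python logging module,
# so ordinary logging configuration is enough for observability.
logging.basicConfig(level=logging.INFO)

a = np.ones((128, 128), dtype=np.float32)
b = np.ones((128, 128), dtype=np.float32)

try:
    import nvmath
    # Hedged sketch of the four phases named in the feature list.
    with nvmath.linalg.advanced.Matmul(a, b) as mm:  # specification
        mm.plan()         # planning: select candidate algorithms
        mm.autotune()     # optional autotuning over those candidates
        c = mm.execute()  # execution; repeatable, amortizing plan cost
except ImportError:
    c = a @ b  # CPU reference fallback

print(c.shape)
```

Separating specification from execution is what makes repeated calls cheap: the plan (and any autotuning result) is reused across every `execute()`.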
Supported Operations
- Dense Linear Algebra (GEMM): wraps cuBLAS and cuBLASDx
- Fast Fourier Transforms: wraps cuFFT, cuFFTDx, cuFFTMp (C2C, C2R, R2C)
- Random Number Generation: wraps cuRAND (pseudo and quasi-random + distributions)
- Sparse Linear Algebra: wraps cuDSS for direct sparse linear system solving
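As one concrete instance of the operations above, a forward C2C transform via the function-form FFT host API, assumed here to be `nvmath.fft.fft` per the docs; NumPy's FFT serves as the CPU reference so the sketch runs without a GPU:

```python
import numpy as np

# A pure complex exponential at frequency 1: its DFT concentrates
# all energy in bin 1, which makes the result easy to check.
x = np.exp(2j * np.pi * np.arange(8) / 8).astype(np.complex64)

try:
    import nvmath
    # Hedged sketch: complex-to-complex forward transform via cuFFT.
    y = nvmath.fft.fft(x)
except ImportError:
    y = np.fft.fft(x)  # CPU reference fallback

print(int(np.argmax(np.abs(y))))  # 1
```

The R2C/C2R variants follow the same shape, with real-valued input or output arrays respectively.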
Use Cases
- Scientific computing in Python requiring GPU acceleration
- ML research requiring custom math operations beyond framework capabilities
- Library development on top of CUDA-X primitives
- Multi-GPU and multi-node numerical workflows from Python
Hardware Requirements
- NVIDIA GPU with CUDA support for GPU execution
- NVIDIA Grace CPU for NVPL CPU fallback
- x86 CPU with MKL for CPU fallback
- Numba-CUDA (a software dependency, not hardware) for custom device-kernel integration
Language Bindings
- Python (primary — this is a Python library)
- Integrates with Numba-CUDA for device kernel embedding
Connections
- cuBLAS — nvmath-python wraps cuBLAS GEMM operations
- CUDA-Python — umbrella project that lists nvmath-python as a Python CUDA component
- cuda-core and cuda-bindings — adjacent CUDA Python layers for runtime/core access and low-level bindings
- cuFFT — nvmath-python wraps cuFFT transform operations
- cuFFTMp — distributed multi-node FFT support, documented as part of nvmath-python's FFT coverage
- cuRAND — nvmath-python wraps cuRAND random number generation
- cuDSS — nvmath-python wraps cuDSS sparse direct solver
- cuTENSOR — tensor operations available via nvmath-python interfaces