nvmath-python

Type: Technology Tags: CUDA, NVIDIA, GPU, Python, Math, Linear Algebra, FFT, Random Numbers, Open Source Related: CUDA-Python, cuda-core, cuda-bindings, cuBLAS, cuFFT, cuFFTMp, cuRAND, cuDSS, cuTENSOR Sources: NVIDIA official documentation, https://nvidia.github.io/cuda-python/latest/, https://docs.nvidia.com/cuda/nvmath-python/ Last Updated: 2026-04-29

Summary

nvmath-python is an open-source Python library that bridges the Python scientific ecosystem to NVIDIA’s CUDA-X math libraries (cuBLAS, cuFFT, cuRAND, cuDSS, and more) through intuitive, Pythonic APIs. It supports host and device API modes, multi-GPU/multi-node scaling, and deep interoperability with NumPy, CuPy, PyTorch, RAPIDS, SciPy, and scikit-learn — delivering native CUDA-X performance without leaving Python.

Detail

Purpose

The Python scientific community relies on libraries like NumPy and SciPy, but these don’t expose the full performance of NVIDIA’s CUDA-X math libraries. nvmath-python closes this gap by providing first-class Python interfaces to GPU-native math primitives, enabling scientists and ML practitioners to scale workflows with minimal code changes and no C++ required. The current CUDA-Python hub lists nvmath-python as the Pythonic access layer for NVIDIA CPU and GPU math libraries.

Key Features

  • Intuitive Pythonic APIs with customization options for advanced users
  • Host APIs (generic and specialized) for standard GPU-offloaded computation
  • Device APIs for embedding math operations inside custom CUDA kernels
  • Callback support for fusing custom Python code with library operations
  • Stateful class-form APIs with distinct specification, planning, autotuning, and execution phases
  • Python logging integration for observability
  • Multi-GPU and multi-node scaling without extensive recoding
  • Interoperability: works alongside NumPy, CuPy, PyTorch, RAPIDS, SciPy, scikit-learn
  • CPU fallback: NVIDIA Performance Libraries (NVPL) for Grace CPU; MKL for x86 hosts

Supported Operations

  • Dense Linear Algebra (GEMM): wraps cuBLAS and cuBLASDx
  • Fast Fourier Transforms: wraps cuFFT, cuFFTDx, cuFFTMp (C2C, C2R, R2C)
  • Random Number Generation: wraps cuRAND (pseudo and quasi-random + distributions)
  • Sparse Linear Algebra: wraps cuDSS for direct sparse linear system solving

Use Cases

  • Scientific computing in Python requiring GPU acceleration
  • ML research requiring custom math operations beyond framework capabilities
  • Library development on top of CUDA-X primitives
  • Multi-GPU and multi-node numerical workflows from Python

Hardware Requirements

  • NVIDIA GPU with CUDA support for GPU execution
  • NVIDIA Grace CPU for NVPL CPU fallback
  • x86 CPU with MKL for CPU fallback
  • Numba-CUDA for custom kernel integration

Language Bindings

  • Python (primary — this is a Python library)
  • Integrates with Numba-CUDA for device kernel embedding

Connections

  • cuBLAS — nvmath-python wraps cuBLAS GEMM operations
  • CUDA-Python — umbrella that lists nvmath-python as a Python CUDA component.
  • cuda-core and cuda-bindings — adjacent CUDA Python layers for runtime/core access and low-level bindings.
  • cuFFT — nvmath-python wraps cuFFT transform operations
  • cuFFTMp - distributed FFT support documented as part of nvmath-python FFT coverage.
  • cuRAND — nvmath-python wraps cuRAND random number generation
  • cuDSS — nvmath-python wraps cuDSS sparse direct solver
  • cuTENSOR — tensor operations available via nvmath-python interfaces

Resources