cuSPARSE

Type: Technology
Tags: CUDA, NVIDIA, GPU, Sparse Matrix, Linear Algebra, Math, HPC, AI
Related: cuBLAS, cuSOLVER, cuSPARSELt, AmgX, Incomplete-LU-Cholesky, cuML, CUTLASS, NVIDIA-CUDA
Sources: NVIDIA official documentation
Last Updated: 2026-04-29

Summary

cuSPARSE is NVIDIA’s GPU-accelerated library of linear algebra routines optimized for sparse matrices. NVIDIA reports 30–150x speedups over CPU-only alternatives for SpMM operations; actual gains depend on matrix size, sparsity pattern, and the hardware being compared. The library supports multiple sparse formats (COO, CSR, CSC, and Blocked CSR, also called BSR) and covers sparse vector, matrix-vector, and matrix-matrix operations. The companion cuSPARSELt library extends this with 2:4 structured sparsity acceleration via Sparse Tensor Cores on Ampere and later GPUs.

Detail

Purpose

Many real-world problems in machine learning, physics simulation, and graph analytics produce matrices that are mostly zero (sparse). cuSPARSE enables efficient GPU computation on these matrices using compressed storage formats that hold only the nonzero entries and their positions, so memory use and arithmetic scale with the number of nonzeros rather than with the full matrix dimensions.
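
To make the idea of compressed storage concrete, here is a minimal pure-Python sketch of the CSR layout and a matrix-vector product over it. It runs on the CPU and is meant only to illustrate the data structure; the function names `dense_to_csr` and `csr_spmv` are hypothetical and are not part of the cuSPARSE API.

```python
# Illustrative CPU sketch of CSR storage; cuSPARSE performs the
# equivalent work on the GPU through its own API.

def dense_to_csr(dense):
    """Convert a list-of-lists dense matrix into three CSR arrays."""
    values, col_indices, row_offsets = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)        # nonzero value
                col_indices.append(j)   # its column
        # row_offsets[i + 1] marks where row i ends in `values`
        row_offsets.append(len(values))
    return values, col_indices, row_offsets


def csr_spmv(values, col_indices, row_offsets, x):
    """Compute y = A @ x while touching only the stored nonzeros."""
    y = []
    for i in range(len(row_offsets) - 1):
        acc = 0.0
        for k in range(row_offsets[i], row_offsets[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


A = [
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0, 5.0],
]
values, col_indices, row_offsets = dense_to_csr(A)
y = csr_spmv(values, col_indices, row_offsets, [1.0, 2.0, 3.0, 4.0])
```

For this 4x4 matrix, CSR stores 6 values instead of 16, and the product loop performs 6 multiply-adds instead of 16. The saving grows with the fraction of zeros.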

Key Features

Matrix formats: dense, plus the sparse formats COO, CSR, CSC, and Blocked CSR (BSR)
Sparse vector–dense vector dot product (SpVV)
Sparse matrix–dense vector multiplication (SpMV)
Sparse matrix–dense matrix multiplication (SpMM), with reported speedups of 30–150x over CPU-only implementations
Sparse matrix addition and multiplication
ILU0 and IC0 preconditioner support
Incomplete-LU-Cholesky guidance for GPU-accelerated preconditioned iterative methods
Generic high-performance APIs for flexible data types
cuSPARSELt: 2:4 structured sparsity via Sparse Tensor Cores (Ampere+), including pruning, compression, and activation function integration
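
The 2:4 pattern mentioned in the last item means that in every group of four consecutive weights, at most two are nonzero. The sketch below illustrates the pruning and compression steps in pure Python, assuming magnitude-based pruning (keep the two largest absolute values per group). The function names are hypothetical, and the compressed layout shown here (kept values plus their positions within each group) is a simplification for illustration, not the actual memory format or API used by cuSPARSELt.

```python
# Illustrative sketch of 2:4 structured sparsity; not the cuSPARSELt API.

def prune_and_compress_2_of_4(row):
    """Keep the two largest-magnitude entries in each group of four.

    Returns (values, positions): two kept values per group and each
    value's position (0..3) inside its group.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = sorted(by_magnitude[:2])  # restore left-to-right order
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return values, positions


def decompress_2_of_4(values, positions, length):
    """Expand the compressed form back into a dense row with zeros."""
    row = [0.0] * length
    for k, (v, p) in enumerate(zip(values, positions)):
        group = k // 2  # two stored values per group of four
        row[4 * group + p] = v
    return row


weights = [0.1, -0.9, 0.4, 0.05, 0.7, 0.2, -0.3, 0.0]
values, positions = prune_and_compress_2_of_4(weights)
pruned = decompress_2_of_4(values, positions, len(weights))
```

The compressed form holds half as many values as the original row plus a small position index per value (two bits suffice for positions 0 to 3), which is what allows hardware to skip the pruned entries.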

Use Cases

Machine learning with sparse weight matrices
Computational fluid dynamics (CFD) with sparse system matrices
Seismic exploration and geophysical simulation
Graph analytics and network analysis
Iterative linear system solvers (as preconditioner)
Finite element and finite difference methods
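
For the iterative-solver use case, the sketch below shows what an ILU0 preconditioner computes: an LU factorization in which updates are applied only at positions that are already nonzero in the input matrix, so no fill-in is created. It is a plain-Python illustration on a small dense-stored matrix, assuming nonzero pivots; `ilu0` and `apply_ilu0` are hypothetical names, not cuSPARSE routines.

```python
# Illustrative ILU(0) sketch; cuSPARSE provides its own GPU routines.

def ilu0(a):
    """Return combined L and U factors restricted to the pattern of `a`.

    The strictly lower triangle holds L (unit diagonal implied);
    the upper triangle, including the diagonal, holds U.
    Assumes every pivot lu[k][k] is nonzero.
    """
    n = len(a)
    lu = [row[:] for row in a]
    pattern = [[a[i][j] != 0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                if pattern[i][j]:  # update only existing nonzeros
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def apply_ilu0(lu, r):
    """Solve (L U) z = r by forward and then backward substitution."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):
        y[i] = r[i] - sum(lu[i][j] * y[j] for j in range(i))
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(lu[i][j] * z[j] for j in range(i + 1, n))) / lu[i][i]
    return z


# Tridiagonal example: exact LU creates no fill-in here, so ILU(0) is exact.
A = [
    [4.0, -1.0, 0.0],
    [-1.0, 4.0, -1.0],
    [0.0, -1.0, 4.0],
]
lu = ilu0(A)
z = apply_ilu0(lu, [2.0, 4.0, 10.0])  # right-hand side equals A @ [1, 2, 3]
```

Inside a preconditioned iterative method such as conjugate gradient or BiCGStab, a step like `apply_ilu0` is applied to the residual in each iteration. For matrices where exact LU would introduce fill-in, the ILU0 factors only approximate the matrix, which is the trade-off that keeps the preconditioner as sparse as the original.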

Hardware Requirements

NVIDIA GPU with CUDA support
cuSPARSELt requires Ampere (A100) or later (Sparse Tensor Core support)
SpMM acceleration benefits from high-bandwidth memory (HBM) GPUs
Included in NVIDIA HPC SDK and CUDA Toolkit

Language Bindings

C and C++ (primary API)
Python (via CuPy, whose cupyx.scipy.sparse module mirrors the scipy.sparse API on the GPU)

Connections

cuBLAS — dense counterpart; cuSPARSE for sparse, cuBLAS for dense
cuSOLVER — cuSOLVER’s sparse module (cusolverSP) builds on cuSPARSE
cuSPARSELt — structured sparse matrix-matrix multiplication library for supported sparsity patterns
Incomplete-LU-Cholesky — guide for using cuSPARSE and cuBLAS in preconditioned iterative solvers
AmgX — AmgX uses cuSPARSE internally for sparse matrix operations
cuML — sparse ML kernels in cuML leverage cuSPARSE
CUTLASS — CUTLASS provides structured sparsity GEMM kernels complementing cuSPARSELt
