cuStateVec
Type: Technology Tags: CUDA, NVIDIA, GPU, Quantum Computing, Simulation, State Vector, cuQuantum Related: cuQuantum, cuTensorNet, cuDensityMat, cuPauliProp, cuStabilizer, cuQuantum-Appliance, CUDA-Q, CUDA-QX, NVIDIA-Quantum, cuBLAS, NCCL Sources: https://docs.nvidia.com/cuda/cuquantum/latest/custatevec/index.html, https://docs.nvidia.com/cuda/cuquantum/latest/ Last Updated: 2026-04-29
Summary
cuStateVec is the state-vector simulation component of cuQuantum. It provides GPU-accelerated primitives for simulating quantum circuits via explicit representation of the 2^N-dimensional complex amplitude array, with optimized memory management, gate application kernels, measurement, sampling, and expectation-value evaluation. It integrates into major quantum frameworks such as Qiskit, Cirq, and PennyLane to provide GPU acceleration, and it is packaged with the other current cuQuantum libraries inside cuQuantum-Appliance.
Detail
Purpose
State-vector simulation is the canonical approach to classical simulation of quantum circuits: maintain the full 2^N complex amplitude array and apply gates as unitary transformations. Memory grows exponentially with qubit count — a single H100 80GB GPU can hold the full state for roughly 33 qubits in double precision; multi-GPU configurations push that to ~40 qubits. cuStateVec optimizes the inner loop of this simulation for NVIDIA GPUs, delivering the largest wins over CPU for dense and moderately-deep circuits.
Key Features
- Optimized single-GPU and multi-GPU state-vector manipulation
- Gate application kernels for one-, two-, and multi-qubit gates (dense, diagonal, permutation, Pauli)
- Measurement and sampling (Z-basis, expectation values, observable evaluation)
- Controlled-gate abstractions and batched gate application
- Accessor API for scatter/gather of state amplitudes
- Distributed state-vector support across multiple GPUs via NCCL
- C++ and Python APIs
Use Cases
- Executing NISQ-era quantum circuits on classical hardware for research and validation
- Quantum hardware emulation and noise modeling
- Variational quantum algorithms (VQE, QAOA) on up to ~40 qubits
- Teaching and algorithm prototyping
- Qiskit-Aer GPU backend, Cirq and PennyLane state-vector backends
Hardware Requirements
- NVIDIA GPU (H100, H200, B200, GB200 NVL72 recommended for large-scale simulations)
- Multi-GPU and multi-node clusters for >33 qubits
- CUDA 11.x or 12.x
Language Bindings
- Python (via the cuQuantum Python package)
- C++ (low-level single/multi-GPU API)
Connections
- cuQuantum — cuStateVec is the state-vector component of the cuQuantum SDK
- cuTensorNet - sibling tensor-network simulation component for contractions, slicing, MPS workflows, and distributed tensor-network execution.
- cuDensityMat, cuPauliProp, and cuStabilizer - sibling current cuQuantum libraries for analog dynamics, Pauli propagation, and stabilizer simulation.
- cuQuantum-Appliance - containerized workflow that includes cuStateVec.
- CUDA-Q - higher-level quantum programming platform that can use cuQuantum simulation backends.
- NVIDIA-Quantum - platform context for accelerated quantum computing.
- cuBLAS — cuStateVec uses cuBLAS for dense matrix operations in gate simulation
- NCCL — multi-GPU cuStateVec uses NCCL for distributed state-vector operations