cuTENSORMg

Type: Library
Tags: NVIDIA, CUDA, cuTENSOR, tensor contractions, multi-GPU, CUDA Graphs, HPC, quantum chemistry
Related: cuTENSOR, cuTENSORMp, cuTensorNet, cuQuantum, NVIDIA-HPC-SDK, NVIDIA-CUDA, NVIDIA-HPC-Compilers, NVSHMEM, NCCL, GPUDirect-RDMA
Sources: https://docs.nvidia.com/cuda/cutensor/latest/user_guide_cutensorMg.html, https://docs.nvidia.com/cuda/cutensor/latest/index.html
Last Updated: 2026-04-29

Summary

cuTENSORMg is the part of cuTENSOR that runs tensor operations across multiple GPUs from a single process. Current NVIDIA documentation covers performance guidelines, accuracy behavior, scalar-type rules, CUDA Graph capture support, and logging controls for multi-GPU tensor contractions.

Detail

Purpose

Tensor contraction workloads in machine learning, quantum chemistry, and physics can outgrow a single GPU. cuTENSORMg extends cuTENSOR-style tensor contractions to multiple GPUs while keeping the programming model close to the base cuTENSOR API.
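One way to picture how a contraction is spread over several GPUs is the following conceptual sketch. It uses plain Python lists in place of device memory and does not call cuTENSORMg. The function names (`owner_of`, `contract_rows`, `distributed_contract`) are hypothetical, and the block-cyclic assignment of indices to devices is an assumption made for illustration.

```python
# Conceptual sketch only: plain Python lists stand in for GPU memory, and none
# of these names come from the cuTENSORMg API.

def owner_of(index, block_size, num_devices):
    """Device owning `index` along a mode under an assumed block-cyclic layout."""
    return (index // block_size) % num_devices


def contract_rows(a, b, rows):
    """Compute C[m][n] = sum_k A[m][k] * B[k][n] for the requested rows m only."""
    n_cols = len(b[0])
    return {
        m: [sum(a[m][k] * b[k][n] for k in range(len(b))) for n in range(n_cols)]
        for m in rows
    }


def distributed_contract(a, b, block_size, num_devices):
    """Split mode m across devices, contract each share, then reassemble C."""
    shares = {d: [] for d in range(num_devices)}
    for m in range(len(a)):
        shares[owner_of(m, block_size, num_devices)].append(m)

    c = [None] * len(a)
    for device, rows in shares.items():
        # On real hardware each share would be computed on its own GPU.
        for m, row in contract_rows(a, b, rows).items():
            c[m] = row
    return c, shares


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [0, 1]]  # identity, so the result should equal A
C, shares = distributed_contract(A, B, block_size=1, num_devices=2)
print(shares)  # {0: [0, 2], 1: [1, 3]}
print(C == A)  # True
```

In this sketch a larger `block_size` gives each device longer contiguous runs of indices, while a smaller one interleaves ownership more finely. The actual trade-offs for cuTENSORMg are described in NVIDIA's performance guidelines.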

Current scope

Multi-GPU tensor contraction support for cuTENSOR workloads.
Performance guidance for distributing tensors across GPUs, choosing block sizes, preserving stride-1 locality, and distributing across many modes.
Accuracy guidance aligned with cuTENSOR, with half-precision caveats.
Scalar type rules inherited from cuTENSOR.
CUDA Graph capture support when operations do not involve host operands.
Logging through CUTENSORMG_LOG_LEVEL, CUTENSORMG_LOG_MASK, and CUTENSORMG_LOG_FILE.
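The logging variables listed above are read from the process environment, so they can be set before launching an application. The sketch below builds such an environment in Python. The helper name `cutensormg_log_env` is hypothetical, and the level and file name are placeholder values; the valid levels and mask bits should be taken from the cuTENSORMg documentation.

```python
import os


def cutensormg_log_env(level, log_file=None, mask=None, base_env=None):
    """Return a copy of an environment with cuTENSORMg logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUTENSORMG_LOG_LEVEL"] = str(level)
    if mask is not None:
        env["CUTENSORMG_LOG_MASK"] = str(mask)
    if log_file is not None:
        env["CUTENSORMG_LOG_FILE"] = str(log_file)
    return env


# Placeholder values (assumptions): consult the documentation for valid settings.
env = cutensormg_log_env(level=1, log_file="cutensormg.log", base_env={})
print(env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the program that uses cuTENSORMg. Calling the helper without `base_env` layers the logging variables on top of the current environment.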

NVIDIA context

cuTENSORMg is the single-process, multi-GPU tensor path. cuTENSORMp is the multi-process distributed tensor path. Both sit under cuTENSOR and are adjacent to cuTensorNet and cuQuantum workloads that rely on efficient tensor contractions.

Connections

cuTENSOR - parent CUDA tensor primitive library.
cuTENSORMp - multi-process distributed tensor contraction sibling.
cuTensorNet - quantum tensor-network library that depends on cuTENSOR-style tensor contractions.
cuQuantum - quantum simulation SDK whose workloads rely on efficient tensor contractions.
NVIDIA-HPC-SDK and NVIDIA-HPC-Compilers - HPC development context for large tensor applications.
NVIDIA-CUDA - core CUDA platform underneath cuTENSOR.
NCCL, NVSHMEM, and GPUDirect-RDMA - adjacent multi-GPU communication/fabric context.
