Thrust

Type: Technology
Tags: CUDA, NVIDIA, GPU, Parallel Algorithms, C++, STL, Open Source, HPC
Related: CUB, cuBLAS, cuDF, CUTLASS, cuRAND
Sources: NVIDIA official documentation
Last Updated: 2026-04-09

Summary

Thrust is NVIDIA’s open-source C++ library of parallel algorithms and data structures, with templated interfaces modeled on the STL. It hides low-level GPU memory management and kernel launch details, so developers can write GPU-accelerated sort, scan, reduce, and transform operations using familiar C++ STL idioms. NVIDIA reports that Thrust’s sort is 5–100x faster than the STL and TBB equivalents.

Detail

Purpose

Writing efficient CUDA kernels requires deep knowledge of GPU architecture (threads, warps, shared memory). Thrust allows C++ developers to use familiar STL-style algorithms (sort, reduce, scan, transform) that automatically run on the GPU, dramatically increasing productivity while achieving high performance. It serves as the GPU-accelerated counterpart to the C++ Standard Template Library.

Key Features

  • STL-compatible C++ template interfaces for heterogeneous parallel computing
  • Core algorithms: sort, scan (prefix sum), transform, reduce, gather, scatter, copy
  • Sorting that NVIDIA reports as 5–100x faster than STL and Intel TBB
  • Transparent host/device execution: the same code can target the GPU (CUDA backend) or the CPU (TBB or OpenMP backend)
  • Automatic memory management for device vectors
  • Interoperability with raw CUDA kernels and other CUDA libraries
  • Open source on GitHub (now maintained as part of NVIDIA’s CCCL repository)
  • Production-tested version included in the CUDA Toolkit
  • Usable from CUDA Fortran through C wrapper functions and ISO C bindings

Use Cases

  • Financial computing (cashflow generation, Libor market models, variable annuities)
  • Any application requiring GPU-accelerated sorting, scanning, or reduction
  • Accelerating existing C++ STL-based algorithms on GPU
  • Building higher-level GPU libraries (RAPIDS cuDF uses Thrust internally)
  • Scientific computing and HPC data processing
  • Machine learning data preprocessing

Hardware Requirements

  • CUDA-capable NVIDIA GPU for the CUDA backend (any current NVIDIA GPU qualifies)
  • CUDA Toolkit installation
  • C++14 or later compiler (recent CCCL-based releases require C++17)
  • CPU backends: TBB or OpenMP for non-GPU execution

Language Bindings

  • C++ (primary — template library)
  • CUDA Fortran (through C wrapper functions; Thrust has no native Fortran API)

Connections

  • CUB — CUB provides lower-level warp/block/device primitives that Thrust builds upon
  • cuBLAS — Thrust complements cuBLAS: Thrust for general parallel ops, cuBLAS for dense BLAS
  • cuDF — RAPIDS cuDF is built on Thrust for its parallel data processing operations
  • CUTLASS — both are C++ template libraries; CUTLASS for GEMM, Thrust for general parallel algorithms
  • cuRAND — Thrust algorithms commonly operate on data generated by cuRAND

Resources