Thrust
Type: Technology
Tags: CUDA, NVIDIA, GPU, Parallel Algorithms, C++, STL, Open Source, HPC
Related: CUB, cuBLAS, cuDF, CUTLASS, cuRAND
Sources: NVIDIA official documentation
Last Updated: 2026-04-09
Summary
Thrust is NVIDIA’s open-source C++ library of parallel algorithms and data structures, with templated interfaces modeled on the C++ STL. It hides most device memory management and kernel launch details, so developers can write GPU-accelerated sort, scan, reduce, and transform operations using familiar STL idioms. NVIDIA reports sorting speedups of 5–100x over STL and TBB equivalents; the actual gain depends on data type, input size, and hardware.
Detail
Purpose
Writing efficient CUDA kernels by hand requires detailed knowledge of GPU architecture (threads, warps, shared memory). Thrust lets C++ developers call STL-style algorithms (sort, reduce, scan, transform) that execute on the GPU when they are given device data or a device execution policy, which raises productivity while retaining high performance. In that sense it serves as a GPU-accelerated counterpart to the algorithms and containers of the C++ standard library.
Key Features
- STL-like C++ template interfaces for heterogeneous parallel computing
- Core algorithms: sort, scan (prefix sum), transform, reduce, gather, scatter, copy
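- Sorting reported by NVIDIA as 5–100x faster than STL and Intel TBB, depending on data type and hardware
- Host and device execution from the same source: algorithms run on the GPU (CUDA backend) or on the CPU (sequential C++, OpenMP, or TBB backends)
- Automatic allocation and deallocation of device memory through thrust::device_vector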
- Interoperability with raw CUDA kernels and other CUDA libraries
- Open source on GitHub (now developed as part of NVIDIA's CUDA Core Compute Libraries, CCCL)
- Production-tested version included in the CUDA Toolkit
- Usable from CUDA Fortran through C-linkage wrapper functions (Thrust itself is a C++ template library)
Use Cases
- Financial computing (cashflow generation, Libor market models, variable annuities)
- Any application requiring GPU-accelerated sorting, scanning, or reduction
- Accelerating existing C++ STL-based algorithms on GPU
- Building higher-level GPU libraries (cuDF, RAPIDS use Thrust internally)
- Scientific computing and HPC data processing
- Machine learning data preprocessing
Hardware Requirements
- NVIDIA GPU with CUDA support (all modern GPUs)
- CUDA Toolkit installation
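- C++14 or later compiler for older releases; recent CCCL-based releases require C++17
- CPU backends: OpenMP or TBB for parallel execution without a GPU (the sequential C++ backend needs neither)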
Language Bindings
- C++ (primary — template library)
- CUDA Fortran (through C-linkage wrappers around Thrust calls, not a native Fortran API)
Connections
- CUB — CUB provides lower-level warp/block/device primitives that Thrust builds upon
- cuBLAS — Thrust complements cuBLAS: Thrust for general parallel ops, cuBLAS for dense BLAS
- cuDF — RAPIDS cuDF is built on Thrust for its parallel data processing operations
- CUTLASS — both are C++ template libraries; CUTLASS for GEMM, Thrust for general parallel algorithms
- cuRAND — Thrust algorithms commonly operate on data generated by cuRAND