Thrust

Type: Technology
Tags: CUDA, NVIDIA, GPU, Parallel Algorithms, C++, STL, Open Source, HPC
Related: CUB, cuBLAS, cuDF, CUTLASS, cuRAND
Sources: NVIDIA official documentation
Last Updated: 2026-04-09

Summary

Thrust is NVIDIA’s open-source C++ library of parallel algorithms and data structures, offering STL-compatible template interfaces for GPU computing. It hides low-level GPU memory management and kernel launch details, so developers can write high-performance GPU-accelerated sort, scan, reduce, and transform operations using familiar C++ STL idioms. NVIDIA reports that Thrust’s sort is 5–100x faster than the equivalent STL and TBB sorts.

Detail

Purpose

Writing efficient CUDA kernels requires deep knowledge of GPU architecture (threads, warps, shared memory). Thrust lets C++ developers call familiar STL-style algorithms (sort, reduce, scan, transform) that run on the GPU when they operate on device data, which greatly raises productivity while still delivering high performance. It serves as the GPU-accelerated counterpart to the C++ Standard Template Library.

Key Features

STL-compatible C++ template interfaces for heterogeneous parallel computing
Core algorithms: sort, scan (prefix sum), transform, reduce, gather, scatter, copy
5–100x faster sorting than STL and Intel TBB (NVIDIA-reported figure)
Transparent host/device execution — works on both CPU (with TBB/OpenMP backend) and GPU
Automatic memory management for device vectors
Interoperability with raw CUDA kernels and other CUDA libraries
Open source on GitHub (now developed in the NVIDIA/cccl repository alongside CUB and libcu++)
Production-tested version included in the CUDA Toolkit
CUDA Fortran bindings for Fortran interoperability

Use Cases

Financial computing (cashflow generation, Libor market models, variable annuities)
Any application requiring GPU-accelerated sorting, scanning, or reduction
Accelerating existing C++ STL-based algorithms on GPU
Building higher-level GPU libraries (for example, RAPIDS cuDF uses Thrust internally)
Scientific computing and HPC data processing
Machine learning data preprocessing

Hardware Requirements

CUDA-capable NVIDIA GPU (all modern NVIDIA GPUs qualify)
CUDA Toolkit installation
C++14 or later compiler
CPU backends: TBB or OpenMP for non-GPU execution

Language Bindings

C++ (primary — template library)
CUDA Fortran (Fortran interoperability)

Connections

CUB — CUB provides lower-level warp/block/device primitives that Thrust builds upon
cuBLAS — Thrust complements cuBLAS: Thrust for general parallel ops, cuBLAS for dense BLAS
cuDF — RAPIDS cuDF is built on Thrust for its parallel data processing operations
CUTLASS — both are C++ template libraries; CUTLASS for GEMM, Thrust for general parallel algorithms
cuRAND — Thrust algorithms commonly operate on data generated by cuRAND
