nvFatbin

Type: Technology
Tags: NVIDIA, CUDA, fatbin, runtime compilation, module loading, compiler SDK
Related: NVIDIA-CUDA, CUDA-Driver-API, PTX-ISA, NVRTC, nvJitLink, NVCC
Sources: https://docs.nvidia.com/cuda/nvfatbin/index.html
Last Updated: 2026-04-29

Summary

nvFatbin is a CUDA library for creating fat binaries (fatbins) at runtime. It lets applications package multiple device-code variants, such as cubin, PTX, or LTO-IR, into a single fatbin that can later be loaded through the CUDA-Driver-API.

Detail

Purpose

CUDA applications sometimes need to generate or assemble GPU code dynamically while still shipping a variant for each target architecture. nvFatbin lets them build those fatbins through API calls at runtime rather than relying solely on offline toolchain steps.

Key capabilities

  • Runtime creation of CUDA fat binaries.
  • Inputs can include cubins, PTX (see PTX-ISA), or LTO-IR, each added together with its target architecture.
  • Output can be loaded with Driver API module-loading routines such as cuModuleLoadData.
  • Useful for applications that want architecture-specific optimized variants for Hopper, Blackwell, or other GPUs.
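
To make the value of architecture-specific variants concrete, the following is a minimal Python model of how a loader might choose among fatbin entries. It is a sketch, not nvFatbin or driver code: the names Entry and pick_entry are hypothetical, and the rule it encodes is the general CUDA compatibility rule (a cubin runs on the same major compute capability at an equal or higher minor; PTX can be JIT-compiled for an equal or newer architecture). It ignores architecture-specific targets such as sm_90a.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for one fatbin entry."""
    kind: str  # "cubin" or "ptx"
    arch: int  # compute capability as major * 10 + minor, e.g. 90 for sm_90


def pick_entry(entries: List[Entry], device_arch: int) -> Optional[Entry]:
    """Pick the entry a loader would plausibly prefer for device_arch."""
    # Prefer native code: same major version, entry minor <= device minor.
    cubins = [
        e for e in entries
        if e.kind == "cubin"
        and e.arch // 10 == device_arch // 10
        and e.arch <= device_arch
    ]
    if cubins:
        return max(cubins, key=lambda e: e.arch)
    # Otherwise fall back to the newest PTX that the device can JIT-compile.
    ptx = [e for e in entries if e.kind == "ptx" and e.arch <= device_arch]
    if ptx:
        return max(ptx, key=lambda e: e.arch)
    return None


entries = [Entry("cubin", 90), Entry("cubin", 100), Entry("ptx", 80)]
hopper_choice = pick_entry(entries, 90)      # native cubin for sm_90
ada_choice = pick_entry(entries, 89)         # no sm_8x cubin, so PTX for sm_80
turing_choice = pick_entry(entries, 75)      # nothing old enough
```

In this model a fatbin that carries only one cubin would leave other GPUs either on the slower JIT path or without usable code, which is why packaging several variants matters.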

NVIDIA context

nvFatbin complements NVRTC, nvJitLink, and PTX-Compiler-APIs in dynamic GPU-code generation systems. It is especially relevant to frameworks, DSLs, inference runtimes, and plugin systems that compile or specialize GPU kernels at runtime.

Connections

  • CUDA-Driver-API - loads fatbins created by nvFatbin.
  • NVRTC - can generate PTX inputs for runtime packaging.
  • nvJitLink - links device code at runtime; the linked cubin it produces can then be packaged by nvFatbin.
  • PTX-ISA - PTX can be one of the input forms.
  • NVCC - the offline compiler produces the same kinds of artifacts (cubin, PTX, fatbin) at build time.
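
The packaging step that sits between these components can be sketched from Python with ctypes. This is a hedged illustration only: the function prototypes are written from memory of the nvFatbin C API and should be checked against nvFatbin.h, the sample PTX and the helper name build_fatbin are hypothetical, and passing no creation options or per-entry options is an assumption. The helper returns None when the nvFatbin shared library is not installed.

```python
import ctypes
import ctypes.util
from typing import Optional

# Tiny hand-written PTX standing in for NVRTC output (hypothetical kernel).
# Its .target should agree with the arch string passed to nvFatbinAddPTX.
SAMPLE_PTX = b"""
.version 7.0
.target sm_80
.address_size 64

.visible .entry noop()
{
    ret;
}
"""


def build_fatbin(ptx: bytes, arch: str = "80") -> Optional[bytes]:
    """Package one PTX entry into a fatbin, or return None without nvFatbin."""
    path = ctypes.util.find_library("nvfatbin")
    if path is None:
        return None
    lib = ctypes.CDLL(path)

    handle_t = ctypes.c_void_p  # nvFatbinHandle treated as an opaque pointer
    # Assumed prototypes; verify against nvFatbin.h before relying on them.
    lib.nvFatbinCreate.argtypes = [
        ctypes.POINTER(handle_t), ctypes.POINTER(ctypes.c_char_p), ctypes.c_size_t]
    lib.nvFatbinAddPTX.argtypes = [
        handle_t, ctypes.c_char_p, ctypes.c_size_t,
        ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p]
    lib.nvFatbinSize.argtypes = [handle_t, ctypes.POINTER(ctypes.c_size_t)]
    lib.nvFatbinGet.argtypes = [handle_t, ctypes.c_void_p]
    lib.nvFatbinDestroy.argtypes = [ctypes.POINTER(handle_t)]

    def check(status: int, call: str) -> None:
        if status != 0:  # 0 is assumed to be NVFATBIN_SUCCESS
            raise RuntimeError(f"{call} failed with nvFatbinResult {status}")

    handle = handle_t()
    check(lib.nvFatbinCreate(ctypes.byref(handle), None, 0), "nvFatbinCreate")
    try:
        # Size includes the trailing NUL, mirroring how NVRTC reports PTX size
        # (an assumption about what nvFatbin expects).
        check(lib.nvFatbinAddPTX(handle, ptx, len(ptx) + 1,
                                 arch.encode(), b"noop", None), "nvFatbinAddPTX")
        size = ctypes.c_size_t()
        check(lib.nvFatbinSize(handle, ctypes.byref(size)), "nvFatbinSize")
        buf = ctypes.create_string_buffer(size.value)
        check(lib.nvFatbinGet(handle, buf), "nvFatbinGet")
        return buf.raw
    finally:
        lib.nvFatbinDestroy(ctypes.byref(handle))


fatbin = build_fatbin(SAMPLE_PTX)
```

In a full flow the PTX would come from NVRTC, additional cubin or LTO-IR entries would be added before querying the size, and the returned bytes would be handed to a Driver API loader such as cuModuleLoadData; none of those steps are shown here.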

Source Excerpts

  • NVIDIA’s nvFatbin guide describes runtime fatbin creation from multiple device-code variants.