nvJitLink

Type: Technology
Tags: CUDA, NVIDIA, GPU, Compilation, JIT, Linking, Development Tools, CUDA-X
Related: NVRTC, NVCC, libNVVM, CUDA-GDB
Sources: NVIDIA official documentation, docs.nvidia.com/cuda/nvjitlink
Last Updated: 2026-04-09

Summary

nvJitLink is an NVIDIA CUDA library for just-in-time (JIT) linking of device code at runtime, introduced in CUDA 12.0. It accepts PTX, LTOIR, cubin, fatbin, and host object or library inputs, and produces a linked cubin that can be loaded through the CUDA driver API. Applications that generate device code at runtime (via NVRTC or other means) can use it to perform full device-side linking, including link-time optimization, without relying on the linker built into the CUDA driver. It matters most for applications with modular GPU code and for frameworks that generate specialized CUDA kernels at runtime.

Detail

Purpose

NVRTC compiles one translation unit at a time, so a program split across several modules needs a separate device link step. Before CUDA 12.0 that step went through the driver's cuLink* API, which ties the linker to the installed driver version rather than to the toolkit the application was built with. Real-world GPU applications often need to link separately compiled relocatable device code (RDC) at runtime. nvJitLink provides a toolkit-side programmatic API for this, so modular, multi-file CUDA programs can be composed and linked at runtime.

Key Features

Runtime linking of PTX, LTOIR (LTO intermediate representation), cubin, and fatbin inputs
Supports relocatable device code (RDC / -rdc=true compiled objects)
Link-time optimization (LTO) via LTOIR inputs for cross-module inlining
Programmatic API: nvJitLinkCreate, nvJitLinkAddData, nvJitLinkComplete, nvJitLinkGetLinkedCubin
Architecture-specific compilation and linking (sm_XX targeting)
Error and info log retrieval for debugging link failures
Works with NVRTC output (PTX/LTOIR) and pre-compiled device libraries
Supersedes the driver's cuLinkCreate/cuLinkAddData/cuLinkComplete path for JIT LTO in new applications; the LTO options of the cuLink API are deprecated as of CUDA 12.0
Ships with the CUDA Toolkit as a standalone library (libnvJitLink.so on Linux, a version-suffixed nvJitLink DLL on Windows), with a static variant also provided
Thread-safe handle-based API
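
The call sequence listed above (create, add data, complete, get linked cubin) can be sketched from Python with ctypes. This is a minimal sketch rather than a reference binding. It assumes the shared library exports the unversioned symbol names, that NVJITLINK_INPUT_PTX has the value 2 as in nvJitLink.h, and that a PTX buffer is passed with its terminating NUL. The helper names build_link_options, load_nvjitlink, and link_ptx are hypothetical.

```python
import ctypes
import ctypes.util

NVJITLINK_SUCCESS = 0    # nvJitLinkResult value for success
NVJITLINK_INPUT_PTX = 2  # nvJitLinkInputType value; assumed, verify against nvJitLink.h


def build_link_options(arch, lto=False, debug=False):
    """Hypothetical helper: assemble linker option strings for nvJitLinkCreate."""
    options = [f"-arch={arch}"]
    if lto:
        options.append("-lto")  # request link-time optimization
    if debug:
        options.append("-g")    # keep debug information
    return options


def load_nvjitlink():
    """Return a ctypes handle to nvJitLink, or None when it is not installed."""
    path = ctypes.util.find_library("nvJitLink")
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def _error_log(lib, handle):
    """Fetch the linker error log for a handle (empty string if unavailable)."""
    size = ctypes.c_size_t(0)
    if lib.nvJitLinkGetErrorLogSize(handle, ctypes.byref(size)) != NVJITLINK_SUCCESS:
        return ""
    if size.value == 0:
        return ""
    buf = ctypes.create_string_buffer(size.value)
    lib.nvJitLinkGetErrorLog(handle, buf)
    return buf.value.decode(errors="replace")


def _check(lib, handle, status, call):
    """Raise with the linker's error log when a call did not succeed."""
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"{call} failed ({status}): {_error_log(lib, handle)}")


def link_ptx(lib, ptx_modules, arch):
    """Link {name: ptx_text} modules into one cubin and return it as bytes."""
    opts = [o.encode() for o in build_link_options(arch)]
    c_opts = (ctypes.c_char_p * len(opts))(*opts)
    handle = ctypes.c_void_p()  # nvJitLinkHandle is an opaque pointer
    status = lib.nvJitLinkCreate(ctypes.byref(handle), ctypes.c_uint32(len(opts)), c_opts)
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"nvJitLinkCreate failed ({status})")
    try:
        for name, ptx in ptx_modules.items():
            data = ptx.encode() + b"\0"  # assumption: size includes the NUL
            status = lib.nvJitLinkAddData(
                handle, ctypes.c_int(NVJITLINK_INPUT_PTX),
                data, ctypes.c_size_t(len(data)), name.encode())
            _check(lib, handle, status, "nvJitLinkAddData")
        _check(lib, handle, lib.nvJitLinkComplete(handle), "nvJitLinkComplete")
        size = ctypes.c_size_t(0)
        status = lib.nvJitLinkGetLinkedCubinSize(handle, ctypes.byref(size))
        _check(lib, handle, status, "nvJitLinkGetLinkedCubinSize")
        cubin = ctypes.create_string_buffer(size.value)
        _check(lib, handle, lib.nvJitLinkGetLinkedCubin(handle, cubin), "nvJitLinkGetLinkedCubin")
        return cubin.raw
    finally:
        lib.nvJitLinkDestroy(ctypes.byref(handle))  # always release the handle
```

A caller would check load_nvjitlink() for None first, then pass the returned cubin bytes to the driver's module loader (for example cuModuleLoadData). The handle is destroyed in a finally block so that a failed link does not leak linker state.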

Use Cases

Frameworks generating GPU kernels at runtime (e.g., PyTorch torch.compile, JAX XLA)
Applications that dynamically specialize kernels for runtime-known parameters
GPU-accelerated JIT compilers and interpreters
CUDA plugin systems where device libraries are loaded at runtime
HPC applications with runtime kernel generation
Database query engines with JIT-compiled GPU operators

Hardware Requirements

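CUDA Toolkit 12.0 or later (software prerequisite; the library was introduced in CUDA 12.0)
NVIDIA GPU with CUDA Compute Capability 5.0+ (Maxwell minimum as of CUDA 12.0; the floor follows the architectures supported by the installed toolkit)
No hardware requirement beyond a CUDA-capable device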
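
The sm_XX target passed to the linker is derived from the device's compute capability. A minimal sketch of that mapping follows, assuming the 5.0 floor stated above. arch_option is a hypothetical helper, and suffixed targets such as sm_90a are not handled.

```python
def arch_option(major, minor, minimum=(5, 0)):
    """Map a compute capability to a linker option, e.g. (8, 0) -> '-arch=sm_80'.

    `minimum` defaults to the Maxwell floor given in this note; adjust it to
    match the toolkit that is actually installed.
    """
    if (major, minor) < minimum:
        raise ValueError(
            f"compute capability {major}.{minor} is below the minimum "
            f"{minimum[0]}.{minimum[1]}")
    return f"-arch=sm_{major}{minor}"
```

The major and minor numbers would typically come from a device query such as the driver API's device attributes.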

Language Bindings

C (primary API)
C++ (common usage)
Python (via bindings such as pynvjitlink or the cuda-python package, or via ctypes/cffi wrappers in framework code)

Connections

NVRTC: the runtime compiler that nvJitLink pairs with; NVRTC compiles CUDA C++ to PTX or LTOIR, and nvJitLink links the result
NVCC: performs device linking offline at build time; nvJitLink provides the runtime equivalent
libNVVM: compiles NVVM IR to PTX; nvJitLink can then link the resulting device code
CUDA-GDB: can debug JIT-linked modules when debug information is preserved through the link (for example with the -g or -lineinfo linker options)
