Nsight Systems

Type: Technology
Tags: CUDA, NVIDIA, GPU, Profiling, System Analysis, Development Tools, CUDA Toolkit
Related: Nsight-Developer-Tools, Nsight-Cloud, Nsight-Compute, Nsight-JupyterLab-Extension, Nsight-Python, Nsight-Deep-Learning-Designer, Nsight-Graphics, Nsight-Integration, Nsight-Visual-Studio-Code-Edition, Nsight-Visual-Studio-Edition, Nsight-Eclipse-Plugins, NVTX, NVCC, CUDA-GDB, Compute-Sanitizer
Sources: NVIDIA official documentation (docs.nvidia.com/cuda), https://docs.nvidia.com/nsight-python/index.html, https://developer.nvidia.com/nsight-dl-designer, https://developer.nvidia.com/nsight-graphics/get-started, https://docs.nvidia.com/nsight-vs-integration/getting-started/index.html, https://developer.nvidia.com/nsight-cloud, https://docs.nvidia.com/nsight-systems/UserGuide/#profiling-services-in-the-cloud, https://docs.nvidia.com/nsight-systems/UserGuide/#profiling-within-jupyterlab
Last Updated: 2026-04-29

Summary

Nsight Systems is NVIDIA’s system-wide performance profiler for GPU-accelerated applications. It presents CPU, GPU, and memory activity for the whole application on a single timeline, tracing CUDA API calls, kernel launches, memory transfers, OS events, and framework-level operations (PyTorch, TensorFlow) to show where time is spent at the application level. It is the usual starting point before kernel-level optimization with Nsight Compute.

Detail

Purpose

Before optimizing individual GPU kernels, developers need to understand the big picture: where is the application spending time? Is it GPU-bound, CPU-bound, or communication-bound? Is data transfer or kernel launch overhead the bottleneck? Nsight Systems provides this application-level view, enabling developers to identify the right kernels and operations to optimize before using Nsight Compute for deep dives.

Key Features

  • System-wide timeline: CPU threads, GPU kernels, memory transfers, CUDA API calls in one view
  • Framework-level tracing: PyTorch, TensorFlow, JAX, TensorRT, DALI, cuDNN annotations
  • Multi-GPU support: trace multiple GPUs simultaneously
  • Network and communication profiling: NVLink, InfiniBand, NCCL collective operations
  • OS-level event tracing: thread scheduling, I/O operations
  • Timeline visualization with zoom and filter capabilities
  • Report export for sharing and offline analysis
  • Command-line interface (nsys) for headless/CI usage
  • CPU sampling for identifying hot functions on host
  • Supports CUDA, OpenGL, Vulkan, Direct3D APIs
  • Cloud and Kubernetes profiling workflows through Nsight-Cloud, including sidecar injection and browser-accessible report viewing
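
As a rough illustration of the headless/CI feature listed above, the sketch below assembles an `nsys profile` command line in Python. The `--trace`, `--output`, and `--force-overwrite` options belong to the `nsys profile` command; the helper name `build_nsys_profile_cmd`, the report name, and the `train.py` target are hypothetical and only serve the example. The command is printed rather than executed, so the sketch runs without a GPU or an Nsight Systems installation.

```python
import shlex


def build_nsys_profile_cmd(app_argv, output="report", traces=("cuda", "nvtx", "osrt")):
    """Assemble an `nsys profile` command line for a headless run.

    app_argv: the application command to profile, as a list of strings.
    output:   base name of the report file (hypothetical default).
    traces:   APIs to trace; cuda, nvtx and osrt are common choices.
    """
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),   # which APIs to record on the timeline
        "--output=" + output,            # report base name
        "--force-overwrite=true",        # replace an existing report of the same name
        *app_argv,                       # the application under test and its arguments
    ]


# `train.py` is a placeholder for the application under test.
cmd = build_nsys_profile_cmd(["python", "train.py"])
print(shlex.join(cmd))
```

In a CI job the resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where `nsys` is on the PATH; the set of traced APIs would normally be trimmed to what the workload actually uses.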

Use Cases

  • Identifying CPU-GPU synchronization bottlenecks
  • Finding idle GPU time due to data starvation
  • Profiling NCCL communication overhead in distributed training
  • Understanding PyTorch/TensorFlow operator execution timelines
  • CI/CD performance regression testing
  • End-to-end pipeline optimization (data loading, preprocessing, inference)
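
To make the "idle GPU time" use case concrete, here is a minimal sketch of the arithmetic involved, assuming kernel start and end timestamps (in nanoseconds) have already been extracted from an exported report. The function name `gpu_idle_ns` and the sample intervals are hypothetical, not real profiler output or part of any Nsight API.

```python
def gpu_idle_ns(kernel_intervals, window_start, window_end):
    """Return nanoseconds in [window_start, window_end) not covered by any kernel.

    kernel_intervals: iterable of (start_ns, end_ns) pairs; may overlap.
    """
    busy = 0
    cursor = window_start  # everything before `cursor` is already accounted for
    for start, end in sorted(kernel_intervals):
        start = max(start, cursor)   # skip the part already counted as busy
        end = min(end, window_end)   # clip to the analysis window
        if end > start:
            busy += end - start
            cursor = end
    return (window_end - window_start) - busy


# Hypothetical kernel intervals: two overlapping kernels, then a later one.
sample = [(0, 10), (5, 20), (30, 40)]
idle = gpu_idle_ns(sample, window_start=0, window_end=50)
print(f"idle: {idle} ns of 50 ns")
```

A large idle share inside a training step would typically point toward data loading or CPU-side synchronization rather than toward the kernels themselves, which is the kind of question the timeline view is meant to answer.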

Hardware Requirements

  • NVIDIA GPU with CUDA support
  • All modern NVIDIA GPU architectures supported
  • Host application available on Linux, Windows, and macOS; macOS is host-only (report viewing and remote profiling), not a profiling target
  • CUDA Toolkit or standalone Nsight Systems installation

Language Bindings

  • Command-line tool (nsys): profiles any executable, regardless of source language
  • Python API for automated report analysis
  • GUI application (cross-platform)
  • NVTX (NVIDIA Tools Extension) API for custom annotations from user code (C, C++, Python, Fortran)
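
The NVTX entry above can be sketched from the Python side as follows. This assumes the `nvtx` package from PyPI, whose `nvtx.annotate` works as a context manager; when the package is not installed, the sketch falls back to a no-op so it still runs. The function names `load_batch` and `train_step` and their bodies are hypothetical stand-ins for real pipeline stages.

```python
import contextlib
import time

try:
    import nvtx  # NVIDIA's `nvtx` package (assumed installed for real profiling runs)
    annotate = nvtx.annotate
except ImportError:
    # Fallback so the sketch runs without nvtx: a context manager that does nothing.
    def annotate(message=None, color=None):
        return contextlib.nullcontext()


def load_batch():
    # The named range shows up on the timeline when run under a profiler.
    with annotate("load_batch", color="blue"):
        time.sleep(0.01)  # stand-in for data loading work
        return list(range(8))


def train_step(batch):
    with annotate("train_step", color="green"):
        return sum(x * x for x in batch)  # stand-in for compute


result = train_step(load_batch())
print(result)
```

When such a script is profiled with NVTX tracing enabled, the two ranges should appear as named spans alongside the CUDA activity, which makes it easier to attribute GPU gaps to a specific pipeline stage.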

Connections

  • Nsight-Compute - Nsight Systems provides the high-level view; Nsight Compute provides per-kernel deep analysis
  • Nsight-Cloud - cloud-native deployment path for Nsight Systems profiling in Kubernetes and remote cluster environments
  • Nsight-JupyterLab-Extension - notebook-cell workflow for launching Nsight Systems profiling from JupyterLab
  • Nsight-Python - Python automation layer for Nsight-driven kernel profiling workflows
  • NVTX - annotation API whose markers and ranges appear in Nsight Systems timelines
  • Nsight-Deep-Learning-Designer - adjacent Nsight IDE for model-graph editing and TensorRT/ONNX Runtime inference profiling
  • Nsight-Graphics - graphics profiling/debugging companion for ray tracing, GPU Trace, and frame-level analysis
  • Nsight-Integration - Visual Studio extension that can launch Nsight Systems activities from the Visual Studio Nsight menu
  • Nsight-Visual-Studio-Code-Edition - VS Code CUDA debugging workflow adjacent to Nsight profiling workflows
  • Nsight-Visual-Studio-Edition - Windows IDE integration for adjacent Nsight workflows
  • Nsight-Eclipse-Plugins - Eclipse plugin path for CUDA IDE integration on Linux
  • NVCC - NVCC-compiled CUDA code is profiled by Nsight Systems
  • NCCL - NCCL communication operations appear in the Nsight Systems timeline for distributed training analysis
  • CUDA-GDB - CUDA-GDB provides interactive debugging; Nsight Systems provides performance profiling

Resources