AIPS BOOM — NVIDIA CUDA Wiki

A connected NVIDIA software, CUDA library, hardware, and model knowledge graph for mapping customer needs to NVIDIA technology.

Last updated: 2026-05-09 Total pages: 503

Concepts

NVIDIA-AI-Grid — Distributed AI infrastructure concept for workload placement across cloud, data center, and edge locations
Floating-Point-and-IEEE-754 — NVIDIA guidance on CUDA floating-point behavior, IEEE 754 compliance, FMA, and numerical accuracy

People

(none yet)

Organizations

(none yet)

Technologies

Math & Linear Algebra

cuBLAS — GPU-accelerated BLAS library: all 152 standard routines, Tensor Core GEMM, multi-GPU
cuBLASLt — Lightweight cuBLAS GEMM API for descriptor-driven matmul heuristics and tuning
cuBLASXt — cuBLAS single-node multi-GPU BLAS Level 3 host interface
cuBLASDx — Device-side BLAS-style operations for fusing dense linear algebra into CUDA kernels
cuBLASMp — Multi-process distributed dense linear algebra library with PBLAS-like APIs
cuFFT — GPU-accelerated Fast Fourier Transform library for 1D/2D/3D real and complex data
cuFFTW — FFTW3-compatible interface layer for porting FFTW applications to cuFFT
cuFFTDx — Device-side FFT library for fusing FFT operations into CUDA kernels
cuFFTMp — Distributed multi-process cuFFT library for 2D/3D multi-GPU, multi-node FFTs
cuRAND — GPU-accelerated random number generation library (pseudo and quasi-random, multiple distributions)
cuSOLVER — GPU-accelerated dense and sparse direct linear solvers and eigensolvers
cuSOLVERMp — Distributed-memory dense linear solver and eigensolver library with ScaLAPACK-like APIs
cuSPARSE — GPU-accelerated sparse matrix linear algebra (SpMV, SpMM, preconditioners)
cuSPARSELt — Structured sparse matrix-matrix multiplication library for Tensor Core sparse acceleration
cuTENSOR — GPU-accelerated tensor contraction, reduction, and elementwise operations
cuTENSORMg — cuTENSOR single-process multi-GPU tensor operation support
cuTENSORMp — cuTENSOR multi-process distributed tensor contraction support
cuDSS — Preview CUDA direct sparse solver with single-GPU, multi-GPU, and multi-node modes
AmgX — GPU-accelerated algebraic multigrid and Krylov solver library (open source)
Incomplete-LU-Cholesky — CUDA whitepaper guidance for preconditioned iterative solvers using cuSPARSE and cuBLAS
nvmath-python — Python interface to NVIDIA CUDA-X math libraries (cuBLAS, cuFFT, cuRAND, cuDSS)
NVBLAS — drop-in GPU BLAS shim; transparently redirects Level 3 BLAS calls to cuBLAS via LD_PRELOAD
NVPL — NVIDIA Performance Libraries: CPU math (BLAS, LAPACK, FFT, RAND) optimized for Grace/Arm

Deep Learning

cuDNN — GPU-accelerated primitives for deep neural networks (convolution, attention, pooling)
TensorRT — inference compiler, runtime, and model optimizer for production DNN deployment
TensorRT-for-RTX — RTX-focused TensorRT runtime with AOT/JIT portable engines for local PC and workstation AI inference
TensorRT-LLM — LLM inference optimization library with continuous batching, paged KV cache, FP8
Transformer-Engine — NVIDIA transformer acceleration library for PyTorch/JAX with FP8, MXFP8, and NVFP4 recipes
NVIDIA-Optimized-Frameworks — NVIDIA deep learning framework containers, user guide, and support matrix for PyTorch/TensorFlow/JAX environments
PyG — NVIDIA-optimized PyTorch Geometric container and graph neural network workflow for PyTorch, cuGraph, and PhysicsNeMo
NVIDIA-DGL — legacy NVIDIA DGL container/release-note surface now deprecated in favor of PyG for forward GNN workflows
CUTLASS — open-source C++ template library for custom high-performance GEMM on NVIDIA GPUs
FlashInfer — open-source GPU kernel toolkit for LLM inference (attention, batch decode, sampling)
NVIDIA-Deep-Learning-Performance — NVIDIA guidance for GPU deep learning performance, precision, Tensor Cores, and profiling
LLM-Inference-Quick-Start-Recipes — NVIDIA quick-start recipes for common LLM inference paths across NIM, TensorRT-LLM, Triton, and Dynamo

Data Processing & ML

NVIDIA-RAPIDS — CUDA-X data science framework for GPU-accelerated DataFrames, ML, graphs, vector search, image processing, Dask, and Spark
RAPIDS-Accelerator-for-Apache-Spark — GPU acceleration plugin for Spark SQL/DataFrame workloads using RAPIDS cuDF
cuDF — GPU-accelerated DataFrame library; 50x faster pandas drop-in (RAPIDS)
cuML — GPU-accelerated machine learning; 50x faster scikit-learn drop-in (RAPIDS)
cuGraph — GPU-accelerated graph analytics; 48x faster NetworkX drop-in (RAPIDS)
cuVS — GPU-accelerated vector search with world-class CAGRA ANN performance (RAPIDS)
NVIDIA-Merlin — NVIDIA recommender systems framework family spanning NVTabular, HugeCTR, Merlin Models, Systems, and Triton inference
NVIDIA-Legate-Core — runtime and framework foundation for composable accelerated libraries such as cuPyNumeric
cuOpt — GPU-accelerated operations research solver for vehicle routing and logistics optimization
NVIDIA-cuOpt-Managed-Service — hosted/API-oriented cuOpt service for routing optimization workloads
NeMo-Curator — GPU-accelerated multimodal data curation pipelines for LLM training
Morpheus — GPU-accelerated AI cybersecurity framework for real-time threat detection
nvComp — GPU-accelerated data compression and decompression library (LZ4, Snappy, ZSTD, GDeflate)
GPU-Direct-Storage — direct data path from NVMe/network storage to GPU memory bypassing CPU
cuFile-API — GPUDirect Storage API surface for registering GPU buffers/files and issuing direct storage I/O

Image & Video

NVIDIA-DALI — GPU-accelerated data loading and augmentation for deep learning training pipelines
CV-CUDA — open-source GPU-accelerated image pre/post-processing for computer vision inference
NPP — GPU-accelerated image and signal processing primitives library (5,000+ functions)
NVIDIA-Video-Codec-SDK — hardware-accelerated video encode (NVENC) and decode (NVDEC) APIs
NVIDIA-Optical-Flow-SDK — hardware-accelerated optical flow computation on Turing/Ampere/Ada GPUs
nvImageCodec — unified GPU-accelerated image codec library (JPEG, JPEG2000, TIFF, WebP, PNG)
cuCIM — GPU-accelerated image processing with scikit-image compatible API (RAPIDS)
nvJPEG — GPU-accelerated JPEG encoding/decoding; batch decode backend for DALI and nvImageCodec
nvJPEG2000 — CUDA-accelerated JPEG2000 encode/decode library
nvTIFF — CUDA-accelerated TIFF encode/decode library

Parallel Algorithms

Thrust — GPU-accelerated C++ STL-compatible parallel algorithms (sort, scan, reduce, transform)
CUB — GPU cooperative primitives library: device/block/warp-level sort, scan, reduce, histogram
cuda-compute — Python bindings for CCCL/CUB/Thrust-style host-callable parallel algorithms
NCCL — multi-GPU and multi-node collective communications (all-reduce, all-gather, broadcast)
NVSHMEM — GPU-cluster PGAS (Partitioned Global Address Space) communication via OpenSHMEM
NVSHMEM4Py — Official Python binding for NVSHMEM symmetric memory, put/get, collectives, and interoperability
NVIDIA-HPC-X — NVIDIA MPI, SHMEM, UCX, UCC, HCOLL, ClusterKit, and NCCL-RDMA-SHARP communications toolkit

Scientific & Physics

NVIDIA-Warp — open-source Python framework for GPU-accelerated physics simulation with auto-diff
cuEquivariance — NVIDIA geometric neural network library with segmented polynomials, CUDA kernels, PyTorch, and JAX
cuLitho — GPU-accelerated computational lithography (OPC, ILT) for semiconductor manufacturing
NVIDIA-Quantum — NVIDIA accelerated quantum computing platform for QPU, GPU, CUDA-Q, and NVQLink workflows
NVIDIA-NVQLink — realtime GPU-QPU integration architecture for calibration, QEC, and hybrid quantum workflows
NVIDIA-Ising — NVIDIA open AI model family for quantum calibration and quantum error correction decoding
CUDA-QX — CUDA-Q library collection for quantum error correction and quantum-classical solver workflows
CUDA-Q-Realtime — CUDA-Q realtime API and networking layer for NVQLink GPU-to-controller feedback loops
cuQuantum — GPU-accelerated quantum computing simulation (state vector, tensor network, density matrix)
cuStateVec — cuQuantum state-vector simulation component for quantum circuit workloads
cuTensorNet — cuQuantum tensor-network component for contraction paths, slicing, MPS workflows, and distributed execution
cuDensityMat — cuQuantum analog quantum dynamics solver library for states, operators, gradients, and time propagation
cuPauliProp — cuQuantum Pauli propagation library for Pauli-basis simulation, traces, truncation, and gradients
cuStabilizer — cuQuantum stabilizer simulation library for Pauli-frame Clifford circuits and DEM sampling
cuQuantum-Appliance — NGC container workflow for multi-GPU cuQuantum simulation with Qiskit and Cirq frontends
CUDA-Q — hybrid quantum-classical computing platform with GPU simulation and QPU backend support
Ising-Calibration-1-35B-A3B — NVIDIA quantum calibration VLM for analyzing QPU calibration experiment plots
Ising-Decoding — NVIDIA Ising QEC predecoder models and training framework for surface-code decoding
NVIDIA-Quantum-Cloud — cloud/API access path for running CUDA-Q projects on NVIDIA GPU systems
NVIDIA-Accelerated-Quantum-Center — NVAQC research facility integrating quantum hardware with NVIDIA AI supercomputing
NVIDIA-DGX-Quantum — queryable DGX Quantum architecture identity now redirected toward NVQLink as the current direction

Security & Cryptography

cuPQC — CUDA cryptography SDK with cuPQC-PK for ML-KEM/ML-DSA and cuPQC-Hash for hash/Merkle operations

Signal & Electronic Warfare

cuEST — GPU-accelerated RF signal processing for electronic warfare and spectrum monitoring

Development Tools

NVIDIA-CUDA — Core NVIDIA GPU computing platform, toolkit, programming model, libraries, and tools
CUDA-Quick-Start-Guide — minimal first-steps guide for installing CUDA and verifying a sample application
CUDA-Installation-Guide-Linux — full Linux CUDA Toolkit installation guide across package managers, runfile, Conda, pip, and WSL
CUDA-Installation-Guide-Windows — full Windows CUDA Toolkit installation guide with Visual Studio and sample verification
CUDA-Programming-Guide — core CUDA programming model guide covering kernels, memory, streams, graphs, and runtime behavior
CUDA-Best-Practices-Guide — practical CUDA performance guide covering APOD, profiling, memory, precision, and deployment
CUDA-Release-Notes — current CUDA Toolkit release notes for component versions, driver requirements, issues, and library updates
NVCC — NVIDIA CUDA Compiler Driver; compiles CUDA C/C++ for host and device
NVIDIA-HPC-SDK — NVIDIA HPC developer stack for compilers, CUDA programming models, libraries, tools, and containers
NVIDIA-HPC-Compilers — nvc, nvc++, nvfortran, and adjacent NVCC compiler docs for NVIDIA GPUs and Arm/x86 CPUs
CUDA-Fortran — NVIDIA Fortran extensions for explicit CUDA GPU programming through nvfortran
NVIDIA-Fortran-CUDA-Interfaces — Fortran modules/interfaces for calling CUDA libraries from CUDA Fortran, OpenACC, and OpenMP code
NVIDIA-OpenACC — NVIDIA HPC compiler implementation and guidance for OpenACC GPU offload
NVIDIA-Stdpar — NVIDIA stdpar path for C++ parallel algorithms and Fortran standard parallelism on GPUs
CUDA-GDB — GNU GDB-based GPU debugger for CUDA applications on Linux and QNX
NVRTC — NVIDIA Runtime Compilation library for JIT compilation of CUDA C++ at runtime
nvJitLink — CUDA 12 runtime device-code linker for JIT-linking PTX/cubin/LTOIR modules
libNVVM — LLVM-based NVVM IR to PTX compiler backend; enables custom GPU language compilers
NVVM-IR — LLVM-based intermediate representation for CUDA GPU compiler front ends
libdevice — Device-side bitcode library used by CUDA compiler flows for GPU math and utilities
PTX-ISA — NVIDIA virtual GPU instruction set used by CUDA compiler and JIT workflows
PTX-Compiler-APIs — Toolkit APIs for compiling PTX strings into GPU assembly code
Inline-PTX-Assembly — guide for embedding PTX assembly statements inside CUDA C++ source
PTX-Interoperability — ABI and interoperability guide for PTX generated by compilers, DSLs, and runtime systems
nvFatbin — Runtime CUDA fatbin creation API for packaging cubin, PTX, and LTO-IR variants
CUDA-Binary-Utilities — cuobjdump, nvdisasm, cu++filt, and nvprune tools for CUDA binaries
CUDA-Compile-Time-Advisor — ctadvisor tool for analyzing and reducing CUDA C++ compile time
CUDA-Demo-Suite — CUDA validation demos such as deviceQuery and bandwidthTest for checking GPU/toolkit setup
CUDA-Debugger-API — Low-level CUDA debugger integration API
CUDA-Cpp-Standard-Library — CUDA-capable C++ standard library facilities in the CCCL stack
CUDA-Python — NVIDIA umbrella for accessing CUDA from Python across core APIs, bindings, libraries, and profiling
cuda-core — Pythonic CUDA runtime/core interface for devices, streams, memory, graphs, compilation, and system inspection
cuda-bindings — Low-level Python bindings to CUDA C APIs
cuda-pathfinder — Python utilities for locating CUDA libraries, headers, tools, bitcode, and static libraries
cuda-compute — CCCL Python host-callable parallel algorithms such as reduce, scan, sort, and transform
cuda-coop — CCCL Python block/warp cooperative algorithms for Numba CUDA kernels
CUDA-Runtime-API — Higher-level CUDA API for devices, memory, streams, graphs, and launches
CUDA-Driver-API — Lower-level CUDA API for contexts, modules, memory, and explicit control
CUDA-Math-API — device-side math functions (sin, cos, exp, FP16/BF16 intrinsics) for CUDA kernels
Compute-Sanitizer — CUDA correctness suite and API for memory, race, init, sync, and custom checker workflows
ComputeEval — NVIDIA benchmark framework for evaluating LLM-generated CUDA code correctness and performance
NVTX — NVIDIA Tools Extension annotation API for profiling markers, ranges, and resource names
Nsight-Developer-Tools — Suite-level hub for NVIDIA Nsight debugging, profiling, correctness, IDE, cloud, and SDK tools
Nsight-Aftermath-SDK — SDK for collecting and analyzing GPU crash mini-dumps from D3D12/Vulkan applications
Nsight-Cloud — cloud-native Nsight profiling components for Kubernetes, containers, Operator, and Streamer workflows
Nsight-Copilot — CUDA-aware AI assistant for VS Code development and preview Nsight Compute guidance
Nsight-Compute — interactive GPU kernel profiler with hardware counters and guided analysis
Nsight-Deep-Learning-Designer — visual ONNX model design and inference profiling IDE with TensorRT/ONNX Runtime profilers
Nsight-Graphics — standalone graphics debugger/profiler for GPU Trace, frame capture, shaders, ray tracing, DRIVE, and Jetson
Nsight-Integration — Visual Studio extension that launches standalone Nsight Compute, Graphics, and Systems activities
Nsight-JupyterLab-Extension — JupyterLab extension for profiling notebook cells with Nsight Systems and Nsight Compute
Nsight-Perf-SDK — graphics profiling toolbox for collecting GPU metrics inside DirectX, Vulkan, and OpenGL applications
Nsight-Python — Python-first Nsight kernel profiling automation across kernel configurations
Nsight-Systems — system-wide CPU+GPU performance profiler and timeline visualizer
Nsight-Visual-Studio-Code-Edition — Visual Studio Code extension for CUDA language support and CUDA-aware debugging workflows
Nsight-Visual-Studio-Edition — Visual Studio integration for CUDA debugging, profiling, and GPU development on Windows
Nsight-Eclipse-Plugins — Eclipse plugin path for CUDA Linux IDE development workflows

Embedded & Edge

NVIDIA-JetPack-SDK — Jetson software stack bundling Jetson Linux, CUDA-X, AI frameworks, samples, tools, and docs
NVIDIA-Jetson-Linux — Jetson OS/BSP layer with kernel, bootloader, drivers, flashing, and platform bring-up docs
NVIDIA-VPI — Vision Programming Interface for CPU/CUDA/PVA/VIC/OFA-backed computer vision and image processing
cuDLA — CUDA API for programming NVIDIA DLA (Deep Learning Accelerator) on Jetson/DRIVE SoCs
CUDA-for-Tegra — CUDA guidance for Tegra integrated GPU platforms including Jetson and DRIVE
CUDA-on-WSL — CUDA support for Linux GPU development inside Windows Subsystem for Linux 2
CUDA-on-EFLOW — CUDA deployment guidance for EFLOW-enabled Windows edge devices running Linux AI containers

Inference & Data Transfer

NIXL — NVIDIA Inference Xfer Library; high-throughput KV cache and tensor transfer for LLM serving
NVIDIA-Dynamo — NVIDIA inference-serving platform for local, VM, and Kubernetes deployment paths
Dynamo-Disaggregated-Serving — Dynamo prefill/decode split for scalable LLM serving
Dynamo-KV-Cache-Aware-Routing — Dynamo router mode for cache-overlap and load-aware request placement
Dynamo-KV-Block-Manager — Dynamo KV cache memory layer for block management, offload, and tiering
Dynamo-Planner — Dynamo autoscaler for LLM-specific TTFT, ITL, throughput, and SLA targets
Dynamo-Profiler — Dynamo profiling tool for deployment recommendations and Planner performance data

NVIDIA Frameworks

Large Language Models & Speech

NVIDIA-NeMo — Modular suite for AI agent lifecycle management, training, microservices, retrieval, and deployment
NeMo-Platform — NeMo microservices platform for synthetic data, customization, evaluation, guardrails, and inference
NeMo-Data-Designer — NeMo Platform service for scalable synthetic dataset generation
NeMo-Customizer — NeMo Platform service for LoRA, SFT, DPO, and embedding model customization
NeMo-Evaluator — NeMo Platform service for model, RAG, retriever, and agent evaluation
NeMo-Safe-Synthesizer — NeMo Platform service for private synthetic tabular data generation
NeMo-Auditor — NeMo Platform early-access service for LLM safety audits
NeMo-AutoModel — Hugging Face-compatible NeMo training library for LLM, VLM, diffusion, and fine-tuning workflows
NeMo-RL — NeMo post-training library for reinforcement learning and alignment workflows
NeMo-Gym — NeMo RL environment and rollout-collection library for verifiable agent training data
NeMo-Run — NeMo experiment configuration, execution, and management tool for local, cluster, and cloud runs
NeMo-Megatron-Bridge — NeMo library for Hugging Face and Megatron conversion, verification, and high-scale training
NeMo-Export-Deploy — NeMo export and deployment library for TensorRT-LLM, vLLM, Triton, and Ray Serve paths
NeMo-Retriever — Multimodal extraction, embedding, indexing, retrieval, and reranking microservices for RAG
NVIDIA-Speech-NIM-Microservices — Current Speech NIM docs collection for ASR, TTS, and NMT microservices
NVIDIA-ASR-NIM — Speech-to-text NIM for Parakeet, Canary, Whisper, Conformer, and Nemotron ASR models
NVIDIA-TTS-NIM — Text-to-speech NIM for Magpie models, voices, emotional styles, and voice cloning
NVIDIA-NMT-NIM — Neural machine translation NIM for Riva Translate 1.6B and 36-language translation workflows
NVIDIA-Background-Noise-Removal-NIM — Maxine audio NIM for streaming and transactional background noise removal
NVIDIA-Agent-Intelligence-Toolkit — Framework-agnostic toolkit for agent workflows, profiling, evaluation, MCP, and A2A
NVIDIA-Resiliency-Extension — NVRx fault-tolerance, restart, checkpointing, and straggler-detection package for distributed PyTorch training
Megatron-Core — composable NVIDIA library for large-scale LLM/MoE/multimodal training primitives and parallelism APIs
Megatron-Energon — Megatron multimodal data loader for WebDataset/JSONL, blending, distributed loading, and resumable training data iteration
Megatron-LM — open-source framework for 3D-parallel LLM pre-training at trillion-parameter scale
TensorRT-LLM — LLM inference optimization: continuous batching, paged KV cache, FP8, TP/PP

Inference Serving

Triton-Inference-Server — multi-framework AI model serving platform with dynamic batching and gRPC/REST
NVIDIA-AIPerf — Current NVIDIA tool for benchmarking OpenAI-compatible AI inference latency, throughput, and telemetry
NVIDIA-GenAI-Perf — Phased-out NVIDIA generative AI benchmarking tool retained for legacy workflow lookup
Triton-Performance-Analyzer — Triton CLI for inference latency/throughput measurement under configurable load
Triton-Model-Analyzer — Triton tool for model-configuration search, profiling, and deployment reports
Triton-Model-Navigator — Triton toolkit for export, conversion, correctness testing, profiling, and deployment preparation

Physics & Scientific AI

NVIDIA-Modulus — physics-ML framework for PINNs, neural operators (FNO, DeepONet), CFD surrogates
PhysicsNeMo — large-scale geoscience physics-AI training (weather, seismic, reservoir simulation)
Earth-2 — AI-powered Earth climate digital twin; km-scale weather forecasting with neural operators
NIM-for-Earth-2-CorrDiff — Earth-2 NIM for weather downscaling and diffusion correction
NIM-for-Earth-2-FourCastNet — Earth-2 NIM for global short- to medium-range AI weather forecasting
NIM-for-DoMINO-Automotive-Aero — PhysicsNeMo NIM for automotive external-aerodynamics surrogate simulation

Robotics & Simulation

NVIDIA-Isaac — robotics development platform: Isaac Sim, Isaac Lab, Isaac ROS, manipulation, mobility, and GR00T
NVIDIA-Isaac-Sim — Omniverse-based robotics simulation application for synthetic data, sensor simulation, and validation
NVIDIA-Isaac-Lab — modular robot-learning framework for RL, imitation learning, motion planning, and sim-to-real workflows
NVIDIA-Isaac-ROS — CUDA-accelerated ROS 2 packages, models, and reference workflows for deployed robots
NVIDIA-Isaac-for-Manipulation — current Isaac ROS reference architecture for perception-driven robot-arm manipulation
NVIDIA-Isaac-for-Mobility — current Isaac ROS mobility workflow area continuing Isaac Perceptor for AMR stacks
Isaac-ROS-NITROS — NVIDIA Isaac Transport for ROS, type adaptation/negotiation, and accelerated ROS graph transport
Isaac-ROS-Visual-SLAM — cuVSLAM-based visual odometry and SLAM for stereo/IMU robot localization
Isaac-ROS-Visual-Global-Localization — cuVGL-based global localization and relocalization from stereo keyframe maps
Isaac-ROS-DNN-Inference — TensorRT/Triton-backed DNN inference infrastructure for robot perception graphs
Isaac-ROS-Object-Detection — DetectNet, Grounding DINO, RT-DETR, and YOLOv8 object detection packages for Isaac ROS
Isaac-ROS-Image-Segmentation — GPU-accelerated semantic segmentation packages for pixel-level robot perception
Isaac-ROS-DNN-Stereo-Depth — DNN stereo disparity/depth packages including ESS and FoundationStereo context
Isaac-ROS-AprilTag — accelerated fiducial marker detection package with CUDA, CPU, and PVA backend support
Isaac-ROS-Image-Pipeline — accelerated camera preprocessing and stereo image-processing package family
Isaac-ROS-Compression — H.264 camera image compression/decompression using NVIDIA NVENC/NVDEC
Isaac-ROS-SIPL-Camera — SIPL and Camera-over-Ethernet camera driver package for Jetson Thor-era ingest
Isaac-ROS-cuMotion — CUDA-accelerated robot-arm motion planning, MoveIt 2 integration, and robot segmentation
Isaac-ROS-nvblox — GPU-accelerated 3D reconstruction, mapping, ESDF/TSDF, and Nav2 costmap component
Isaac-ROS-FoundationPose — 6DoF object pose-estimation model and ROS 2 package for manipulation workflows
Isaac-ROS-FoundationStereo — stereo-depth foundation model and ROS 2 package for disparity/depth perception
NVIDIA-Isaac-GR00T — humanoid robotics foundation model platform and data pipeline for general-purpose robot skills
NVIDIA-Omniverse — OpenUSD-based 3D simulation and digital twin platform with RTX rendering
NVIDIA-Omniverse-Reference-Architectures — Omniverse architecture diagrams for RTX PRO industrial facility digital twins and technical requirements

Healthcare & Edge AI

NVIDIA-Holoscan — real-time AI sensor processing SDK for medical devices and industrial edge computing

Partner / Ecosystem Libraries

Python GPU Computing

PyTorch — CUDA-accelerated deep learning framework backed by cuDNN, cuBLAS, NCCL, and TensorRT export paths
CuPy — NumPy/SciPy-compatible GPU array library backed by cuBLAS, cuFFT, cuRAND, cuSPARSE
cuPyNumeric — NumPy API implementation on Legate for CPU, GPU, and multi-node scaling
JAX — composable function transformations (jit, grad, vmap, pmap) with XLA GPU compilation
TensorFlow-GPU — Google’s deep learning framework with CUDA/cuDNN/XLA GPU acceleration
Dask — Python parallel/distributed computing; scales RAPIDS GPU workflows across multi-GPU clusters
OpenCV-CUDA — GPU-accelerated classical computer vision (filters, geometry, feature detection, stereo)

Platforms & Products

AI Cloud & Software Platforms

NGC — NVIDIA GPU Cloud: curated hub of GPU-optimized containers, models, Helm charts, and SDKs
NVIDIA-NIM — Inference Microservices: production-ready AI inference containers with OpenAI-compatible REST API
NIM-for-Large-Language-Models — production NIM family for LLM serving with Day 0, Turbo, and Certified offerings
NIM-for-LLM-Benchmarking-Guide — NIM guide for LLM latency, throughput, concurrency, Run:ai sizing, and LoRA benchmark workflows
NVIDIA-NIM-Operator — Kubernetes operator for NIM and NeMo microservice lifecycle, model caching, and autoscaling
NVIDIA-NIM-on-GKE — Google Kubernetes Engine deployment guide for NIM microservices
NVIDIA-NIM-on-WSL2 — RTX Windows/WSL2 local deployment guide for downloadable NIM containers
NeMo-Retriever-Embedding-NIM — NIM microservice for text/image embeddings in semantic search and RAG workflows
Llama-Nemotron-Embed-1B-v2 — NVIDIA text embedding model for multilingual long-document QA retrieval
NIM-for-NV-CLIP — NIM microservice for multimodal text/image embeddings, semantic image search, and multimodal RAG
NeMo-Retriever-Reranking-NIM — NIM microservice for reranking retrieved passages by query relevance
Llama-Nemotron-Rerank-1B-v2 — NVIDIA text reranker for multilingual and cross-lingual retrieval pipelines
NIM-for-Image-OCR — NeMo Retriever OCR microservice for extracting text from images and visual document regions
NIM-for-Object-Detection — NeMo Retriever document object-detection NIM family for page, table, and chart elements
NVIDIA-Dynamo — NVIDIA inference-serving platform adjacent to NIM and disaggregated serving workflows
NVIDIA-AIStore — Distributed storage stack tailored for AI workloads and elastic clusters
NVIDIA-AI-Enterprise — End-to-end enterprise AI software suite with SLA support covering the full NVIDIA stack
NVIDIA-AI-Enterprise-Quick-Start-Guide — AI Enterprise onboarding guide for account activation, NGC access, first software install, and GPU/container verification
NVIDIA-AI-Enterprise-Software — AI Enterprise application-layer and infrastructure-layer software catalog with NGC and support-matrix context
NVIDIA-AI-Enterprise-Infrastructure-Support-Matrix — AI Enterprise compatibility matrix for infrastructure software, GPUs, platforms, OS, hypervisors, orchestration, and cloud
NVIDIA-AI-Enterprise-Lifecycle-Policy — AI Enterprise branch, compatibility, support, and EOL planning policy
NVIDIA-Enterprise-Licensing-Guide — AI Enterprise entitlement, per-GPU licensing, cloud marketplace, BYOL, support, and NVIDIA License System guidance
NVIDIA-Enterprise-Support-and-Services — enterprise support entitlement, support levels, support portal, RMA, value-add services, advisory services, and education
NVIDIA-AI-Enterprise-Bare-Metal-Deployment — AI Enterprise installation guide for physical servers, drivers, Docker, Kubernetes, and GPUDirect Storage
NVIDIA-AI-Enterprise-VMware-Deployment — AI Enterprise deployment guide for VMware vSphere, vGPU, ESXi, vCenter, and AI Enterprise VMs
NVIDIA-AI-Enterprise-Cloud-Deployment — AI Enterprise deployment guide for AWS, Azure, Google Cloud, OCI, Alibaba, Tencent, VMIs, managed Kubernetes, and OpenShift
NVIDIA-Enterprise-Reference-Architectures — NVIDIA-authored Enterprise RA family for AI factory hardware, software, observability, and deployment patterns
NVIDIA-AI-Enterprise-Software-Reference-Architecture — Full-stack AI Enterprise software RA for single-tenant production inference, fine-tuning, and RAG workloads
NVIDIA-Enterprise-RA-Observability-Guide — Observability guide for Enterprise RA dashboards, alerts, DCGM, NIM Operator, BCM, and NetQ telemetry
NVIDIA-AI-Enterprise-Security — AI Enterprise security white paper for branch strategy, container security, NIM microservices, and software delivery
NVIDIA-AI-Software-for-Regulated-Environments — AI Enterprise regulated-environment white paper for government-ready software and hardened/minimal containers
NVIDIA-AI-Factory-for-Government — Government AI factory reference design for secure, compliant, agentic, and sovereign AI deployments
Red-Hat-AI-Factory-with-NVIDIA — NVIDIA AI Enterprise deployment guide for Red Hat OpenShift AI with NIM, GPU Operator, Network Operator, and NIM Operator
NVIDIA-AI-Blueprints — NVIDIA-authored reference workflows for building AI applications on NIM, NeMo, Nemotron, and NVIDIA AI software
NVIDIA-RAG-Blueprint — NVIDIA AI Blueprint for enterprise retrieval augmented generation, multimodal RAG, evaluation, and guardrails
NVIDIA-Run-ai — NVIDIA AI workload and GPU orchestration platform for Kubernetes, AI Enterprise, Mission Control, and NIM LLM sizing workflows
NVIDIA-Run-ai-Support-and-Lifecycle — Run:ai self-hosted support phases, component alignment, and current version lifecycle dates
NVIDIA-AI-Workbench — Unified developer environment for GPU projects with one-click environment management and multi-location compute
NVIDIA-AI-Workbench-Projects — AI Workbench Git/container project model for reproducible GPU development environments
NVIDIA-AI-Workbench-Locations — AI Workbench local and remote machine abstraction for project execution
NVIDIA-AI-Workbench-Applications — AI Workbench-managed web apps, processes, native apps, and Compose apps
NVIDIA-Base-Command — AI training cluster management platform for DGX systems: job scheduling, dataset versioning, experiment tracking
NVIDIA-Base-Command-Manager — Cluster-management platform for provisioning and operating AI data center infrastructure
NVIDIA-BaseOS — Validated production operating system layer for DGX and AI factory environments
NVIDIA-DGX-Cloud — Cloud-accessible NVIDIA AI supercomputing platform
NVIDIA-Cloud-Functions — NVIDIA cloud/API delivery surface for hosted AI functions and services
NVIDIA-API-Documentation — public API documentation hub for NVIDIA-hosted model and microservice endpoints
NVIDIA-Brev — NVIDIA cloud GPU development environments for prototyping and API experimentation
NVIDIA-Bright-Cluster-Manager — NVIDIA Bright cluster-management documentation for HPC and AI infrastructure
NVIDIA-Certification-Programs — NVIDIA certification documentation across systems, software, and partner validation programs
NVIDIA-Cloud-Accelerator-NCX — cloud accelerator documentation for validated NVIDIA AI infrastructure on cloud partners
KAI-Scheduler — open-source GPU-aware Kubernetes scheduler for large-scale AI workloads
NVIDIA-Grove — Kubernetes API for topology-aware, gang-scheduled, multi-component inference workloads
NVIDIA-Fleet-Intelligence — managed GPU fleet health monitoring and predictive failure signal service
NVIDIA-NCX-Infra-Controller — bare-metal lifecycle automation and secure multi-tenant GPU infrastructure management
NVIDIA-AI-Cluster-Runtime — validated NVIDIA-accelerated Kubernetes runtime recipes for reproducible AI clusters
NVIDIA-NVSentinel — Kubernetes-native GPU fault detection and remediation
NVIDIA-DOCA-Platform-Framework — BlueField DPU provisioning and service orchestration framework for cloud environments
NVIDIA-Project-GPUd — lightweight GPU telemetry, diagnostics, and issue-identification agent listed in NCX
NVIDIA-Mission-Control - Integrated AI factory management platform for DGX B200/B300 and GB200/GB300 NVL72 environments
Optimizing-VM-Configuration-for-AI-Inference — NVIDIA white paper for topology-aware VM configuration on HGX systems for near bare-metal AI inference performance

Agent Platforms

NVIDIA-AI-Q-Blueprint — NVIDIA AI Blueprint for enterprise research agents with retrieval, citations, evaluation, and Enterprise RA sizing context
NVIDIA-Data-Flywheel-Blueprint — NVIDIA AI Blueprint for continuously optimizing agents and models with production traffic, NeMo evaluation/customization, and NIM redeployment
NVIDIA-Video-Search-and-Summarization-Blueprint — NVIDIA AI Blueprint for vision agents, video search, summarization, and alert verification
NVIDIA-Tokkio-Digital-Human-Blueprint — NVIDIA AI Blueprint for interactive digital humans using ACE, speech, RAG/LLM, and avatar animation
NVIDIA-NemoClaw — Alpha stack for running OpenClaw assistants with NVIDIA OpenShell and Nemotron models
NVIDIA-OpenShell — Secure sandboxed runtime and policy layer for autonomous AI agents

AI Application Platforms

NVIDIA-AI-Aerial — Accelerated AI-RAN platform for 5G/6G wireless network development and simulation
NVIDIA-BioNeMo — GPU-accelerated drug discovery and biomolecular AI (protein structure, molecular generation)
NVIDIA-FLARE — federated learning and privacy-preserving distributed collaboration SDK for ML, analytics, healthcare, and edge workflows
BioNeMo-Recipes — NVIDIA reference implementations for scaling biological foundation model training with Transformer Engine and PyTorch
NIM-for-AlphaFold2 — BioNeMo NIM for single-chain AlphaFold2 protein structure prediction and MSA workflows
NIM-for-AlphaFold2-Multimer — BioNeMo NIM for AlphaFold2 multimer protein-complex structure prediction
NIM-for-OpenFold2 — BioNeMo NIM for OpenFold2 monomer protein structure prediction with optional MSAs/templates
NIM-for-OpenFold3 — BioNeMo NIM for all-atom biomolecular complexes with proteins, DNA, RNA, and ligands
NIM-for-Boltz2 — BioNeMo NIM for biomolecular structure and binding-affinity prediction
NIM-for-Evo-2 — BioNeMo NIM for Evo 2 DNA sequence interpretation and generation
NIM-for-MSA-Search — BioNeMo NIM for GPU-accelerated MSA, paired MSA, and structural template search
NIM-for-RFdiffusion — BioNeMo NIM for generative protein structure and complex design
NIM-for-ProteinMPNN — BioNeMo NIM for protein sequence design from backbone structures
NIM-for-MolMIM — BioNeMo NIM for controlled small molecule generation from SMILES latent spaces
NIM-for-GenMol — BioNeMo NIM for fragment-based small molecule generation with SAFE representations
NIM-for-DiffDock — BioNeMo NIM for protein-ligand docking and pose prediction
NIM-for-ALCHEMI-Batched-Geometry-Relaxation — ALCHEMI NIM for MLIP-driven batched atomistic geometry relaxation
NIM-for-ALCHEMI-Batched-Molecular-Dynamics — ALCHEMI NIM for MLIP-driven batched molecular dynamics simulations
NVIDIA-Riva — GPU-accelerated real-time speech AI SDK: ASR, TTS, NLU with sub-100ms latency
NVIDIA-Maxine — GPU-accelerated audio/video/AR enhancement for video conferencing and media applications
NIM-for-Maxine-Studio-Voice — Maxine NIM for streaming and transactional studio-quality speech enhancement
NIM-for-Maxine-Audio2Face-2D — Maxine NIM for animating 2D portrait images from speech audio
NIM-for-Maxine-Eye-Contact — Maxine NIM for gaze correction and simulated camera-facing eye contact in video
NIM-for-Maxine-Active-Speaker-Detection — Maxine NIM for active speaker detection from video and diarized audio
NIM-for-Audio2Face-3D — Digital Human NIM for audio/emotion-driven 3D facial animation and ARKit blendshape output
NVIDIA-AI-for-Media-SDKs — Current docs hub for NVIDIA audio, video, AR, and Triton-enabled media AI SDKs
NVIDIA-Audio-Effects-SDK — AFX SDK for echo cancellation, denoise, dereverb, speaker focus, studio voice, and voice font
NVIDIA-Augmented-Reality-SDK — AR SDK for face/body tracking, landmarks, eye contact, lip sync, and active speaker detection
NVIDIA-Video-Effects-SDK — VFX SDK for AI green screen, blur, upscale, webcam denoise, relighting, and video super resolution
NVIDIA-Triton-AR-VFX-SDKs — Triton-enabled server deployment path for AR and VFX SDK features
NVIDIA-Capture-SDK — capture and stream SDKs for desktop/session capture and NVIDIA media workflows
NVIDIA-CloudXR — GPU-accelerated XR streaming platform for remote RTX-rendered spatial experiences
NVIDIA-Cosmos-Curator-LHA — Cosmos Curator/LHA documentation for large-scale video analysis and physical AI data curation
NIM-for-Cosmos-WFM — Cosmos WFM NIM for text/image/video-to-world and video transfer workflows
NIM-for-Cosmos-Reason — Cosmos Reason VLM NIM family for image, video, and text reasoning
NIM-for-Cosmos-Embed1 — Cosmos Embed1 NIM for joint video-text embeddings and physical AI video retrieval
NIM-for-Vision-Language-Models — VLM NIM family for multimodal reasoning, image understanding, and visual assistants
Llama-Nemotron-Embed-VL-1B-v2 — NVIDIA multimodal embedding model for visual document retrieval and vision RAG
Llama-Nemotron-Rerank-VL-1B-v2 — NVIDIA multimodal reranker for visual document retrieval and vision RAG
NIM-for-Visual-Generative-AI — Visual GenAI NIM family for image generation, image editing, and 3D asset generation
NVIDIA-TAO — Train, Adapt, and Optimize platform for fine-tuning and deploying CV, embedding, and VLM models
NVIDIA-Jetson-Platform-Services — Jetson edge AI microservice layer for video analytics, VLM, detection, storage, and APIs
NVIDIA-Clara — Healthcare AI platform: Parabricks (genomics), MONAI (medical imaging), Clara Guardian (smart hospital)
NVIDIA-Parabricks — Clara genomics acceleration platform for next-generation sequencing pipelines
NVIDIA-Clara-Viz — Clara medical image visualization toolkit for 2D/3D imaging and pathology data
NVIDIA-MONAI-Toolkit — NVIDIA AI Enterprise-supported MONAI distribution for medical imaging AI development
NIM-for-MAISI — Medical imaging NIM for synthetic 3D CT generation and annotation masks
NIM-for-VISTA-3D — Medical imaging NIM for interactive 3D segmentation and annotation
NVIDIA-Metropolis — Intelligent video analytics platform and partner ecosystem for smart cities, retail, and industrial AI
NVIDIA-DeepStream — GStreamer-based streaming analytics toolkit for GPU-accelerated multi-stream video AI pipelines

Enterprise Data & Storage Platforms

NVIDIA-AI-Data-Platform — Reference design for accelerating enterprise storage, retrieval, vector search, RAG, and agent data access
NVIDIA-STX — Modular AI-native storage reference architecture built around accelerated compute, BlueField, Spectrum-X, and AI software
NVIDIA-CMX — Context memory storage platform for long-context, multi-turn, and agentic inference KV-cache sharing
NVIDIA-Certified-Storage — Storage validation program for AI factory, AI Data Platform, training, inference, and KV-cache workloads

Hardware Platforms

NVIDIA-DGX — Purpose-built AI supercomputing systems: DGX H100, DGX B200, DGX SuperPOD
NVIDIA-DGX-SuperPOD — Scale-out NVIDIA AI supercomputing reference architecture for AI factories
NVIDIA-DGX-BasePOD — Prescriptive enterprise DGX reference architecture for scalable AI infrastructure
NVIDIA-DGX-BasePOD-B200-H200-H100-RA — Current BasePOD RA for two-to-eight-node DGX B200/H200/H100 enterprise AI infrastructure
NVIDIA-DGX-B200 — Blackwell-generation DGX system with eight GPUs, NVLink/NVSwitch, ConnectX-7, BlueField-3, AI Enterprise, and Mission Control
NVIDIA-DGX-SuperPOD-B200-RA — DGX B200 SuperPOD RA with 32-system scalable units and NDR400 InfiniBand
NVIDIA-DGX-SuperPOD-GB200-RA — DGX GB200 SuperPOD RA for rack-scale GB200 NVL72, NVLink 5, NDR InfiniBand, and Spectrum-4 Ethernet
NVIDIA-DGX-B300 — Blackwell Ultra DGX system generation for AI factory training and inference deployments
NVIDIA-DGX-SuperPOD-B300-Spectrum-4-Ethernet-RA — DGX B300 SuperPOD RA for Spectrum-4/Spectrum-X Ethernet and DC busbar power
NVIDIA-DGX-SuperPOD-B300-Quantum-X800-InfiniBand-RA — DGX B300 SuperPOD RA for Quantum-X800 InfiniBand and AC power
NVIDIA-DGX-Spark — Compact GB10 Grace Blackwell desktop AI computer for local model and agent development
NVIDIA-DGX-Station — GB300 Grace Blackwell Ultra deskside AI supercomputer for large local AI workloads
NVIDIA-DGX-Enterprise-Support — DGX support, infrastructure services, and education services for production AI factories
NVIDIA-GB300-NVL72 — Rack-scale Blackwell Ultra NVL72 system for dense large-model training and inference
NVIDIA-NVL72-AI-Factory — Enterprise RA for GB300 NVL72 rack-scale AI factories with Spectrum-X, Mission Control, and NVLink
NVIDIA-RTX-PRO-Server — RTX PRO Blackwell enterprise server platform for AI, rendering, simulation, and visualization
NVIDIA-RTX-PRO-AI-Factory — Enterprise RA for RTX PRO 6000 Blackwell Server Edition AI factories using the 2-8-5-200 pattern
NVIDIA-Certified-Systems — partner systems validated by NVIDIA for enterprise AI and accelerated computing workloads
NVIDIA-Data-Center-CPUs — NVIDIA data center CPU documentation covering Grace, Grace Hopper, and Grace Blackwell systems
NVIDIA-Jetson-Platform — Edge AI computing modules for robotics, drones, and intelligent cameras (Jetson Orin family)
NVIDIA-Jetson-Thor — Blackwell-generation Jetson platform for physical AI and humanoid robotics
NVIDIA-Drive-Platform — End-to-end autonomous vehicle platform: DRIVE AGX hardware, DriveWorks SDK, DRIVE Sim
NVIDIA-DRIVE-AGX-Thor — DRIVE AGX Thor developer platform for autonomous vehicle and cockpit AI development
NVIDIA-DriveOS — Automotive operating system and SDK foundation for DRIVE AGX platforms
NVIDIA-DriveWorks — DRIVE SDK modules and tools for AV sensor abstraction, calibration, image, point-cloud, and egomotion workflows
NVIDIA-DRIVE-Sim — AV simulation and synthetic-data workflows using Cosmos, Omniverse, NuRec, and dataset curation
NVIDIA-GB200-NVL72 — Rack-scale liquid-cooled system: 72 Blackwell GPUs, 36 Grace CPUs, 130 TB/s NVLink, 1,440 PFLOPS FP4
NVIDIA-HGX — Multi-GPU baseboard platform (8x SXM) for OEM servers; HGX B200, B300, Rubin NVL8 configurations
NVIDIA-HGX-AI-Factory — Enterprise RA for HGX B300 AI factories using the 2-8-9-800 pattern
NVIDIA-GB200-NVL4 — Single-server 4x B200 + 2x Grace config; 1.3 TB coherent memory, ~6 kW, OEM ecosystem entry point
NVIDIA-Vera-Rubin-POD — POD-scale Vera Rubin AI factory architecture combining Rubin compute, Groq 3 LPX, Vera CPU, BlueField-4 STX, and Spectrum-6 SPX racks
NVIDIA-Groq-3-LPX — Low-latency inference accelerator rack for Vera Rubin POD agentic AI workloads

GPU Architectures

NVIDIA-Blackwell-Architecture — 2024 architecture: FP4 Tensor Cores, NVLink 5 (1.8TB/s), NVL72 rack-scale, NVLink-C2C
NVIDIA-Vera-Rubin — Next-generation platform after Blackwell with Rubin GPUs, Vera CPU, NVLink 6, and Vera Rubin NVL144 direction
NVIDIA-Vera-Rubin-POD — POD-scale Vera Rubin AI factory architecture for five rack-scale systems operating as one AI supercomputer
NVIDIA-Vera-CPU — Custom Arm CPU in the Vera Rubin platform, positioned as the successor direction after Grace
NVIDIA-Hopper-Architecture — 2022 architecture: Transformer Engine (FP8), NVLink 4 (900GB/s), MIG, Confidential Computing
NVIDIA-Ada-Lovelace-Architecture — 2022 architecture for RTX 40/pro visualization GPUs with SER and third-generation RT Cores
NVIDIA-Ampere-Architecture — 2020 architecture for A100/A30/A10 and RTX 30-era GPUs with Tensor Core and MIG advances
NVIDIA-Turing-Architecture — 2018 architecture that introduced RTX RT Cores, Tensor Cores for graphics, and concurrent INT/FP execution
NVIDIA-Grace-CPU — NVIDIA’s ARM Neoverse V2 data center CPU; paired with GPUs via NVLink-C2C in GH200/GB200

CUDA Concepts

CUDA-Compatibility — Driver/toolkit compatibility rules for CUDA applications in managed deployments
CUDA-Blackwell-Compatibility-Guide — CUDA binary compatibility guide for running applications on Blackwell GPUs
CUDA-Blackwell-Tuning-Guide — Blackwell-specific CUDA performance tuning guide
CUDA-Hopper-Compatibility-Guide — CUDA binary compatibility guide for running applications on Hopper GPUs
CUDA-Hopper-Tuning-Guide — Hopper-specific CUDA performance tuning guide covering TMA, clusters, DPX, memory, and NVLink
CUDA-Ada-Compatibility-Guide — CUDA binary compatibility guide for running applications on Ada GPUs
CUDA-Ada-Tuning-Guide — Ada-specific CUDA performance tuning guide
CUDA-Ampere-Compatibility-Guide — CUDA binary compatibility guide for running applications on Ampere GPUs
CUDA-Ampere-Tuning-Guide — Ampere-specific CUDA performance tuning guide covering async copy, barriers, Tensor Cores, memory, and NVLink
CUDA-Turing-Compatibility-Guide — CUDA binary compatibility guide for running applications on Turing GPUs
CUDA-Turing-Tuning-Guide — Turing-specific CUDA performance tuning guide for long-lived multi-generation CUDA support
CUDA-Features-Archive — current CUDA docs reference for feature availability across toolkit and driver releases
CUDA-Graphs — Capture GPU operation sequences as a graph for single-submission replay; eliminates per-kernel CPU launch overhead
CUDA-Unified-Memory — Single-pointer CPU+GPU memory with hardware-managed demand paging; GH200 enables coherent access at NVLink bandwidth
CUDA-Streams — Sequences of ordered GPU operations enabling concurrent kernel execution and compute/transfer overlap
NVLink — NVIDIA’s proprietary high-bandwidth GPU interconnect: NVLink 5 delivers 1.8TB/s per GPU on Blackwell
GPUDirect-RDMA — Direct NIC↔GPU memory DMA path, bypassing CPU; enables high-performance inter-node GPU communication
Multi-Process-Service — MPS: enables concurrent kernel execution from multiple CUDA processes on a single GPU for improved utilization

Infrastructure & DevOps

NVIDIA-Data-Center-GPU-Drivers — Data center GPU driver release notes and deployment documentation
NVIDIA-MIG — Multi-Instance GPU partitioning for isolated slices of supported data center GPUs
NVIDIA-vGPU — Virtual GPU software and CUDA support for GPU-accelerated virtualized environments
NVIDIA-vGPU-for-Compute — AI Enterprise-licensed vGPU virtualization stack for compute VMs, MIG-backed modes, NLS licensing, and virtualized AI/HPC workloads
NVIDIA-Attestation — Attestation suite for confidential computing and platform integrity verification
NVIDIA-Cloud-Native-Technologies — NVIDIA cloud-native documentation hub for GPU Operator, Container Toolkit, Kubernetes, and container deployment
NVIDIA-GPU-Operator — Kubernetes operator automating NVIDIA driver, Container Toolkit, DCGM, MIG Manager, and device plugin lifecycle
NVIDIA-Container-Toolkit — Container runtime hook enabling GPU access from Docker/containerd/Podman without driver bundling in images
NVIDIA-DCGM — Data Center GPU Manager: cluster telemetry, health monitoring, diagnostics, and Prometheus metrics for GPU fleets
CUPTI — CUDA Profiling Tools Interface: low-level API for hardware counters, activity tracing, and CUDA API callbacks used by Nsight tools
CUPTI-Python — Python-facing CUPTI profiling and tracing documentation for CUDA applications
NVBit — Open-source binary instrumentation framework for custom GPU analysis tools without source code (NVlabs research tool)

Ecosystem & Partners

LLM Inference

vLLM — Open-source high-throughput LLM serving with PagedAttention and continuous batching; NIM-supported backend
DeepSpeed — Microsoft’s ZeRO optimizer and LLM training/inference library for multi-GPU, multi-node GPU clusters

GPU Programming

Triton-GPU-Language — OpenAI’s Python-based GPU kernel language with block-level programming model; powers torch.compile Inductor

Distributed Training

Hugging-Face-Accelerate — Thin PyTorch abstraction for multi-GPU/multi-node training across DDP, FSDP, and DeepSpeed backends

Enterprise Data Platforms

NVIDIA-Certified-for-Cloudera — NVIDIA-authored Cloudera Data Platform reference/certification guidance built on NVIDIA-Certified Systems

Networking

NVIDIA-DOCA — software framework for BlueField, SuperNIC, and ConnectX infrastructure offload, DOCA-Host, and DOCA-OFED
DOCA-GPUNetIO — GPU-centric network packet processing with GPUDirect RDMA and GPU-initiated networking
DOCA-Flow — hardware-accelerated packet-processing pipes, flow steering, actions, monitoring, and forwarding
DOCA-RDMA — DOCA API for asynchronous RDMA operations over InfiniBand or RoCE
DOCA-DPA — BlueField Data Path Accelerator programming model for communication-centric offloads
DOCA-PCC — programmable congestion-control API for BlueField/Ethernet/RoCE networking
DOCA-Telemetry-Service — DOCA service for BlueField, host, network, Prometheus, and OpenTelemetry metrics
DOCA-App-Shield — DPU-side host and VM memory introspection API for intrusion detection and forensics
DOCA-Device-Emulation — DOCA subsystem for emulating host-facing PCIe devices from BlueField software
DOCA-SNAP — BlueField storage virtualization services for NVMe, virtio-blk, and virtio-fs emulation
OVS-DOCA — DOCA-backed Open vSwitch datapath offload using DOCA Flow on BlueField/NVIDIA NICs
NVIDIA-DOCA-OFED — current DOCA-Host Linux driver profile replacing standalone MLNX_OFED for NVIDIA networking
NVIDIA-MLNX-OFED — legacy standalone Linux VPI/RDMA stack for InfiniBand, Ethernet, and RoCE, now on 2024 LTS
NVIDIA-MLNX-EN — legacy standalone Linux Ethernet/RoCE driver package transitioning into DOCA-Host profiles
NVIDIA-WinOF-2 — Windows driver package for ConnectX-4 Lx and newer adapters, with current ConnectX-9 support
NVIDIA-Firmware-Tools — MFT firmware, configuration, and debug tools for NVIDIA adapters and switches
NVIDIA-Network-Operator — Kubernetes operator for NVIDIA networking, RDMA, GPUDirect RDMA, SR-IOV, secondary networks, and DOCA-OFED
NVIDIA-Cumulus-Linux — Linux-based Ethernet switch OS for NVIDIA Spectrum and Spectrum-X fabrics
NVIDIA-NetQ — network operations and observability tool set for Cumulus, Spectrum, NVLink, and data center fabrics
NVIDIA-DSX-Air — cloud-hosted network simulation and digital twin platform for validating NVIDIA networking configurations
NVIDIA-ConnectX-InfiniBand — NVIDIA ConnectX NICs and Quantum InfiniBand switches powering DGX SuperPODs and HPC clusters (up to 400Gb/s)
NVIDIA-ConnectX-9 — 1.6Tb/s-class SuperNIC for next-generation InfiniBand/Ethernet AI networking
NVIDIA-Quantum-X800-InfiniBand — End-to-end 800 Gb/s InfiniBand platform for massive-scale AI and HPC fabrics
NVIDIA-Spectrum-X-Validated-Solution-Stack — Current validated software/firmware stack table for Spectrum-X AI factory deployments
NVIDIA-Spectrum-6-SPX — Vera Rubin POD networking rack using Spectrum-X Ethernet or Quantum-X800 InfiniBand options
NVIDIA-BlueField-DPU — Data Processing Unit combining ConnectX NIC with ARM CPU and hardware accelerators for infrastructure offload
NVIDIA-BlueField-4 — Next-generation DPU tied to STX, CMX, AI-native storage, context memory, and AI data platforms
NVIDIA-Rivermax — optimized networking SDK for GPUDirect media/data streaming, SMPTE ST 2110, BlueField, and ConnectX workflows

NVIDIA Model Families

Large Language Models

Nemotron — NVIDIA model family for reasoning, safety, speech, OCR, retrieval, multimodal, and agentic AI workflows
Nemotron-Training-Recipes — NVIDIA public recipe/cookbook stack for Nemotron 3 Nano and Super pretraining, SFT, RL, evaluation, and execution
Nemotron-3-Nano — efficient Nemotron 3 text reasoning model for agent steps, reasoning modes, and NeMo Megatron Bridge workflows
Nemotron-3-Super — high-capacity Nemotron 3 reasoning model for long-context, coding, planning, and complex agentic workflows
Nemotron-3-Ultra — largest Nemotron 3 base/forthcoming model direction for frontier open reasoning workflows
Nemotron-3-Nano-Omni — open omnimodal Nemotron 3 model and NIM for text, image, video, audio, document, and GUI reasoning
Nemotron-Parse — Nemotron document parser and NIM for text/table extraction, semantic classes, bounding boxes, and reading-order structure
NVLM — Frontier-class multimodal LLM (72B); dual-path NVLM-D/H/X architecture; competitive with GPT-4V
NVIDIA-EAGLE — Efficient multimodal LLMs (EAGLE2); context-aware tiling; synthetic data training pipeline

Speech & Audio

Nemotron-ASR-Streaming — NVIDIA English streaming ASR model with cache-aware FastConformer-RNNT architecture
Nemotron-3-VoiceChat — 12B full-duplex speech-to-speech Nemotron model for realtime voice agents
Parakeet-ASR — State-of-the-art English ASR (0.6B–1.1B); FastConformer encoder; CTC/RNN-T/TDT decoding
NVIDIA-Canary — Multilingual ASR + speech translation (EN/ES/DE/FR); encoder-decoder; Canary-1B
NVIDIA-Fugatto — Generative audio transformer: text-to-audio, voice transformation, compositional sound generation

Alignment & Control

NVIDIA-SteerLM — Inference-time LLM behavior control via multi-attribute conditioning; HelpSteer dataset

On-Device AI

NVIDIA-ChatRTX — Local RAG chatbot for Windows RTX PCs; TensorRT-LLM backend; no cloud required

Multimodal

NVIDIA-ACE — Avatar Cloud Engine: AI microservices for interactive digital humans and NPCs (ASR+LLM+TTS+animation)

Quantum Models

NVIDIA-Ising — NVIDIA open model family for AI-assisted quantum calibration and QEC
Ising-Calibration-1-35B-A3B — NVIDIA Ising-family VLM for quantum calibration plot understanding
Ising-Decoding — NVIDIA Ising-family QEC predecoder models and training framework

World Models

NVIDIA-Cosmos — World foundation model platform for physical AI; video generation for synthetic robotics/AV training data

NVIDIA Research

Overview

NVIDIA-Research — NVIDIA’s central research org; 300+ researchers; NeRF, StyleGAN, Instant NGP, Megatron scaling

Neural Rendering

NVIDIA-NeRF — Neural Radiance Fields: novel view synthesis from sparse images; NVIDIA co-invented (ECCV 2020)
NVIDIA-Instant-NGP — Instant NGP: multiresolution hash encoding for NeRF training in seconds (SIGGRAPH 2022)

3D Generation

NVIDIA-GET3D — GAN-based 3D textured mesh generation from 2D image supervision (NeurIPS 2022)

Image Synthesis

NVIDIA-GauGAN — SPADE semantic image synthesis; GauGAN2 multimodal input; powers NVIDIA Canvas app

Real-Time Rendering

NVIDIA-DLSS — AI super-sampling and frame generation: DLSS 4 Multi Frame Generation on Blackwell
NVIDIA-RTX — RTX platform: RT Cores, Tensor Cores, SER, OptiX 8; DirectX Raytracing and Vulkan RT

Networking & Scale

InfiniBand

NVIDIA-Quantum-InfiniBand — Quantum-2 NDR 400Gb/s InfiniBand switches with SHARP in-network allreduce
NVIDIA-Quantum-X800-InfiniBand — Quantum-X800 / XDR 800Gb/s InfiniBand platform with SHARP v4, ConnectX-8/9, LinkX, and UFM
NVIDIA-UFM — Unified Fabric Manager: InfiniBand/Ethernet fabric management, monitoring, and routing

Ethernet AI Networking

NVIDIA-Spectrum-X — Spectrum-4 400GbE AI networking platform; Adaptive Routing for lossless Ethernet RDMA
NVIDIA-Spectrum-X-Validated-Solution-Stack — Validated Spectrum-X software and firmware compatibility table for GB300, B300, and H200 deployments
NVIDIA-Spectrum-6-SPX — Vera Rubin POD networking rack for Spectrum-X Ethernet or Quantum-X800 InfiniBand
NVIDIA-Silicon-Photonics — Optical networking direction for scaling future Spectrum-X/AI factory fabrics
NVIDIA-Cumulus-Linux — validated switch OS layer for current Spectrum-X reference architecture releases
NVIDIA-NetQ — fabric operations visibility and validation for Ethernet AI networking

Collective Communication

NCCL-Algorithms — Ring and tree allreduce algorithms, SHARP offload, topology-aware selection in NCCL
NVIDIA-HPC-X — MPI/SHMEM/UCX/UCC toolkit with NCCL-RDMA-SHARP and Spectrum-X plugin guidance

Developer Experience

Programs & Events

NVIDIA-Developer-Program — Free developer program: SDK access, NGC, DLI courses, beta programs, forums
NVIDIA-GTC — GPU Technology Conference: annual developer/industry event; Jensen keynote; 1,000+ sessions

Cloud Labs & Catalogs

NVIDIA-LaunchPad — Free cloud GPU lab environments for POC evaluation and hands-on developer access
NVIDIA-NGC-Catalog — NGC model and container marketplace: 600+ models, NIM integration, monthly-updated containers

Additional CUDA-X / Libraries

GPU Programming Abstractions

CUDA-Tile — NVIDIA tile-based CUDA programming model for Tensor Core-oriented kernels
CUDA-Tile-IR — Low-level CUDA Tile bytecode/specification and tile virtual instruction set
cuTile — Python DSL implementation of CUDA Tile for tiled GPU kernels

CPU Math (Grace/Arm)

NVPL-FFT — NVPL FFT: FFTW3-compatible CPU FFT for NVIDIA Grace (Neoverse V2); SVE-optimized

Physics Simulation (Advanced)

NVIDIA-Warp-Advanced — Warp advanced features: FEM, NanoVDB volumes, differentiable rendering, Isaac Lab integration

LLM Safety & Optimization

NeMo-Guardrails — Programmable LLM and agent safety library/microservice with Colang, catalog rails, NemoGuard NIM integration, and LangChain/LangGraph hooks
NVIDIA-NemoGuard-NIMs — Guardrail NIM family for content safety, topic control, and jailbreak detection
Nemotron-3-Content-Safety — NVIDIA multimodal, multilingual content-safety model for prompt, image, and response moderation
Nemotron-Content-Safety-Reasoning-4B-Experimental-NIM — Day 0 NIM LLM safety classifier for content-safety reasoning and dialogue moderation
Llama-3.1-Nemotron-Safety-Guard-8B-NIM — Multilingual content-safety NIM for user and bot message moderation
Llama-3.1-NemoGuard-8B-TopicControl-NIM — Topic-control NIM for keeping conversations within developer-defined boundaries
Llama-3.1-NemoGuard-8B-ContentSafety-NIM — Content-safety NIM for harmful-content detection in LLM applications
NVIDIA-NemoGuard-JailbreakDetect-NIM — Classify-endpoint NIM for jailbreak and prompt-injection detection
NIM-for-Multimodal-Safety — Multimodal moderation NIM family for visual and generated-content safety checks
TensorRT-Model-Optimizer — Model quantization and pruning: FP8/INT4/FP4, QAT, PTQ, TRT-LLM export

Projects

(none yet)

Events

(none yet)

Strategies

NVIDIA-Enterprise-AI-Factory — NVIDIA design-guide strategy for production enterprise AI factories across compute, networking, storage, software, security, and operations

AIPS BOOM

Explorer

index

AIPS BOOM — NVIDIA CUDA Wiki

Concepts

People

Organizations

Technologies

Math & Linear Algebra

Deep Learning

Data Processing & ML

Image & Video

Parallel Algorithms

Scientific & Physics

Security & Cryptography

Signal & Electronic Warfare

Development Tools

Embedded & Edge

Inference & Data Transfer

NVIDIA Frameworks

Large Language Models & Speech

Inference Serving

Physics & Scientific AI

Robotics & Simulation

Healthcare & Edge AI

Partner / Ecosystem Libraries

Python GPU Computing

Platforms & Products

AI Cloud & Software Platforms

Agent Platforms

AI Application Platforms

Enterprise Data & Storage Platforms

Hardware Platforms

GPU Architectures

CUDA Concepts

Infrastructure & DevOps

Ecosystem & Partners

LLM Inference

GPU Programming

Distributed Training

Enterprise Data Platforms

Networking

NVIDIA Model Families

Large Language Models

Speech & Audio

Alignment & Control

On-Device AI

Multimodal

Quantum Models

World Models

NVIDIA Research

Overview

Neural Rendering

3D Generation

Image Synthesis

Real-Time Rendering

Networking & Scale

InfiniBand

Ethernet AI Networking

Collective Communication

Developer Experience

Programs & Events

Cloud Labs & Catalogs

Additional CUDA-X / Libraries

GPU Programming Abstractions

CPU Math (Grace/Arm)

Physics Simulation (Advanced)

LLM Safety & Optimization

Projects

Events

Strategies

Graph View

Table of Contents