NVIDIA HPC SDK Version 25.5 Documentation

HPC SDK

HPC SDK Release Notes: These release notes describe the new features of the NVIDIA HPC SDK, including changes from previous releases. They may also include late-breaking information not included in other product documentation.
HPC SDK Install Guide: This guide describes how to install and configure the NVIDIA HPC SDK.
HPC Compilers User's Guide: This guide describes how to use the HPC Fortran, C, and C++ compilers and program development tools on CPUs and NVIDIA GPUs, including information about parallelization and optimization.

Compilers

HPC Compilers Documentation: The HPC Compilers Documentation page describes available documentation on the use and features of HPC compilers and related tools.
nvc: nvc is a C11 compiler for NVIDIA GPUs and AMD, Intel, and Arm CPUs. It invokes the C compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc supports ISO C11, supports GPU programming with OpenACC, and supports multicore CPU programming with OpenACC and OpenMP.
nvc++: nvc++ is a C++17 compiler for NVIDIA GPUs and AMD, Intel, and Arm CPUs. It invokes the C++ compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc++ supports ISO C++17 and supports GPU and multicore CPU programming with C++17 parallel algorithms, OpenACC, and OpenMP.
nvfortran: nvfortran is a Fortran compiler for NVIDIA GPUs and AMD, Intel, and Arm CPUs. It invokes the Fortran compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvfortran supports ISO Fortran 2003 and many features of ISO Fortran 2008, supports GPU programming with CUDA Fortran, and GPU and multicore CPU programming with ISO Fortran parallel language features, OpenACC, and OpenMP.
nvcc: nvcc is the CUDA C and CUDA C++ compiler driver for NVIDIA GPUs. nvcc accepts a range of conventional compiler options, such as those for defining macros, setting include and library paths, and steering the compilation process. nvcc produces optimized code for NVIDIA GPUs and drives a supported host compiler for AMD, Intel, and Arm CPUs.

Programming Models

C++ Parallel Algorithms: C++17 Parallel Algorithms introduce parallel and vector concurrency through execution policies and are supported in the nvc++ compiler.
CUDA C++ Programming Guide: A comprehensive guide to developing and optimizing code in the CUDA C++ programming environment.
CUDA Fortran Programming Guide: This guide describes how to program with CUDA Fortran, a small set of extensions to Fortran that supports and is built upon the NVIDIA CUDA programming model. CUDA Fortran is available on a variety of 64-bit operating systems for both x86 and Arm hardware platforms. CUDA Fortran includes runtime APIs and programming examples.
Using OpenACC: This section describes directive-based OpenACC programming in which compiler directives are used to specify regions of code in Fortran, C and C++ programs to be offloaded from a host CPU to an NVIDIA GPU.
Using OpenMP: OpenMP is a set of compiler directives, an application programming interface (API), and a set of environment variables for specifying parallel execution in Fortran, C++, and C programs.

Math Libraries

cuBLAS: The cuBLAS Library provides a GPU-accelerated implementation of the basic linear algebra subroutines (BLAS). cuBLAS accelerates AI and HPC applications with drop-in industry standard BLAS APIs highly optimized for NVIDIA GPUs. The cuBLAS library contains extensions for batched operations, execution across multiple GPUs, and mixed and low precision execution.
cuBLASMp: The cuBLASMp Library is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra.
cuTENSOR: The cuTENSOR Library is a first-of-its-kind GPU-accelerated tensor linear algebra library providing tensor contraction, reduction and elementwise operations. cuTENSOR is used to accelerate applications in the areas of deep learning training and inference, computer vision, quantum chemistry and computational physics.
cuSPARSE: The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices, with functionality that can be used to build GPU-accelerated solvers. cuSPARSE is widely used by engineers and scientists working on applications such as machine learning, computational fluid dynamics, seismic exploration and computational sciences.
cuSOLVER: The cuSOLVER library provides dense and sparse factorizations, linear solvers and eigensolvers highly optimized for NVIDIA GPUs. cuSOLVER is used to accelerate applications in diverse areas including scientific computing and data science, and has extensions for mixed precision tensor acceleration and execution across multiple GPUs.
cuSOLVERMp: cuSOLVERMp provides a distributed-memory multi-node and multi-GPU solution for solving systems of linear equations at scale.
cuFFT: The cuFFT Library provides FFT implementations highly optimized for NVIDIA GPUs. cuFFT is used for building commercial and research applications across disciplines such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging, and has extensions for execution across multiple GPUs.
cuFFTMp: cuFFTMp provides a distributed-memory multi-node and multi-GPU solution for solving 2D and 3D FFTs (Fast Fourier Transforms) at scale.
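cuRAND: The cuRAND library provides GPU-accelerated random number generation, with a host API that generates random numbers into GPU memory and a device API for generating random numbers inside CUDA kernels.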

Communications Libraries

NCCL: NCCL, the NVIDIA Collective Communications Library, contains multi-GPU and multi-node collective communication primitives optimized for NVIDIA GPUs.
NVSHMEM: NVSHMEM is an implementation of the OpenSHMEM standard highly optimized for NVIDIA GPUs.

Tools

CUDA-GDB: The NVIDIA tool for debugging CUDA applications.
Nsight Compute: NVIDIA Nsight Compute is the next-generation interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through both a user interface and a command-line tool.
Nsight Systems: NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize application algorithms. It helps identify optimization and tuning opportunities to scale applications efficiently across both CPUs and GPUs.
Compute Sanitizer: Compute Sanitizer is a suite of functional correctness checking tools. It contains tools that perform different types of checks: the memcheck tool checks for out-of-bounds and misaligned memory access errors, the racecheck tool checks for data races in shared memory, the initcheck tool checks for uninitialized accesses to global memory, and the synccheck tool checks for invalid usage of synchronization primitives.
NVTX: NVTX is an API for annotating application events, code ranges, and resources. Use it together with NVIDIA Nsight tools to capture and visualize these annotations.

Containerization

HPC SDK Container Guide: HPC Container Maker is an open-source tool that makes it easier to generate Dockerfiles and Singularity container specification files.
NGC: NGC is the hub for GPU-optimized HPC and deep learning software. It takes care of all the plumbing so scientists, developers, and researchers can focus on building solutions, gathering insights, and delivering business value.

Terms of Use

Software License Agreement for the HPC Software Development Kit: This document is the Software License Agreement (SLA) for NVIDIA HPC SDK. It contains specific license terms and conditions for all the HPC SDK components. By accepting this agreement, you agree to comply with all the terms and conditions applicable to the specific product(s) included herein.
