HPC SDK Version 21.5 (Documentation Archives) - Last updated May 27, 2021

NVIDIA HPC SDK Version 21.5 Documentation


HPC SDK Release Notes
These release notes describe the new features of the NVIDIA HPC SDK including changes from previous releases. They may also include late-breaking information not included in other product documentation.
HPC SDK Install Guide
This guide describes the requirements and steps for installing the HPC SDK on compatible workstations, servers, and clusters running versions of the Linux operating systems.


HPC Compilers Documentation
HPC Compiler Documentation Library
nvc is a C11 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc supports ISO C11, supports GPU programming with OpenACC, and supports multicore CPU programming with OpenACC and OpenMP.
nvc++ is a C++17 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C++ compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc++ supports ISO C++17, supports GPU and multicore CPU programming with C++17 parallel algorithms, OpenACC, and OpenMP.
nvfortran is a Fortran compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the Fortran compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvfortran supports ISO Fortran 2003 and many features of ISO Fortran 2008, supports GPU programming with CUDA Fortran, and GPU and multicore CPU programming with ISO Fortran parallel language features, OpenACC, and OpenMP.
nvcc is the CUDA C and CUDA C++ compiler driver for NVIDIA GPUs. nvcc accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. nvcc produces optimized code for NVIDIA GPUs and drives a supported host compiler for AMD, Intel, OpenPOWER, and Arm CPUs.

Programming Models

C++ Parallel Algorithms
C++17 Parallel Algorithms introduce parallel and vector concurrency through execution policies and are supported by the nvc++ compiler.
OpenACC Getting Started Guide
This guide introduces the NVIDIA OpenACC implementation, including examples of how to write, build and run programs using the OpenACC directives.
This section describes using OpenMP, a set of compiler directives, an application programming interface (API), and a set of environment variables for specifying parallel execution in Fortran, C++, and C programs.
CUDA C++ Programming Guide
A comprehensive guide to understanding, developing, and optimizing code in the CUDA C++ programming environment.
CUDA Fortran Programming Guide
This guide describes how to program with CUDA Fortran, a small set of extensions to Fortran that supports and is built upon the NVIDIA CUDA programming model. CUDA Fortran is available on a variety of 64-bit operating systems for both x86 and OpenPOWER hardware platforms. CUDA Fortran includes runtime APIs and programming examples.
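
As a rough illustration of the directive-based models listed above, the following C++ sketch annotates one loop with an OpenACC directive and another with an OpenMP directive. The function names saxpy and dot are hypothetical and chosen only for this example. The sketch assumes the directives are enabled with nvc++ options such as -acc (OpenACC) and -mp (OpenMP); a compiler that does not enable them simply ignores the pragmas, and the loops run serially with the same results.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: y = a * x + y, with the loop offloaded via OpenACC.
// copyin/copy describe data movement for the raw arrays behind the vectors.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}

// Hypothetical example: dot product, with the loop shared among CPU threads
// via OpenMP. The reduction clause gives each thread a private partial sum
// and combines the partial sums at the end of the loop.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += x[i] * y[i];
    }
    return total;
}
```

Calling saxpy(2.0f, x, y) with x = {1, 2, 3} and y = {10, 20, 30} leaves y = {12, 24, 36}, and dot({1, 2, 3}, {4, 5, 6}) returns 32. The guides listed above describe the full set of directives, clauses, and compiler options.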

Math Libraries

The cuBLAS Library provides a GPU-accelerated implementation of the basic linear algebra subroutines (BLAS). cuBLAS accelerates AI and HPC applications with drop-in industry standard BLAS APIs highly optimized for NVIDIA GPUs. The cuBLAS library contains extensions for batched operations, execution across multiple GPUs, and mixed and low precision execution.
The cuTENSOR Library is a first-of-its-kind GPU-accelerated tensor linear algebra library providing tensor contraction, reduction and elementwise operations. cuTENSOR is used to accelerate applications in the areas of deep learning training and inference, computer vision, quantum chemistry and computational physics.
The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices, with functionality that can be used to build GPU accelerated solvers. cuSPARSE is widely used by engineers and scientists working on applications such as machine learning, computational fluid dynamics, seismic exploration and computational sciences.
The cuSOLVER library provides dense and sparse factorizations, linear solvers and eigensolvers highly optimized for NVIDIA GPUs. cuSOLVER is used to accelerate applications in diverse areas including scientific computing and data science, and has extensions for mixed precision tensor acceleration and execution across multiple GPUs.
The cuFFT Library provides FFT implementations highly optimized for NVIDIA GPUs. cuFFT is used for building commercial and research applications across disciplines such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging, and has extensions for execution across multiple GPUs.
The cuRAND library is a GPU device-side implementation of a random number generator.

Communications Libraries

NCCL, the NVIDIA Collective Communications Library, contains multi-GPU and multi-node collective communication primitives optimized for NVIDIA GPUs.
NVSHMEM is an implementation of the OpenSHMEM standard highly optimized for NVIDIA GPUs.


CUDA-GDB is the NVIDIA tool for debugging CUDA applications.
Nsight Compute
NVIDIA Nsight Compute is the next-generation interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and a command line tool.
Nsight Systems
NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize application algorithms. It helps identify optimization and tuning opportunities to scale applications efficiently across both CPUs and GPUs.
Compute Sanitizer
Compute Sanitizer is a functional correctness checking tool suite. It contains tools that perform different types of checks: the memcheck tool checks for out-of-bounds and misaligned memory access errors, the racecheck tool checks for data races in shared memory, the initcheck tool checks for uninitialized accesses to global memory, and the synccheck tool checks for invalid uses of synchronization primitives.
An API for annotating application events, code ranges, and resources. Use it together with NVIDIA Nsight tools to capture and visualize these annotations.


HPC Container Maker
HPC Container Maker is an open source tool that makes it easier to generate container specification files for Docker and Singularity.
NGC is the hub for GPU-optimized HPC and deep learning software. It takes care of all the plumbing so scientists, developers, and researchers can focus on building solutions, gathering insights, and delivering business value.

Other Documentation

CUDA C Best Practices Guide
This guide presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures. The intent is to provide guidelines for obtaining the best performance from NVIDIA GPUs using the CUDA Toolkit.
CUDA Runtime API
NVIDIA Fortran CUDA Library Interfaces
This document describes the NVIDIA Fortran interfaces to the cuBLAS, cuFFT, cuRAND, and cuSPARSE CUDA Libraries.
Using OpenACC with MPI Tutorial
This tutorial describes using the NVIDIA OpenACC compiler with MPI.
Ampere GPU Architecture Tuning Guide
The NVIDIA Ampere GPU architecture is NVIDIA's latest architecture for CUDA compute applications. The NVIDIA Ampere GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures. This guide summarizes the ways that an application can be fine-tuned to gain additional speedups by leveraging the NVIDIA Ampere GPU architecture's features.
Volta Tuning Guide
Volta is NVIDIA's 6th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Pascal architecture should typically see speedups on the Volta architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Volta architectural features.

Terms of Use

NVIDIA Software License Agreement for the HPC Software Development Kit
This document is the Software License Agreement (SLA) for NVIDIA HPC SDK. It contains specific license terms and conditions for all the HPC SDK components. By accepting this agreement, you agree to comply with all the terms and conditions applicable to the specific product(s) included herein.
