NVIDIA HPC SDK Version 24.1 Documentation

HPC SDK Release Notes

These release notes describe the new features of the NVIDIA HPC SDK including changes from previous releases. They may also include late-breaking information not included in other product documentation.

HPC SDK Install Guide

This guide describes the requirements and steps for installing the HPC SDK on compatible workstations, servers, and clusters running versions of the Linux operating system.

HPC Compilers Documentation

HPC Compiler Documentation Library

nvc

nvc is a C11 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc supports ISO C11, supports GPU programming with OpenACC, and supports multicore CPU programming with OpenACC and OpenMP.

nvc++

nvc++ is a C++17 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C++ compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc++ supports ISO C++17, supports GPU and multicore CPU programming with C++17 parallel algorithms, OpenACC, and OpenMP.

nvfortran

nvfortran is a Fortran compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the Fortran compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvfortran supports ISO Fortran 2003 and many features of ISO Fortran 2008, supports GPU programming with CUDA Fortran, and GPU and multicore CPU programming with ISO Fortran parallel language features, OpenACC, and OpenMP.

nvcc

nvcc is the CUDA C and CUDA C++ compiler driver for NVIDIA GPUs. nvcc accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. nvcc produces optimized code for NVIDIA GPUs and drives a supported host compiler for AMD, Intel, OpenPOWER, and Arm CPUs.

C++ Parallel Algorithms

C++17 Parallel Algorithms introduce parallel and vector concurrency through execution policies and are supported by the nvc++ compiler.
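
As a minimal sketch of what an execution policy looks like in practice, the function below computes a dot product with `std::transform_reduce` and the `std::execution::par` policy. The function name `dot_product` is hypothetical and chosen only for illustration; the code is standard C++17 and is not taken from the NVIDIA documentation.

```cpp
#include <cassert>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: dot product of two equally sized vectors.
// The std::execution::par policy permits the implementation to run
// the multiply-and-add reduction in parallel.
double dot_product(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    // With no explicit operations, transform_reduce multiplies paired
    // elements and sums the products, starting from the initial value 0.0.
    return std::transform_reduce(std::execution::par,
                                 x.begin(), x.end(), y.begin(), 0.0);
}
```

With nvc++, parallel algorithm support is enabled by the `-stdpar` option; other compilers may need their own flags or a separate parallel backend library.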

OpenACC Getting Started Guide

This guide introduces the NVIDIA OpenACC implementation, including examples of how to write, build and run programs using the OpenACC directives.
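
For orientation, an OpenACC directive is a pragma placed on an ordinary loop. The sketch below is an assumed, minimal SAXPY example (the function name `saxpy` and the data clauses are illustrative choices, not taken from the guide). A compiler without OpenACC support ignores the pragma and runs the loop serially, producing the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: y[i] = a * x[i] + y[i] for every element.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();
    // Ask the compiler to parallelize the loop. copyin sends x to the
    // accelerator; copy sends y there and brings the result back.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

With the HPC compilers, OpenACC directives are enabled by the `-acc` option, for example `nvc++ -acc`.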

OpenMP

This section describes using OpenMP, a set of compiler directives, an application programming interface (API), and a set of environment variables for specifying parallel execution in Fortran, C++, and C programs.
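
A compiler directive in this sense is a pragma attached to a loop or block. The following is a minimal sketch, assuming a hypothetical `mean` function, of an OpenMP parallel loop with a sum reduction. When OpenMP is not enabled the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: arithmetic mean of a non-empty vector.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    const int n = static_cast<int>(v.size());
    double sum = 0.0;
    // Each thread accumulates a private partial sum; the reduction
    // clause combines the partial sums when the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum / n;
}
```

With the HPC compilers, OpenMP is enabled by the `-mp` option, and the number of threads can be set through the standard `OMP_NUM_THREADS` environment variable.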

CUDA C++ Programming Guide

A comprehensive guide to understanding, developing, and optimizing code in the CUDA C++ programming environment.

CUDA Fortran Programming Guide

This guide describes how to program with CUDA Fortran, a small set of extensions to Fortran that supports and is built upon the NVIDIA CUDA programming model. CUDA Fortran is available on a variety of 64-bit operating systems for both x86 and OpenPOWER hardware platforms. CUDA Fortran includes runtime APIs and programming examples.

cuBLAS

The cuBLAS Library provides a GPU-accelerated implementation of the basic linear algebra subroutines (BLAS). cuBLAS accelerates AI and HPC applications with drop-in industry standard BLAS APIs highly optimized for NVIDIA GPUs. The cuBLAS library contains extensions for batched operations, execution across multiple GPUs, and mixed and low precision execution.

cuTENSOR

The cuTENSOR Library is a first-of-its-kind GPU-accelerated tensor linear algebra library providing tensor contraction, reduction and elementwise operations. cuTENSOR is used to accelerate applications in the areas of deep learning training and inference, computer vision, quantum chemistry and computational physics.

cuSPARSE

The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices, with functionality that can be used to build GPU accelerated solvers. cuSPARSE is widely used by engineers and scientists working on applications such as machine learning, computational fluid dynamics, seismic exploration and computational sciences.

cuSOLVER

The cuSOLVER library provides dense and sparse factorizations, linear solvers and eigensolvers highly optimized for NVIDIA GPUs. cuSOLVER is used to accelerate applications in diverse areas including scientific computing and data science, and has extensions for mixed precision tensor acceleration and execution across multiple GPUs.

cuSOLVERMp

cuSOLVERMp provides a distributed-memory multi-node and multi-GPU solution for solving systems of linear equations at scale.

cuFFT

The cuFFT Library provides FFT implementations highly optimized for NVIDIA GPUs. cuFFT is used for building commercial and research applications across disciplines such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging, and has extensions for execution across multiple GPUs.

cuFFTMp

cuFFTMp provides a distributed-memory multi-node and multi-GPU solution for solving 2D and 3D FFTs (Fast Fourier Transforms) at scale.

cuRAND

The cuRAND library is a GPU device-side implementation of a random number generator.

NCCL

NCCL, the NVIDIA Collective Communications Library, contains multi-GPU and multi-node collective communication primitives optimized for NVIDIA GPUs.

NVSHMEM

NVSHMEM is an implementation of the OpenSHMEM standard highly optimized for NVIDIA GPUs.

CUDA-GDB

The NVIDIA tool for debugging CUDA applications.

Nsight Compute

NVIDIA Nsight Compute is the next-generation interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command line tool.

Nsight Systems

NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize application algorithms. It helps identify optimization and tuning opportunities to scale applications efficiently across both CPUs and GPUs.

Compute Sanitizer

Compute Sanitizer is a suite of functional correctness checking tools. It contains tools that perform different types of checks, including the memcheck tool to check for out-of-bounds and misaligned memory access errors, the racecheck tool to check for data races in shared memory, the initcheck tool to check for uninitialized accesses to global memory, and the synccheck tool to check for invalid usage of synchronization primitives.

NVTX

An API for annotating application events, code ranges, and resources. Use it together with NVIDIA Nsight tools to capture and visualize these annotations.

HPC Container Maker

HPC Container Maker is an open source tool that makes it easier to generate Dockerfile and Singularity container specification files.

NGC

NGC is the hub for GPU-optimized HPC and deep learning software. It takes care of all the plumbing so scientists, developers, and researchers can focus on building solutions, gathering insights, and delivering business value.

CUDA C Best Practices Guide

This guide presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures. The intent is to provide guidelines for obtaining the best performance from NVIDIA GPUs using the CUDA Toolkit.

CUDA Runtime API

Reference documentation for the CUDA Runtime API.

NVIDIA Fortran CUDA Library Interfaces

This document describes the NVIDIA Fortran interfaces to the cuBLAS, cuFFT, cuRAND, and cuSPARSE CUDA Libraries.

Using OpenACC with MPI Tutorial

This tutorial describes using the NVIDIA OpenACC compiler with MPI.

Ampere GPU Architecture Tuning Guide

The NVIDIA Ampere GPU architecture is NVIDIA's latest architecture for CUDA compute applications. The NVIDIA Ampere GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures. This guide summarizes the ways that an application can be fine-tuned to gain additional speedups by leveraging the NVIDIA Ampere GPU architecture's features.

Volta Tuning Guide

Volta is NVIDIA's 6th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Pascal architecture should typically see speedups on the Volta architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Volta architectural features.

NVIDIA Software License Agreement for the HPC Software Development Kit

This document is the Software License Agreement (SLA) for NVIDIA HPC SDK. It contains specific license terms and conditions for all the HPC SDK components. By accepting this agreement, you agree to comply with all the terms and conditions applicable to the specific product(s) included herein.
