Last updated January 30, 2024
NVIDIA HPC SDK Version 24.1 Documentation
HPC SDK
- HPC SDK Release Notes
- These release notes describe the new features of the NVIDIA HPC SDK including changes from previous releases. They may also include late-breaking information not included in other product documentation.
- HPC SDK Install Guide
- This guide describes the requirements and steps for installing the HPC SDK on compatible workstations, servers, and clusters running versions of the Linux operating system.
Compilers
- HPC Compilers Documentation
- HPC Compiler Documentation Library
- nvc
- nvc is a C11 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc supports ISO C11, supports GPU programming with OpenACC, and supports multicore CPU programming with OpenACC and OpenMP.
- nvc++
- nvc++ is a C++17 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C++ compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvc++ supports ISO C++17, supports GPU and multicore CPU programming with C++17 parallel algorithms, OpenACC, and OpenMP.
- nvfortran
- nvfortran is a Fortran compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the Fortran compiler, assembler, and linker for the target processors with options derived from its command line arguments. nvfortran supports ISO Fortran 2003 and many features of ISO Fortran 2008, supports GPU programming with CUDA Fortran, and GPU and multicore CPU programming with ISO Fortran parallel language features, OpenACC, and OpenMP.
- nvcc
- nvcc is the CUDA C and CUDA C++ compiler driver for NVIDIA GPUs. nvcc accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. nvcc produces optimized code for NVIDIA GPUs and drives a supported host compiler for AMD, Intel, OpenPOWER, and Arm CPUs.
Programming Models
- C++ Parallel Algorithms
- C++17 parallel algorithms introduce parallel and vector concurrency through execution policies and are supported by the nvc++ compiler.
- OpenACC Getting Started Guide
- This guide introduces the NVIDIA OpenACC implementation, including examples of how to write, build and run programs using the OpenACC directives.
- OpenMP
- This section describes using OpenMP, a set of compiler directives, an application programming interface (API), and a set of environment variables for specifying parallel execution in Fortran, C++, and C programs.
- CUDA C++ Programming Guide
- A comprehensive guide to understanding, developing, and optimizing code in the CUDA C++ programming environment.
- CUDA Fortran Programming Guide
- This guide describes how to program with CUDA Fortran, a small set of extensions to Fortran that supports and is built upon the NVIDIA CUDA programming model. CUDA Fortran is available on a variety of 64-bit operating systems for both x86 and OpenPOWER hardware platforms. CUDA Fortran includes runtime APIs and programming examples.
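As a rough illustration of the directive-based models listed above, the following minimal C++ sketch annotates one loop with an OpenACC directive and another with an OpenMP directive. The function names `saxpy_acc` and `dot_omp` are hypothetical and chosen only for this example; they do not come from the HPC SDK documentation. A compiler that does not enable these directives ignores the pragmas and runs the loops serially, so the results are the same either way.

```cpp
// Sketch only: the function names below are hypothetical illustrations.
#include <cassert>
#include <vector>

// Computes y = a * x + y, with the loop annotated for OpenACC.
void saxpy_acc(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();
    // copyin: x is only read by the loop; copy: y is read and written back.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}

// Computes the dot product of x and y, with the loop annotated for OpenMP.
double dot_omp(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    double sum = 0.0;
    // reduction(+ : sum) gives each thread a private partial sum
    // and combines the partial sums when the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += static_cast<double>(x[i]) * y[i];
    }
    return sum;
}
```

With the HPC SDK compilers, a file like this would typically be built with something along the lines of `nvc++ -acc -mp`; treat that command as an assumption and check the compiler documentation for the exact flags and target options.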
Math Libraries
- cuBLAS
- The cuBLAS Library provides a GPU-accelerated implementation of the basic linear algebra subroutines (BLAS). cuBLAS accelerates AI and HPC applications with drop-in industry standard BLAS APIs highly optimized for NVIDIA GPUs. The cuBLAS library contains extensions for batched operations, execution across multiple GPUs, and mixed and low precision execution.
- cuTENSOR
- The cuTENSOR Library is a first-of-its-kind GPU-accelerated tensor linear algebra library providing tensor contraction, reduction and elementwise operations. cuTENSOR is used to accelerate applications in the areas of deep learning training and inference, computer vision, quantum chemistry and computational physics.
- cuSPARSE
- The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices, with functionality that can be used to build GPU accelerated solvers. cuSPARSE is widely used by engineers and scientists working on applications such as machine learning, computational fluid dynamics, seismic exploration and computational sciences.
- cuSOLVER
- The cuSOLVER library provides dense and sparse factorizations, linear solvers and eigensolvers highly optimized for NVIDIA GPUs. cuSOLVER is used to accelerate applications in diverse areas including scientific computing and data science, and has extensions for mixed precision tensor acceleration and execution across multiple GPUs.
- cuSOLVERMp
- cuSOLVERMp provides a distributed-memory multi-node and multi-GPU solution for solving systems of linear equations at scale.
- cuFFT
- The cuFFT Library provides FFT implementations highly optimized for NVIDIA GPUs. cuFFT is used for building commercial and research applications across disciplines such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging, and has extensions for execution across multiple GPUs.
- cuFFTMp
- cuFFTMp provides a distributed-memory multi-node and multi-GPU solution for solving 2D and 3D FFTs (Fast Fourier Transforms) at scale.
- cuRAND
- The cuRAND library is a GPU device-side implementation of a random number generator.
Communications Libraries
- NCCL
- NCCL, the NVIDIA Collective Communications Library, contains multi-GPU and multi-node collective communication primitives optimized for NVIDIA GPUs.
- NVSHMEM
- NVSHMEM is an implementation of the OpenSHMEM standard highly optimized for NVIDIA GPUs.
Tools
- CUDA-GDB
- The NVIDIA tool for debugging CUDA applications.
- Nsight Compute
- NVIDIA Nsight Compute is the next-generation interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and a command-line tool.
- Nsight Systems
- NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize application algorithms. It helps identify optimization and tuning opportunities so that applications scale efficiently across both CPUs and GPUs.
- Compute Sanitizer
- Compute Sanitizer is a suite of functional correctness checking tools. It includes tools for different types of checks: the memcheck tool for out-of-bounds and misaligned memory access errors, the racecheck tool for data races in shared memory, the initcheck tool for uninitialized accesses to global memory, and the synccheck tool for invalid uses of synchronization primitives.
- NVTX
- An API for annotating application events, code ranges, and resources. Use it together with NVIDIA Nsight to capture and visualize these annotations.
Containerization
- HPC Container Maker
- HPC Container Maker is an open source tool that makes it easier to generate Dockerfiles and Singularity container specification files.
- NGC
- NGC is the hub for GPU-optimized HPC and deep learning software. It takes care of all the plumbing so scientists, developers, and researchers can focus on building solutions, gathering insights, and delivering business value.
Other Documentation
- CUDA C Best Practices Guide
- This guide presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures. The intent is to provide guidelines for obtaining the best performance from NVIDIA GPUs using the CUDA Toolkit.
- CUDA Runtime API
- Reference documentation for the CUDA Runtime API.
- NVIDIA Fortran CUDA Library Interfaces
- This document describes the NVIDIA Fortran interfaces to the cuBLAS, cuFFT, cuRAND, and cuSPARSE CUDA Libraries.
- Using OpenACC with MPI Tutorial
- This tutorial describes using the NVIDIA OpenACC compiler with MPI.
- Ampere GPU Architecture Tuning Guide
- The NVIDIA Ampere GPU architecture is NVIDIA's latest architecture for CUDA compute applications. The NVIDIA Ampere GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures. This guide summarizes the ways that an application can be fine-tuned to gain additional speedups by leveraging the NVIDIA Ampere GPU architecture's features.
- Volta Tuning Guide
- Volta is NVIDIA's 6th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Pascal architecture should typically see speedups on the Volta architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Volta architectural features.
Terms of Use
- NVIDIA Software License Agreement for the HPC Software Development Kit
- This document is the Software License Agreement (SLA) for NVIDIA HPC SDK. It contains specific license terms and conditions for all the HPC SDK components. By accepting this agreement, you agree to comply with all the terms and conditions applicable to the specific product(s) included herein.