Overview#

Package structure#

A simplified view of the directory structure is:

nvidia_hpc_benchmarks
├── ...
├── <benchmark directories>
├── ...
├── lib
├── hpc-benchmarks-gpu-env.sh
├── hpc-benchmarks-cpu-env.sh (Arm SBSA only)
├── hpcg.sh
├── hpcg-aarch64.sh  (Arm SBSA only)
├── hpl-mxp.sh
├── hpl-mxp-aarch64.sh (Arm SBSA only)
├── hpl.sh
├── hpl-aarch64.sh (Arm SBSA only)
├── stream-gpu-test.sh
└── stream-cpu-test.sh (Arm SBSA only)

Each benchmark is run using its respective run script (e.g., hpl.sh runs HPL). Each script sources either hpc-benchmarks-cpu-env.sh or hpc-benchmarks-gpu-env.sh, which adds all libraries in lib to LD_LIBRARY_PATH. The run scripts resolve paths relative to their own location, so the lib directory is found regardless of the directory from which the scripts are invoked. The run scripts are not intended to be moved outside the package directory structure.
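
For example, a run script can be launched from any working directory and will still locate the bundled libraries; the install path and launcher arguments below are placeholders:

cd $HOME   # arbitrary working directory
srun ... /opt/nvidia_hpc_benchmarks/hpl.sh ...   # illustrative install path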

By default, the applications link to libmpi.so (not, e.g., libmpi.so.12). If libmpi.so is not part of the MPI distribution on the target system, create a libmpi.so symlink pointing to the MPI library.
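
For example, the symlink can be created next to the versioned library; the paths below are illustrative and should be adjusted to your MPI installation:

# illustrative paths; point libmpi.so at the versioned MPI library on your system
ln -s /opt/mpi/lib/libmpi.so.12 /opt/mpi/lib/libmpi.so

If the MPI installation directory is not writable, create the symlink in a separate directory and prepend that directory to LD_LIBRARY_PATH.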

Using custom libraries#

The package contains the following libraries:

  • NVIDIA NCCL 2.25.1

  • AWS OFI NCCL 1.6.0

  • NVIDIA NVSHMEM 3.2.5

  • NVIDIA GDR Copy 2.4

  • NVIDIA NVPL BLAS 25.1 (Arm SBSA only)

  • NVIDIA NVPL LAPACK 25.1 (Arm SBSA only)

  • NVIDIA NVPL Sparse 25.1 (Arm SBSA only)

It’s possible to substitute any of the libraries above by setting the following environment variables:

  • export GDRCOPY_PATH=<path to NVIDIA GDR Copy libraries>

  • export NCCL_PATH=<path to NVIDIA NCCL libraries>

  • export NVSHMEM_PATH=<path to NVIDIA NVSHMEM libraries>

  • export NCCL_OFI_PATH=<path to AWS OFI NCCL libraries>

  • export NVPL_BLAS_PATH=<path to NVIDIA NVPL BLAS libraries>

  • export NVPL_LAPACK_PATH=<path to NVIDIA NVPL LAPACK libraries>

  • export NVPL_SPARSE_PATH=<path to NVIDIA NVPL Sparse libraries>

For example, if you wanted to use a different version of the NCCL library, you could set:

export NCCL_PATH=/opt/nccl-2.25.1/lib
srun ... hpl.sh ...

NVSHMEM#

The NVSHMEM bootstrap plugin (nvshmem_bootstrap_mpi.so) is precompiled separately for MPICH and for OpenMPI, so a distinct bootstrap plugin is provided for each MPI library.
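
If you need to select a particular bootstrap plugin (for example, the one built against your MPI), NVSHMEM can be pointed at it explicitly. This is a minimal sketch using NVSHMEM's standard bootstrap environment variables, with the plugin location left as a placeholder:

export NVSHMEM_BOOTSTRAP=plugin
export NVSHMEM_BOOTSTRAP_PLUGIN=<path to matching plugin>/nvshmem_bootstrap_mpi.so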

For HPE/Cray systems with MPI libraries ABI-compatible with MPICH, the following environment variables are required for NVSHMEM; these are all set by default in hpc-benchmarks-gpu-env.sh (which is used by the run scripts, e.g., hpl.sh).

export NVSHMEM_REMOTE_TRANSPORT=libfabric
export NVSHMEM_LIBFABRIC_PROVIDER=cxi
export NVSHMEM_DISABLE_CUDA_VMM=1
export MPICH_NO_BUFFER_ALIAS_CHECK=1

By default, NVSHMEM is bootstrapped using MPI. On some systems, this can be problematic, so it is also possible to initialize NVSHMEM using a unique ID (UID). In HPL, this is achieved by setting HPL_NVSHMEM_INIT. For example:

export HPL_NVSHMEM_INIT=0 # initialize using MPI (default)
export HPL_NVSHMEM_INIT=1 # initialize using UID

CUDA-aware MPI for Cray MPICH#

Using CUDA-aware MPI with Cray MPICH requires two things:

  • Linking the GTL library to a binary

  • Setting the environment variable MPICH_GPU_SUPPORT_ENABLED=1

CUDA-aware MPI is used in the following circumstances:

  • HPL, when HPL_P2P_AS_BCAST=2 is used.

  • HPL-MxP, when --use-mpi-panel-broadcast > 0 is used.

CUDA-aware MPI with Cray MPICH is enabled through the GPU Transport Layer (GTL) library. This library is not distributed as part of this package, so it must be linked to the binaries explicitly.

The easiest way to do this is to add the library to the LD_PRELOAD environment variable:

export LD_PRELOAD="<path>/libmpi_gtl_cuda.so $LD_PRELOAD"

Note: the delimiter for LD_PRELOAD is a space character, not a colon.
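
Putting this together, a typical HPL run with CUDA-aware MPI on a Cray MPICH system could look like the following; the GTL library path and launcher arguments are placeholders:

export LD_PRELOAD="<path>/libmpi_gtl_cuda.so $LD_PRELOAD"
export MPICH_GPU_SUPPORT_ENABLED=1
export HPL_P2P_AS_BCAST=2   # HPL uses CUDA-aware MPI when this is set to 2
srun ... hpl.sh ...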

Running#