NVSHMEM and cuFFTMp

Usage

cuFFTMp is based on, and compatible with, NVSHMEM. In the following, assume NVSHMEM is installed in ${NVSHMEM_HOME}.

cuFFTMp APIs that accept void * memory buffer pointers (e.g. cufftExecC2C, cufftMpExecReshapeAsync, …) must be passed buffers allocated with nvshmem_malloc and freed with nvshmem_free. These allocation APIs are declared in the NVSHMEM headers, included with #include <nvshmem.h> and the -I${NVSHMEM_HOME}/include compiler flag.

User applications should always link against libnvshmem_host.so, and additionally against libnvshmem_device.a if any NVSHMEM API is used directly. This is usually done by passing the -L${NVSHMEM_HOME}/lib -lnvshmem_host -lnvshmem_device flags to the linker.

Finally, cuFFTMp requires all the NVSHMEM libraries to be available on the system at runtime, for instance by defining

export LD_LIBRARY_PATH="${NVSHMEM_HOME}/lib:$LD_LIBRARY_PATH"

in the environment prior to using cuFFTMp.
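Putting these pieces together, the sketch below illustrates the intended buffer workflow: allocate with nvshmem_malloc, pass the symmetric-heap pointer to cufftExecC2C, and release it with nvshmem_free. It is a hedged, minimal sketch rather than a complete program: the build line in the comment, the helper name, the grid sizes, and the local buffer size are assumptions for illustration, MPI and NVSHMEM are assumed to be initialized already, and error checking is omitted.

// Assumed build line (adapt paths and library names to your installation):
//   nvcc -I${NVSHMEM_HOME}/include app.cu -L${NVSHMEM_HOME}/lib \
//        -lcufftMp -lnvshmem_host -lnvshmem_device -o app
#include <mpi.h>
#include <nvshmem.h>
#include <cufftMp.h>

// Sketch only: assumes MPI and NVSHMEM are already initialized and that
// local_bytes is large enough for this rank's portion of the data.
void transform(MPI_Comm comm, int nx, int ny, int nz, size_t local_bytes)
{
    cufftHandle plan;
    size_t workspace;
    cufftCreate(&plan);
    cufftMpAttachComm(plan, CUFFT_COMM_MPI, &comm);           // distribute the plan across MPI ranks
    cufftMakePlan3d(plan, nx, ny, nz, CUFFT_C2C, &workspace);

    // Buffers passed to cufftExecC2C must live on the NVSHMEM symmetric heap
    cufftComplex *data = (cufftComplex *) nvshmem_malloc(local_bytes);

    // ... fill `data` with this rank's slab of the input ...
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);            // in-place, distributed transform

    nvshmem_free(data);                                       // symmetric-heap buffers are freed with nvshmem_free
    cufftDestroy(plan);
}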

NVSHMEM initialization

Users wishing to use any NVSHMEM API should initialize NVSHMEM in their application prior to calling cuFFTMp routines.

cuFFTMp will automatically initialize NVSHMEM as needed when calling cufftMakePlan2d or cufftMakePlan3d, and finalize it when calling cufftDestroy.

However, initialization overhead will be reduced if NVSHMEM is initialized prior to calling any cuFFTMp API. In particular, if cuFFTMp plans are repeatedly created and destroyed in a loop, initializing NVSHMEM before the loop will minimize cuFFTMp planning time.
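As a sketch of this pattern (assuming an MPI application that bootstraps NVSHMEM through nvshmemx_init_attr; the function name, sizes and iteration count are placeholders and error checking is omitted), NVSHMEM is brought up once so that the planning loop does not pay the initialization cost:

#include <mpi.h>
#include <nvshmem.h>
#include <nvshmemx.h>
#include <cufftMp.h>

// Sketch: initialize NVSHMEM once, up front, so repeated cuFFTMp plan
// creation/destruction does not re-initialize it.
void plan_loop(MPI_Comm comm, int nx, int ny, int nz, int niter)
{
    // Bootstrap NVSHMEM on top of the existing MPI communicator
    nvshmemx_init_attr_t attr;
    attr.mpi_comm = &comm;
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    for (int i = 0; i < niter; ++i) {
        cufftHandle plan;
        size_t workspace;
        cufftCreate(&plan);
        cufftMpAttachComm(plan, CUFFT_COMM_MPI, &comm);
        // NVSHMEM is already initialized, so planning does not pay that cost
        cufftMakePlan3d(plan, nx, ny, nz, CUFFT_C2C, &workspace);
        // ... allocate buffers, execute transforms ...
        cufftDestroy(plan);
    }

    nvshmem_finalize();   // finalize NVSHMEM once, after all cuFFTMp work is done
}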

Compatibility

cuFFTMp requires a specific version of NVSHMEM to be installed on the system. In the table below, == indicates that an exact match is required; >= and <= indicate compatibility with a range of versions.

cuFFTMp                          NVSHMEM
11.0.14 (HPC-SDK 23.11)          == 2.10.1
11.0.5  (HPC-SDK 23.3)           == 2.8.0
10.8.1  (HPC-SDK 22.5+, 23.1)    >= 2.5.0, <= 2.6.0
0.0.2   (HPC-SDK 22.3)           == 2.4.1

In addition, note that cuFFTMp for CUDA 11 (resp. CUDA 12) requires NVSHMEM built for CUDA 11 (resp. CUDA 12).

HPC-SDK, cuFFTMp and NVSHMEM

As indicated above, the versions of cuFFTMp and NVSHMEM need to be compatible with each other.

Because of this, HPC-SDK may include multiple NVSHMEM versions. For instance, HPC-SDK 23.05 includes both 2.9 (the latest) and 2.8 (because cuFFTMp 11.0.5 requires NVSHMEM 2.8). In this case, care must be taken to ensure the correct NVSHMEM is used in conjunction with cuFFTMp.

The table below indicates, starting from HPC-SDK 23.03, the location of the appropriate NVSHMEM to use. $NVHPC_ROOT is the root of the HPC-SDK installation (e.g. /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/) and $CUDA is the CUDA version, in major.minor format (e.g. 12.0). $NVSHMEM_ROOT is the location of the appropriate NVSHMEM installation to use with cuFFTMp.

Note

cuFFTMp and NVSHMEM should both use the same $CUDA major version. Specifically, cuFFTMp for CUDA 12 (resp. 11) should be used with NVSHMEM for CUDA 12 (resp. 11). The specific minor version does not matter, as NVSHMEM is compatible across minor versions. This is relevant for multi-CUDA bare metal installations (from https://developer.nvidia.com/hpc-sdk-downloads) or containers (e.g. nvcr.io/nvidia/nvhpc:23.11-devel-cuda_multi-centos7).

At compile-time, users should include the headers located in $NVSHMEM_ROOT/include. At runtime, $NVSHMEM_ROOT/lib should be present in LD_LIBRARY_PATH.

See also the HPC-SDK release notes.

HPC-SDK    cuFFTMp    NVSHMEM installation location ($NVSHMEM_ROOT)
23.03      11.0.5     $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/
23.05      11.0.5     $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
23.07      11.0.5     $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
23.09      11.0.5     $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
23.11      11.0.14    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/
24.01      11.0.14    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/