NVSHMEM and cuFFTMp

Usage

cuFFTMp is based on, and compatible with, NVSHMEM. In the following, assume NVSHMEM is installed in ${NVSHMEM_HOME}.

cuFFTMp APIs that accept void * memory buffer pointers (e.g. cufftExecC2C, cufftMpExecReshapeAsync, …) must be passed buffers allocated with nvshmem_malloc and freed with nvshmem_free. These NVSHMEM APIs are declared in the NVSHMEM headers, included using #include <nvshmem.h> with -I${NVSHMEM_HOME}/include added as a compiler flag.
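
For illustration, the following minimal sketch shows this pattern. It assumes NVSHMEM has already been initialized and that a cuFFTMp plan (plan) has already been created; the helper name run_fft and the element count count_local are hypothetical, the headers follow the cuFFTMp samples, and error checking is omitted.

// A minimal sketch (error checking omitted). Assumes NVSHMEM is initialized,
// a cuFFTMp plan `plan` has been created (e.g. with cufftMakePlan3d), and
// `count_local` is the number of complex elements this process holds
// (both names are illustrative).
#include <cstddef>
#include <cufftMp.h>
#include <nvshmem.h>

void run_fft(cufftHandle plan, size_t count_local) {
    // Buffers handed to cuFFTMp execution APIs must live on the NVSHMEM
    // symmetric heap, i.e. be allocated with nvshmem_malloc.
    cufftComplex *data =
        (cufftComplex *)nvshmem_malloc(count_local * sizeof(cufftComplex));

    // ... fill data with this process' portion of the input ...

    // In-place forward transform on the NVSHMEM-allocated buffer.
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    // Symmetric-heap buffers are released with nvshmem_free, not cudaFree.
    nvshmem_free(data);
}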

User applications should always link against the libnvshmem_host.so library, as well as against libnvshmem_device.a if any NVSHMEM API is used directly. This can usually be done by passing the -L${NVSHMEM_HOME}/lib -lnvshmem_host -lnvshmem_device flags to the linker.

Finally, cuFFTMp requires all the NVSHMEM libraries to be available on the system at runtime, for instance by defining

export LD_LIBRARY_PATH="${NVSHMEM_HOME}/lib:$LD_LIBRARY_PATH"

in the environment prior to using cuFFTMp.

NVSHMEM initialization

Users wishing to use any NVSHMEM API should initialize NVSHMEM in their application prior to calling cuFFTMp routines.

cuFFTMp will automatically initialize NVSHMEM as needed when calling cufftMakePlan2d or cufftMakePlan3d, and finalize it when calling cufftDestroy.

However, initialization overhead will be reduced if NVSHMEM is initialized prior to calling any cuFFTMp API. In particular, if cuFFTMp plans are repeatedly created and destroyed in a loop, initializing NVSHMEM before the loop will minimize cuFFTMp planning time.
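
As an illustration, the sketch below bootstraps NVSHMEM from MPI once, before a loop that repeatedly creates and destroys cuFFTMp plans. The FFT size and iteration count are arbitrary and error checking is omitted.

// A minimal sketch: initialize NVSHMEM once, before repeatedly creating and
// destroying cuFFTMp plans, so that plan creation does not pay the NVSHMEM
// initialization cost on every iteration. Error checking is omitted and the
// FFT size is arbitrary.
#include <mpi.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>
#include <cufftMp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;

    int rank, ndevices;
    MPI_Comm_rank(comm, &rank);
    cudaGetDeviceCount(&ndevices);
    cudaSetDevice(rank % ndevices);

    // Bootstrap NVSHMEM on top of the existing MPI communicator.
    nvshmemx_init_attr_t attr;
    attr.mpi_comm = &comm;
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    for (int i = 0; i < 10; ++i) {
        cufftHandle plan;
        size_t workspace;
        cufftCreate(&plan);
        cufftMpAttachComm(plan, CUFFT_COMM_MPI, &comm);
        // NVSHMEM is already up, so planning skips its initialization.
        cufftMakePlan3d(plan, 256, 256, 256, CUFFT_C2C, &workspace);
        // ... execute transforms ...
        cufftDestroy(plan);
    }

    nvshmem_finalize();
    MPI_Finalize();
    return 0;
}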

NVSHMEM memory buffer in cuFFTMp

Because cuFFTMp either allocates NVSHMEM memory buffers internally or requires buffers allocated with NVSHMEM (nvshmem_malloc and nvshmem_free), the environment variables NVSHMEM defines for memory management are relevant to cuFFTMp as well.

For detailed information, please refer to the NVSHMEM documentation on memory management and environment variables.

Below are some examples of using NVSHMEM environment variables to adjust the memory management in cuFFTMp:

  • The NVSHMEM symmetric heap can be allocated either statically (preallocated once at NVSHMEM initialization and never grown) or dynamically (growing and shrinking as the program runs). Dynamic growth can be turned on or off using NVSHMEM_DISABLE_CUDA_VMM. Note that the CUDA VMM feature requires a CUDA driver version of 11.3 or newer.

  • For a dynamically allocated symmetric heap, the maximum heap size can be set via NVSHMEM_MAX_MEMORY_PER_GPU. By default, this is set to 128 GB, which is sufficient for most applications.

  • For a statically allocated symmetric heap, the amount of memory reserved for NVSHMEM can be increased by setting the NVSHMEM_SYMMETRIC_SIZE environment variable. By default, this is set to 1 GB. Applications that require larger allocations should set it to a larger value. (This is not required for dynamic heap allocation.)

For example, to set it to 10 GB:

export NVSHMEM_SYMMETRIC_SIZE=10G
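
As a rough guide for choosing this value, the sketch below (standard C++ only, with hypothetical problem size and GPU count) estimates the per-GPU footprint of the local slab of a 3D single-precision C2C transform; the buffers allocated with nvshmem_malloc, as well as any NVSHMEM memory cuFFTMp allocates internally, must fit in the symmetric heap.

// A minimal sketch with hypothetical sizes: estimate the per-GPU symmetric
// heap footprint of the local slab of a 3D single-precision C2C FFT, to help
// choose NVSHMEM_SYMMETRIC_SIZE when using a statically allocated heap.
#include <complex>
#include <cstdio>

int main() {
    const std::size_t nx = 512, ny = 512, nz = 512; // global FFT size (hypothetical)
    const std::size_t nranks = 4;                   // number of GPUs (hypothetical)

    // A slab decomposition gives each GPU roughly nx/nranks planes of
    // ny*nz complex elements.
    std::size_t local_elems = ((nx + nranks - 1) / nranks) * ny * nz;
    std::size_t local_bytes = local_elems * sizeof(std::complex<float>);

    // Buffers allocated with nvshmem_malloc (plus cuFFTMp's internal
    // workspace) must fit within the per-GPU NVSHMEM symmetric heap.
    std::printf("Per-GPU slab: %.2f GB (excluding cuFFTMp workspace)\n",
                local_bytes / 1e9);
    return 0;
}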

Compatibility

cuFFTMp requires a compatible version of NVSHMEM to be installed on the system, as listed in the table below. == indicates that an exact match is required; >= and <= indicate compatibility with a range of versions.

cuFFTMp                         NVSHMEM
11.2.6  (HPC-SDK 24.07)         >= 3.0.6
11.0.14 (HPC-SDK 23.11)         == 2.10.1
11.0.5  (HPC-SDK 23.3)          == 2.8.0
10.8.1  (HPC-SDK 22.5+, 23.1)   >= 2.5.0, <= 2.6.0
0.0.2   (HPC-SDK 22.3)          == 2.4.1

In addition, note that cuFFTMp for CUDA 11 (resp. CUDA 12) requires NVSHMEM built for CUDA 11 (resp. CUDA 12).

Note

Starting from cuFFTMp 11.2.6, NVSHMEM ABI backward compatibility between host and device libraries is supported within a major NVSHMEM version. This means cuFFTMp 11.2.6 is compatible with NVSHMEM 3.0.6 and future NVSHMEM 3.x releases. Updating NVSHMEM to a newer release no longer requires updating cuFFTMp, as long as the updated NVSHMEM stays within the same major version.

HPC-SDK, cuFFTMp and NVSHMEM

As indicated above, the versions of cuFFTMp and NVSHMEM need to be compatible with each other.

Because of this, an HPC-SDK release may include multiple NVSHMEM versions. For instance, HPC-SDK 23.05 includes both 2.9 (the latest) and 2.8 (because cuFFTMp 11.0.5 requires NVSHMEM 2.8). In this case, care must be taken to ensure the correct NVSHMEM is used in conjunction with cuFFTMp.

The table below indicates, starting from HPC-SDK 23.03, the location of the appropriate NVSHMEM to use. $NVHPC_ROOT is the root of the HPC-SDK installation (e.g. /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/) and $CUDA is the CUDA version, in major.minor format (e.g. 12.0). $NVSHMEM_ROOT is the location of the appropriate NVSHMEM installation to use with cuFFTMp.

Note

cuFFTMp and NVSHMEM should both use the same $CUDA major version. Specifically, cuFFTMp for CUDA 12 (resp. 11) should be used with NVSHMEM for CUDA 12 (resp. 11). The specific minor version does not matter, as NVSHMEM is compatible across minor versions. This is relevant for multi-CUDA bare metal installations (from https://developer.nvidia.com/hpc-sdk-downloads) or containers (e.g. nvcr.io/nvidia/nvhpc:23.11-devel-cuda_multi-centos7).

At compile-time, users should include the headers located in $NVSHMEM_ROOT/include. At runtime, $NVSHMEM_ROOT/lib should be present in LD_LIBRARY_PATH.

See also the HPC-SDK release notes.

HPC-SDK   cuFFTMp   NVSHMEM installation location ($NVSHMEM_ROOT)
23.03     11.0.5    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/
23.05     11.0.5    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
23.07     11.0.5    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
23.09     11.0.5    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
23.11     11.0.14   $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/
24.01     11.0.14   $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/
24.03     11.0.14   $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/
24.05     11.0.14   $NVHPC_ROOT/comm_libs/$CUDA/nvshmem_cufftmp_compat/
24.07     11.2.6    $NVHPC_ROOT/comm_libs/$CUDA/nvshmem/