cuFFTMp is based on, and compatible with, NVSHMEM. In the following, assume NVSHMEM is installed in ${NVSHMEM_HOME}.

cuFFTMp APIs that accept void * memory buffer pointers (e.g. cufftExecC2C, cufftMpExecReshapeAsync, …) need to be passed memory buffers allocated using nvshmem_malloc and freed with nvshmem_free. Those APIs are available in the NVSHMEM headers, included using #include <nvshmem.h> with -I${NVSHMEM_HOME}/include added as a compiler flag.

User applications should always link against the libnvshmem_host.so library, and additionally against libnvshmem_device.a if they call any NVSHMEM API directly. This can usually be done by passing the -L${NVSHMEM_HOME}/lib -lnvshmem_host -lnvshmem_device flags to the linker.
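As a rough illustration of the compile and link rules above, the following shell sketch assembles the NVSHMEM flags. The helper name nvshmem_link_flags, the /opt/nvshmem fallback, and the app.cu source file are hypothetical. The flags for cuFFTMp itself and for the rest of the toolchain are deliberately left out.

```shell
# Assumption: NVSHMEM_HOME is the NVSHMEM install prefix; /opt/nvshmem is only a placeholder.
NVSHMEM_HOME="${NVSHMEM_HOME:-/opt/nvshmem}"

# Header search path needed for #include <nvshmem.h>.
NVSHMEM_CFLAGS="-I${NVSHMEM_HOME}/include"

# nvshmem_link_flags DIRECT_USE (hypothetical helper)
# Always links the host library; adds the device library only when
# DIRECT_USE is "yes", i.e. the application calls NVSHMEM APIs itself.
nvshmem_link_flags() {
  flags="-L${NVSHMEM_HOME}/lib -lnvshmem_host"
  if [ "${1:-no}" = "yes" ]; then
    flags="${flags} -lnvshmem_device"
  fi
  printf '%s\n' "$flags"
}

# Print the command instead of running it; app.cu is a hypothetical source file.
echo "nvcc app.cu ${NVSHMEM_CFLAGS} $(nvshmem_link_flags yes) -o app"
```

Passing both libraries unconditionally, as the flags quoted above do, is the simpler choice; the helper only makes the host-only case explicit.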

Finally, cuFFTMp requires all the NVSHMEM libraries to be available on the system at runtime, for instance by defining export LD_LIBRARY_PATH="${NVSHMEM_HOME}/lib:$LD_LIBRARY_PATH" in the environment prior to using cuFFTMp.

NVSHMEM initialization

Users wishing to use any NVSHMEM API should initialize NVSHMEM in their application prior to calling cuFFTMp routines.

cuFFTMp will automatically initialize NVSHMEM as needed when calling cufftMakePlan2d or cufftMakePlan3d, and finalize it when calling cufftDestroy.

However, initialization overhead will be reduced if NVSHMEM is initialized prior to calling any cuFFTMp API. In particular, if cuFFTMp plans are repeatedly created and destroyed in a loop, initializing NVSHMEM before the loop will minimize cuFFTMp planning time.


cuFFTMp requires a specific version of NVSHMEM to be installed on the system. == indicates that an exact match is required. >= and <= indicate compatibility with a range of versions.



cuFFTMp version                  Required NVSHMEM version
11.0.5 (HPC-SDK 23.3)            == 2.8.0
10.8.1 (HPC-SDK 22.5+, 23.1)     >= 2.5.0, <= 2.6.0
0.0.2 (HPC-SDK 22.3)             == 2.4.1

In addition, note that cuFFTMp for CUDA 11 (resp. CUDA 12) requires NVSHMEM built for CUDA 11 (resp. CUDA 12).
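The version requirements listed above can be checked mechanically before launching an application. The sketch below is one possible way to do so. The function names ver_le and nvshmem_compatible are hypothetical, only the three cuFFTMp versions listed above are encoded, and the comparison assumes a sort that supports -V (as GNU sort does). It does not check the CUDA 11 versus CUDA 12 build requirement.

```shell
# ver_le A B: succeeds when version A <= version B.
# Assumption: sort supports version ordering (-V) and silent check mode (-C).
ver_le() {
  printf '%s\n%s\n' "$1" "$2" | sort -V -C
}

# nvshmem_compatible CUFFTMP_VERSION NVSHMEM_VERSION (hypothetical helper)
# Versions are plain major.minor.patch strings, e.g. 2.8.0.
# Unknown cuFFTMp versions are reported as incompatible.
nvshmem_compatible() {
  case "$1" in
    11.0.5) [ "$2" = "2.8.0" ] ;;                              # exact match
    10.8.1) ver_le "2.5.0" "$2" && ver_le "$2" "2.6.0" ;;      # inclusive range
    0.0.2)  [ "$2" = "2.4.1" ] ;;                              # exact match
    *)      return 1 ;;
  esac
}

# Example: cuFFTMp 11.0.5 against NVSHMEM 2.9.0 is rejected.
if nvshmem_compatible 11.0.5 2.9.0; then
  echo "compatible"
else
  echo "incompatible"
fi
```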


The versions of cuFFTMp and NVSHMEM need to be compatible with each other. HPC-SDK 23.3 includes both NVSHMEM 2.9 and 2.8. However, cuFFTMp 11.0.5 is only compatible with NVSHMEM 2.8. Therefore, cuFFTMp users should point to NVSHMEM 2.8 using export LD_LIBRARY_PATH="$HPCSDK_HOME/math_libs/11.8/lib64/compat/nvshmem_2.8.0-3/:$LD_LIBRARY_PATH" (for CUDA 11) or export LD_LIBRARY_PATH="$HPCSDK_HOME/math_libs/12.0/lib64/compat/nvshmem_2.8.0-3/:$LD_LIBRARY_PATH" (for CUDA 12) in the environment.
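The choice between the two paths can be scripted. The following is a minimal sketch that uses only the two directories quoted above for HPC-SDK 23.3. The helper name nvshmem_compat_dir and the CUDA_MAJOR variable are hypothetical, and HPCSDK_HOME is assumed to be set by the user.

```shell
# nvshmem_compat_dir CUDA_MAJOR (hypothetical helper)
# Prints the NVSHMEM 2.8 compat directory for the given CUDA major version.
nvshmem_compat_dir() {
  case "$1" in
    11) printf '%s\n' "${HPCSDK_HOME}/math_libs/11.8/lib64/compat/nvshmem_2.8.0-3" ;;
    12) printf '%s\n' "${HPCSDK_HOME}/math_libs/12.0/lib64/compat/nvshmem_2.8.0-3" ;;
    *)  echo "unsupported CUDA major version: $1" >&2
        return 1 ;;
  esac
}

# CUDA_MAJOR is a hypothetical variable; 12 is used here only for illustration.
if dir=$(nvshmem_compat_dir "${CUDA_MAJOR:-12}"); then
  # Prepend the compat directory, avoiding a trailing ':' when the path was empty.
  export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
fi
```

Prepending rather than appending lets the NVSHMEM 2.8 libraries take precedence over the NVSHMEM 2.9 copy if both directories end up on the search path.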