Usage tips

Building against HPC SDK

HPC SDK 22.3 ships with both cuFFT and cuFFTMp. The two libraries cannot be used at the same time. However, because cuFFTMp is a superset of cuFFT, it can be used in place of cuFFT.

The cuFFT headers are located in .../math_libs/X.Y/include/, while the cuFFTMp headers are located in .../math_libs/X.Y/include/cufftmp/. When compiling an application against cuFFTMp, ensure that one of the following holds:

  • The cuFFT headers are not included at compile time, or

  • The cuFFTMp headers are included before the cuFFT headers.

An application cannot link against both cuFFT and cuFFTMp. Doing so will lead to runtime errors.

Both of these requirements are automatically satisfied when building with the nvc -cudalib=cufftmp flag.
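When the include paths are set by hand rather than through -cudalib=cufftmp, one way to satisfy the header requirement is to put the cuFFTMp include directory ahead of the cuFFT one on the compiler command line, since -I directories are searched in the order given. The following is a minimal sketch, not an official recipe: the helper name cufftmp_include_flags is hypothetical, the root path is a placeholder for the elided .../math_libs/X.Y prefix, and it assumes the compiler resolves headers from the first matching -I directory.

```shell
# Hypothetical helper: print include flags so that the cuFFTMp headers
# are searched before the cuFFT headers.
# $1 is the math_libs version directory (the ".../math_libs/X.Y" prefix).
cufftmp_include_flags() {
  root="${1%/}"   # drop one trailing slash, if present
  printf '%s\n' "-I${root}/include/cufftmp -I${root}/include"
}

# Placeholder root; substitute the real path on your system.
cufftmp_include_flags "/path/to/math_libs/X.Y"
```

The printed flags can then be passed to the compiler ahead of any other include paths that might also contain cuFFT headers.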

Building and running on Summit

  • cuFFTMp requires CUDA 11.4. The required modules can be loaded with ml cuda/11.4 nvhpc/X.Y spectrum-mpi/

  • cuFFTMp requires CUDA_VISIBLE_DEVICES to be identical on every process. For example, to run cuFFTMp on two nodes with 6 processes per node (each process with 1 GPU and 4 cores), the proper jsrun invocation is jsrun -n 2 -a 6 -c 24 -g 6 ....

  • Since CUDA-aware Spectrum-MPI is not compatible with CUDA 11.4, the CUDA-aware features of Spectrum-MPI cannot be used with cuFFTMp.
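The jsrun flags in the example above follow from simple arithmetic: one resource set per node (-n), all processes of the node in that resource set (-a), the cores of all those processes combined (-c, 6 × 4 = 24), and one GPU per process (-g). Placing every process of a node in a single resource set that holds all of its GPUs is what lets each process see the same set of devices. The sketch below reproduces that calculation; the helper name jsrun_flags and its argument order are hypothetical and only illustrate the arithmetic.

```shell
# Hypothetical helper: build jsrun resource flags with one resource set per node.
# $1 = number of nodes
# $2 = processes per node (one GPU each)
# $3 = cores per process
jsrun_flags() {
  nodes="$1"; procs="$2"; cores="$3"
  # -n: resource sets, -a: tasks per resource set,
  # -c: cores per resource set, -g: GPUs per resource set
  printf '%s\n' "-n ${nodes} -a ${procs} -c $((procs * cores)) -g ${procs}"
}

jsrun_flags 2 6 4   # prints: -n 2 -a 6 -c 24 -g 6
```

The application and its arguments would follow these flags on the jsrun command line.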