Usage tips

Building against HPC SDK

HPC SDK 22.3 ships with both cuFFT and cuFFTMp. The two cannot be used simultaneously. However, since cuFFTMp is a superset of cuFFT, it can be used in place of cuFFT.

The cuFFT headers are located in .../math_libs/X.Y/include/, while the cuFFTMp headers are located in .../math_libs/X.Y/include/cufftmp/. When compiling an application against cuFFTMp, ensure that one of the following holds:

The cuFFT headers are not included at compile time, or
The cuFFTMp headers are included before the cuFFT headers.

An application cannot link against both cuFFT (libcufft.so) and cuFFTMp (libcufftMp.so); doing so leads to runtime errors.

Both the header and the linking requirements are satisfied automatically when building with nvc and the -cudalib=cufftmp flag.

Building and running on Summit

cuFFTMp requires CUDA 11.4. On Summit, the required modules can be loaded with ml cuda/11.4 nvhpc/X.Y spectrum-mpi/10.4.0.3-20210112.
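cuFFTMp requires CUDA_VISIBLE_DEVICES to be identical on every process. To run cuFFTMp on two nodes with 6 processes per node (each process using 1 GPU and 4 cores), the proper jsrun invocation is therefore jsrun -n 2 -a 6 -c 24 -g 6 ... (one resource set per node, holding all 6 processes, 24 cores and 6 GPUs), rather than one resource set per process.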
Since CUDA-aware Spectrum-MPI is not compatible with CUDA 11.4, the CUDA-aware features of Spectrum-MPI cannot be used with cuFFTMp.
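
The jsrun arguments in the item above follow from simple arithmetic, which the sketch below makes explicit. It only assembles and prints the command line and does not launch anything. The variable names (nodes, procs_per_node, cores_per_proc, and so on) are illustrative assumptions, not part of jsrun or cuFFTMp. It also assumes that all processes in a resource set are given the same set of GPUs, which is why one resource set per node is used.

```shell
#!/usr/bin/env bash
# Sketch: derive jsrun arguments with one resource set per node.
# All variable names are illustrative, not jsrun or cuFFTMp settings.
nodes=2            # number of nodes
procs_per_node=6   # processes per node, one GPU each
cores_per_proc=4   # CPU cores per process

# One resource set per node, so every process on the node is offered
# the same GPUs (assumption: CUDA_VISIBLE_DEVICES is set per resource set).
rs_count=$nodes                                   # -n
tasks_per_rs=$procs_per_node                      # -a
cpus_per_rs=$((procs_per_node * cores_per_proc))  # -c
gpus_per_rs=$procs_per_node                       # -g

jsrun_cmd="jsrun -n $rs_count -a $tasks_per_rs -c $cpus_per_rs -g $gpus_per_rs"
echo "$jsrun_cmd"   # prints: jsrun -n 2 -a 6 -c 24 -g 6
```

Changing nodes scales the number of resource sets only; the per-node values stay fixed, so every process keeps seeing the same GPUs.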