Supported functionalities
The following limitations apply to cuFFTMp:
If defined, CUDA_VISIBLE_DEVICES should be identical on all processes within a node.
Callbacks are not supported.
Because NVSHMEM spawns a hidden thread to handle communications, each process should have exclusive access to at least 2 CPU cores.
Only 2D and 3D transforms are supported, with the following restrictions:
The first two dimensions must each have a length greater than or equal to the number of GPUs.
When using built-in data layouts (CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED):
in 2D, R2C only supports a CUFFT_XT_FORMAT_INPLACE input;
in 2D, C2R only supports a CUFFT_XT_FORMAT_INPLACE_SHUFFLED input;
no strides are allowed;
only in-place data layouts are allowed. In particular, for R2C, the real dimension has to be padded to accommodate the complex elements in the output.
Using different MPI communicators (for different processes in MPI_COMM_WORLD) is allowed, but those MPI communicators cannot overlap: for a given process, one cannot use cuFFTMp with two distinct MPI communicators.
The user cannot use NVSHMEM directly (by linking to it), but only through some functions re-exposed through cuFFT. See cuFFTMp and NVSHMEM.
Only NVSHMEM-allocated memory can be used for descriptors and workspace. In particular, cudaMalloc’ed memory cannot be used. Note that memory allocated using cufftXtMalloc is automatically NVSHMEM-allocated.
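The size and padding restrictions on transforms listed above can be checked on the host before a plan is created. The following is a minimal sketch, not part of the cuFFT API: shape_is_supported and padded_real_length are hypothetical helper names. It assumes the usual cuFFT in-place R2C convention, in which the padded real dimension is the last one and the output holds n/2 + 1 complex elements along it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuFFT function): returns true when the global
// transform shape satisfies the size restrictions listed above.
bool shape_is_supported(const std::vector<std::size_t>& dims,
                        std::size_t num_gpus) {
    // Only 2D and 3D transforms are supported.
    if (dims.size() != 2 && dims.size() != 3) return false;
    // A transform needs at least one GPU.
    if (num_gpus == 0) return false;
    // The first two dimensions must be at least as long as the GPU count.
    return dims[0] >= num_gpus && dims[1] >= num_gpus;
}

// Hypothetical helper: number of real elements to allocate along the real
// dimension for an in-place R2C transform of logical length n.
// Assumption: the output stores n/2 + 1 complex elements along that
// dimension, and each complex element occupies the space of two reals.
std::size_t padded_real_length(std::size_t n) {
    return 2 * (n / 2 + 1);
}
```

Under these assumptions, a 256 x 256 x 256 R2C transform on 4 GPUs passes the shape check, and its real dimension is padded from 256 to 258 real elements so that the 129 complex outputs fit in place.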