Notable differences with the single-process, multi-GPU API
The following are a few notable differences between the single-process, multi-GPU cuFFT and cuFFTMp in terms of requirements and API usage.
Host-to-device copy
Single-process, multi-GPU: Copies the entire array from the CPU to multiple GPUs (in natural or permuted order).
cuFFTMp: Copies the local, distributed array from the CPU to the GPU without redistributing.

Device-to-host copy
Single-process, multi-GPU: Copies the entire array from multiple GPUs to the CPU (always in natural order).
cuFFTMp: Copies the local, distributed array from the GPU to the CPU without redistributing.

Device-to-device copy
Single-process, multi-GPU: Redistributes data between GPUs, from natural order to permuted order or vice versa.
cuFFTMp: Redistributes data between GPUs, from natural order to permuted order or vice versa (not allowed with …).

Descriptor, with desc a pointer to a …
Single-process, multi-GPU: desc->descriptor->nGPUs == nGPUs
cuFFTMp: desc->descriptor->nGPUs == 1

Minimum size (with …)
Single-process, multi-GPU: 32 in every dimension.
cuFFTMp: nGPUs in the first two dimensions, 2 in the last dimension in 3D.
Maximum number of GPUs
Supported (but individual batches are not distributed across GPUs)