Release Notes¶
cuFFTMp 0.0.2 EA (HPC-SDK 22.3)¶
New features¶
Improved performance of cufftXtSetDistribution and distributed descriptors. This effectively gives full support to Pencil data decompositions.
Improved performance of the Reshape API.
Deprecations¶
N/A
Known / resolved issues¶
Single-node, single-precision, 3D, complex-to-complex power-of-2 transforms in which Z > 8192 (e.g. a transform of size 2x2x16384) will lead to incorrect results when using built-in Slab decompositions (i.e. CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED). This will be fixed in a future release of cuFFTMp. cufftXtSetDistribution can be used as a workaround.
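The conditions of this known issue can be written as a small predicate that an application might evaluate before choosing a built-in Slab descriptor. This is only a sketch: TransformInfo and hits_known_issue are hypothetical names, not part of cuFFTMp. It assumes that "power-of-2" applies to all three dimensions and that Z is the last dimension, as in the 2x2x16384 example.

```cpp
#include <cstdint>

// Hypothetical description of a transform; this is NOT a cuFFTMp type.
struct TransformInfo {
    std::int64_t nx, ny, nz;   // 3D sizes; nz is assumed to be the "Z" dimension
    bool single_node;          // all processes run on one node
    bool single_precision;     // single-precision data
    bool complex_to_complex;   // C2C transform
    bool builtin_slab;         // CUFFT_XT_FORMAT_INPLACE or CUFFT_XT_FORMAT_INPLACE_SHUFFLED
};

// True when n is a positive power of two.
constexpr bool is_power_of_two(std::int64_t n) {
    return n > 0 && (n & (n - 1)) == 0;
}

// True when every condition listed in the known issue holds.
// Assumption: "power-of-2" is required of all three dimensions.
constexpr bool hits_known_issue(const TransformInfo& t) {
    return t.single_node && t.single_precision && t.complex_to_complex &&
           t.builtin_slab &&
           is_power_of_two(t.nx) && is_power_of_two(t.ny) && is_power_of_two(t.nz) &&
           t.nz > 8192;
}
```

When the predicate is true, the workaround described above is to describe the data layout with cufftXtSetDistribution instead of relying on a built-in Slab descriptor.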
Standalone EA (November 2021)¶
New features¶
New multi-process API interoperable with MPI.
Built-in Slab decompositions (using CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED descriptors) using cufftMpAttachComm
Custom data decomposition (using CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT descriptors) using cufftXtSetDistribution and cufftMpAttachComm
cufftXtMalloc, cufftXtFree and cufftXtMemcpy are fully compatible with the above
Standalone distributed reshape API with cufftReshapeHandle and associated APIs
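As an illustration of what a Slab decomposition means in practice, the sketch below computes which index range of one dimension each process would own when that dimension is split across processes. Range and slab_range are hypothetical helpers, not cuFFTMp APIs. Giving the extra elements to the lowest ranks when the size does not divide evenly is an assumption made here for illustration; the convention actually used by the built-in descriptors should be checked in the cuFFTMp documentation.

```cpp
#include <cstdint>

// Hypothetical half-open index range [lower, upper) along one dimension.
struct Range {
    std::int64_t lower;
    std::int64_t upper;
};

// Split n elements over `ranks` processes and return the range owned by `rank`.
// Assumed convention: the first (n % ranks) ranks each own one extra element.
constexpr Range slab_range(std::int64_t n, int ranks, int rank) {
    return Range{
        (n / ranks) * rank + (rank < n % ranks ? rank : n % ranks),
        (n / ranks) * (rank + 1) + (rank + 1 < n % ranks ? rank + 1 : n % ranks)
    };
}
```

For example, splitting 10 elements over 4 processes under this convention yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10). A custom decomposition has to describe per-process extents of this kind for both the input and the output layout.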
In addition, the following limitations have been lifted
C2R/Z2D now support CUFFT_XT_FORMAT_INPLACE in 3D
R2C/D2Z now support CUFFT_XT_FORMAT_INPLACE_SHUFFLED in 3D
The following restrictions have been lifted for CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED
“Dimension must factor into primes less than or equal to 127”
“Maximum dimension size is 4096 for single precision”
“Maximum dimension size is 2048 for double precision”
The following restrictions have been lifted for R2C/D2Z/C2R/Z2D with CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED
“Fastest changing dimension size needs to be even”
Deprecations¶
N/A
Known / resolved issues¶
cufftXtMemcpy with CUFFT_COPY_DEVICE_TO_DEVICE was returning wrong results for 2D and 3D transforms in all previous versions of cuFFT. This has been fixed.