Release Notes

cuFFTMp 0.0.2 EA (HPC-SDK 22.3)

New features

  • Improved performance of cufftXtSetDistribution and of distributed descriptors. This effectively provides full support for Pencil data decompositions.

  • Improved performance of the Reshape API.
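In a Pencil decomposition, each process owns a box that is split along two dimensions and spans the whole third one. The sketch below computes such boxes. It makes several assumptions. The helper names (pencil_box, split_start, pencil_for_rank) are hypothetical. The process grid is laid out in row-major order. X and Y are split while Z is kept whole. The resulting half-open bounds are the kind of per-process boxes one would describe to cufftXtSetDistribution. Check the cuFFTMp documentation for that function's exact signature.

```c
#include <assert.h>

/* Hypothetical type: half-open box [lower, upper) of the 3D global grid. */
typedef struct {
    long long lower[3];
    long long upper[3];
} pencil_box;

/* Start index of part i when n items are split into p nearly equal parts;
   the first n % p parts receive one extra item. split_start(n, p, p) == n. */
static long long split_start(long long n, int p, int i) {
    long long base = n / p, extra = n % p;
    return i * base + (i < extra ? i : extra);
}

/* Hypothetical helper: pencil owned by `rank` on a px-by-py process grid.
   X and Y are split; Z (fastest-varying) stays whole (assumption). */
static pencil_box pencil_for_rank(const long long n[3], int px, int py, int rank) {
    assert(px > 0 && py > 0 && rank >= 0 && rank < px * py);
    int ix = rank / py; /* row-major process grid (assumption) */
    int iy = rank % py;
    pencil_box b;
    b.lower[0] = split_start(n[0], px, ix);
    b.upper[0] = split_start(n[0], px, ix + 1);
    b.lower[1] = split_start(n[1], py, iy);
    b.upper[1] = split_start(n[1], py, iy + 1);
    b.lower[2] = 0;
    b.upper[2] = n[2];
    return b;
}
```

For example, an 8x6x4 grid on a 2x3 process grid gives rank 5 the box X in [4, 8), Y in [4, 6), Z in [0, 4).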



Known / resolved issues

  • Single-node, single-precision, 3D complex-to-complex transforms whose sizes are powers of 2 and in which Z > 8192 (e.g. a transform of size 2x2x16384) produce incorrect results when using the built-in Slab decompositions (i.e. CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED). This will be fixed in a future release of cuFFTMp. As a workaround, use cufftXtSetDistribution.
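The conditions of this known issue can be written as a single predicate. This is useful, for instance, to decide at setup time whether to fall back to cufftXtSetDistribution. The sketch below interprets "powers of 2 transforms" as "every dimension is a power of 2". It treats Z as the last, fastest-varying dimension. Both points are assumptions. The function names are hypothetical and not part of cuFFTMp.

```c
#include <assert.h>
#include <stdbool.h>

/* True when n is a positive power of 2. */
static bool is_pow2(long long n) {
    return n > 0 && (n & (n - 1)) == 0;
}

/* Hypothetical check for the known issue: single node, single precision,
   3D complex-to-complex, all dimensions powers of 2, Z > 8192, and a built-in
   Slab descriptor (CUFFT_XT_FORMAT_INPLACE or CUFFT_XT_FORMAT_INPLACE_SHUFFLED). */
static bool hits_known_slab_issue(int num_nodes, bool single_precision,
                                  bool complex_to_complex, bool builtin_slab,
                                  long long x, long long y, long long z) {
    assert(num_nodes >= 1);
    return num_nodes == 1 && single_precision && complex_to_complex &&
           builtin_slab && is_pow2(x) && is_pow2(y) && is_pow2(z) && z > 8192;
}
```

With the example from the note, hits_known_slab_issue(1, true, true, true, 2, 2, 16384) returns true. The same size on two nodes, or with a custom distribution, does not match the conditions.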

Standalone EA (November 2021)

New features

  • New multi-process API interoperable with MPI.

  • Built-in Slab decompositions (CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED descriptors) via cufftMpAttachComm.

  • Custom data decompositions (CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT descriptors) via cufftXtSetDistribution and cufftMpAttachComm.

  • cufftXtMalloc, cufftXtFree and cufftXtMemcpy are fully compatible with the above decompositions.

  • Standalone distributed reshape API, with cufftReshapeHandle and associated APIs.
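A Slab decomposition splits the global grid along a single dimension. The sketch below computes which planes each process owns. It assumes X is the split dimension and that leftover planes go to the lowest ranks. Check the cuFFTMp documentation for the exact layout used by CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED. slab_planes and slab_first_plane are hypothetical helpers, not cuFFTMp functions.

```c
#include <assert.h>

/* Hypothetical helper: number of X planes owned by `rank` when nx planes are
   split as evenly as possible across nprocs processes, with the first
   nx % nprocs ranks receiving one extra plane (assumed layout). */
static long long slab_planes(long long nx, int nprocs, int rank) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    return nx / nprocs + (rank < nx % nprocs ? 1 : 0);
}

/* Hypothetical helper: index of the first X plane owned by `rank` under the
   same assumption. Passing rank == nprocs returns nx (one past the end). */
static long long slab_first_plane(long long nx, int nprocs, int rank) {
    assert(nprocs > 0 && rank >= 0 && rank <= nprocs);
    long long base = nx / nprocs, extra = nx % nprocs;
    return rank * base + (rank < extra ? rank : extra);
}
```

Under this layout, nx = 10 on 4 processes gives 3, 3, 2 and 2 planes, starting at planes 0, 3, 6 and 8.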

In addition, the following limitation has been lifted:

  • C2R/Z2D transforms now support CUFFT_XT_FORMAT_INPLACE in 3D.


The following restrictions have been lifted for CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED:

  • “Dimension must factor into primes less than or equal to 127”

  • “Maximum dimension size is 4096 for single precision”

  • “Maximum dimension size is 2048 for double precision”
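For reference, the three lifted restrictions can be expressed as a per-dimension check. The sketch below reports whether a dimension size would have been rejected before this release. Its function names are hypothetical and not part of cuFFTMp. For example, 131 (a prime above 127) and 8192 in single precision are no longer rejected for CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest prime factor of n (n >= 1); returns 1 for n == 1. */
static long long largest_prime_factor(long long n) {
    assert(n >= 1);
    long long largest = 1;
    for (long long p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            largest = p;
            n /= p;
        }
    }
    if (n > 1) largest = n; /* whatever remains is prime */
    return largest;
}

/* Hypothetical check: would a dimension of size n have violated the old
   restrictions? Primes above 127, or sizes above 4096 (single precision) /
   2048 (double precision). */
static bool violated_old_limits(long long n, bool double_precision) {
    long long max_size = double_precision ? 2048 : 4096;
    return largest_prime_factor(n) > 127 || n > max_size;
}
```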

The following restriction has been lifted for R2C/D2Z/C2R/Z2D with CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED:

  • “Fastest changing dimension size needs to be even”



Known / resolved issues

  • In all previous versions of cuFFT, cufftXtMemcpy with CUFFT_COPY_DEVICE_TO_DEVICE returned incorrect results for 2D and 3D transforms. This has been fixed.