Release Notes

cuFFTMp 10.8.1 EA (HPC-SDK 23.1)

New features

  • N/A

Deprecations

  • N/A

Known / resolved issues

  • HPC-SDK 23.1 ships NVSHMEM 2.8, but cuFFTMp users should point to NVSHMEM 2.6 in the compatible folder at runtime. See NVSHMEM and cuFFTMp.

cuFFTMp 10.8.1 EA (HPC-SDK 22.5+)

cuFFTMp 10.8.1 integrates NVSHMEM 2.5.0 and fixes a few issues as indicated below.

New features

  • N/A

Deprecations

  • N/A

Known / resolved issues

  • Resolved: single-node, single-precision, 3D, complex-to-complex, power-of-2 transforms in which Z > 8192 no longer produce incorrect results.

  • cuFFTMp’s versioning has been corrected. Going forward, cuFFTMp will be versioned similarly to cuFFT. See Versioning.

cuFFTMp 0.0.2 EA (HPC-SDK 22.3)

New features

  • Improved performance of cufftXtSetDistribution and distributed descriptors. This effectively gives full support to Pencil data decompositions.

  • Improved performance of the Reshape API.

Deprecations

  • N/A

Known / resolved issues

  • Single-node, single-precision, 3D, complex-to-complex, power-of-2 transforms in which Z > 8192 (e.g. a transform of size 2x2x16384) produce incorrect results when using built-in Slab decompositions (i.e. CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED). This will be fixed in a future release of cuFFTMp. cufftXtSetDistribution can be used as a workaround.
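The conditions listed above can be turned into a small pre-flight check. The following is a minimal sketch in Python, not part of cuFFTMp: the function names and keyword flags are hypothetical, and it assumes that "power-of-2 transforms" means every dimension is a power of 2, as in the 2x2x16384 example.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def hits_known_slab_issue(nx: int, ny: int, nz: int, *,
                          single_node: bool = True,
                          single_precision: bool = True,
                          complex_to_complex: bool = True,
                          builtin_slabs: bool = True) -> bool:
    """Hypothetical helper: does this transform match the known issue?

    Assumption: all three dimensions must be powers of two, and only
    the Z extent is compared against the 8192 threshold.
    """
    all_pow2 = all(is_power_of_two(d) for d in (nx, ny, nz))
    return (single_node and single_precision and complex_to_complex
            and builtin_slabs and all_pow2 and nz > 8192)


if __name__ == "__main__":
    # The example size from the release note matches the issue.
    print(hits_known_slab_issue(2, 2, 16384))
    # Z equal to 8192 is not strictly greater than the threshold.
    print(hits_known_slab_issue(2, 2, 8192))
```

When the check returns True, the workaround named above applies: describe the data layout with cufftXtSetDistribution instead of relying on the built-in Slab descriptors.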

Standalone EA (November 2021)

New features

  • New multi-process API interoperable with MPI.

  • Built-in Slab decompositions (CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED descriptors), enabled with cufftMpAttachComm

  • Custom data decompositions (CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT descriptors), enabled with cufftXtSetDistribution and cufftMpAttachComm

  • cufftXtMalloc, cufftXtFree and cufftXtMemcpy are fully compatible with the above

  • Standalone distributed reshape API with cufftReshapeHandle and associated APIs
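To make the difference between Slab and custom decompositions concrete, here is a minimal Python sketch of the index arithmetic only. It does not call cuFFTMp. The helper names are hypothetical, the (lower, upper) box convention with an exclusive upper corner is chosen for this sketch, and it assumes that a dimension which does not divide evenly gives its remainder to the lowest ranks; check the cuFFTMp documentation for the exact rule and for which axis each built-in format splits.

```python
def slab_range(n: int, nranks: int, rank: int) -> tuple:
    """Split n indices into nranks contiguous chunks; return [start, end).

    Assumption: the first (n % nranks) ranks each get one extra index.
    """
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return start, start + size


def slab_box(shape: tuple, nranks: int, rank: int, axis: int) -> tuple:
    """Return the (lower, upper) corners of this rank's slab.

    The slab spans the full extent of every axis except `axis`,
    which is split across ranks. Upper corners are exclusive.
    """
    lower = [0, 0, 0]
    upper = list(shape)
    lower[axis], upper[axis] = slab_range(shape[axis], nranks, rank)
    return tuple(lower), tuple(upper)


if __name__ == "__main__":
    shape = (10, 8, 6)  # (X, Y, Z) extents of a hypothetical 3D transform
    for rank in range(4):
        # One box split along X and one split along Y for each rank.
        print(rank, slab_box(shape, 4, rank, 0), slab_box(shape, 4, rank, 1))
```

A custom decomposition generalizes this idea: each process describes an arbitrary input box and output box of its own, rather than a slab along a fixed axis.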

In addition, the following limitations have been lifted:

  • C2R/Z2D now support CUFFT_XT_FORMAT_INPLACE in 3D

  • R2C/D2Z now support CUFFT_XT_FORMAT_INPLACE_SHUFFLED in 3D

The following restrictions have been lifted for CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED:

  • “Dimension must factor into primes less than or equal to 127”

  • “Maximum dimension size is 4096 for single precision”

  • “Maximum dimension size is 2048 for double precision”

The following restrictions have been lifted for R2C/D2Z/C2R/Z2D with CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED:

  • “Fastest changing dimension size needs to be even”

Deprecations

  • N/A

Known / resolved issues

  • cufftXtMemcpy with CUFFT_COPY_DEVICE_TO_DEVICE was returning wrong results for 2D and 3D transforms in all previous versions of cuFFT. This has been fixed.