Release Notes
cuFFTMp 11.0.5 EA (HPC-SDK 23.3)
New features
- cuFFTMp 11.0.5 integrates NVSHMEM 2.8 and supports both CUDA 11 and CUDA 12. A libnvshmem_host.so library with a matching NVSHMEM and CUDA version should be available at runtime.
- Added support for the Hopper GPU architecture. 
- Added NVSHMEM interoperability support. Applications and libraries can now all use NVSHMEM and share resources, such as NVSHMEM-allocated buffers. This requires the application and all NVSHMEM-enabled libraries to dynamically link libnvshmem_host.so.
- cuFFTMp can now be bootstrapped without an MPI communicator. See cufftMpAttachComm for more details. 
- The cufftXtSetDistribution API was changed; see cufftXtSetDistribution.
- Added a new cufftXtSetSubformatDefault API to let users use cuFFTMp without cuFFT multi-GPU descriptors through the cufftExecC2C, cufftXtExec and similar APIs. See cufftXtSetSubformatDefault.
- Improved performance on single-node, 3D, complex-to-complex transforms. 
Deprecations
- N/A
Known issue
- HPC-SDK 23.3 ships NVSHMEM 2.9, but cuFFTMp users should point to NVSHMEM 2.8 in the compatible folder at runtime. See Compatibility.
Resolved issues
- cuFFTMp now supports the same GPU architectures as cuFFT for all single-process functionalities.
cuFFTMp 10.8.1 EA (HPC-SDK 23.1)
New features
- N/A 
Deprecations
- N/A
Known / resolved issues
- HPC-SDK 23.1 ships NVSHMEM 2.8, but cuFFTMp users should point to NVSHMEM 2.6 in the compatible folder at runtime. See Compatibility.
cuFFTMp 10.8.1 EA (HPC-SDK 22.5+)
cuFFTMp 10.8.1 integrates NVSHMEM 2.5.0 and fixes a few issues as indicated below.
New features
- N/A 
Deprecations
- N/A 
Known / resolved issues
- The issue in which single-node, single-precision, 3D, complex-to-complex power-of-2 transforms with Z > 8192 produced incorrect results has been resolved.
- cuFFTMp’s versioning has been corrected. Going forward, cuFFTMp will be versioned similarly to cuFFT. See Versioning.
cuFFTMp 0.0.2 EA (HPC-SDK 22.3)
New features
- Improved performance of cufftXtSetDistribution and distributed descriptors. This effectively gives full support to Pencil data decompositions.
- Improved performance of the Reshape API.
Deprecations
- N/A
Known / resolved issues
- Single-node, single-precision, 3D, complex-to-complex power-of-2 transforms in which Z > 8192 (e.g. a transform of size 2x2x16384) will lead to incorrect results when using built-in Slab decompositions (i.e. CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED). This will be fixed in a future release of cuFFTMp. cufftXtSetDistribution can be used as a workaround.
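The size condition in the known issue above can be sketched as a small guard that an application might run before choosing a built-in Slab decomposition. This is only an illustration: the helper names are hypothetical and not part of cuFFTMp, and "power-of-2 transforms" is read here as "all three dimensions are powers of 2", which is an assumption. The caller is assumed to have already established the other conditions (single-node, single-precision, complex-to-complex, built-in Slab format).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a cuFFTMp API): true if n is a power of 2. */
static bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper (not a cuFFTMp API).
 * Returns true when a 3D transform of size nx x ny x nz matches the size
 * condition of the known issue, under the assumption that all three
 * dimensions must be powers of 2 and that Z is the last dimension. */
static bool hits_slab_z_issue(size_t nx, size_t ny, size_t nz) {
    return is_power_of_two(nx) && is_power_of_two(ny) &&
           is_power_of_two(nz) && nz > 8192;
}
```

With this reading, the 2x2x16384 example from the note is flagged, while 2x2x8192 is not; a flagged size would be a candidate for the cufftXtSetDistribution workaround.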
Standalone EA (November 2021)
New features
- New multi-process API interoperable with MPI. 
- Built-in Slab decompositions (using CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED descriptors) using cufftMpAttachComm.
- Custom data decomposition (using CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT descriptors) using cufftXtSetDistribution and cufftMpAttachComm.
- cufftXtMalloc, cufftXtFree and cufftXtMemcpy are fully compatible with the above.
- Standalone distributed reshape API with cufftReshapeHandle and associated APIs.
In addition, the following limitations have been lifted:
- C2R/Z2D now support CUFFT_XT_FORMAT_INPLACE in 3D
- R2C/D2Z now support CUFFT_XT_FORMAT_INPLACE_SHUFFLED in 3D
The following restrictions have been lifted for CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED:
- “Dimension must factor into primes less than or equal to 127” 
- “Maximum dimension size is 4096 for single precision” 
- “Maximum dimension size is 2048 for double precision” 
The following restrictions have been lifted for R2C/D2Z/C2R/Z2D with CUFFT_XT_FORMAT_INPLACE and CUFFT_XT_FORMAT_INPLACE_SHUFFLED:
- “Fastest changing dimension size needs to be even” 
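To make the lifted restrictions concrete, the sketch below encodes the four quoted rules as plain predicates, so it is easy to see which sizes used to be rejected. The function names are hypothetical and not part of cuFFT or cuFFTMp, and the rules are assumed to apply independently to each dimension, which the notes do not state explicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Largest prime factor of n by trial division; returns 1 for n < 2. */
static size_t largest_prime_factor(size_t n) {
    size_t largest = 1;
    for (size_t p = 2; p <= n / p; ++p) {
        while (n % p == 0) {
            largest = p;
            n /= p;
        }
    }
    if (n > 1) {
        largest = n; /* the remaining cofactor is itself prime */
    }
    return largest;
}

/* Hypothetical check (not a library API): would a dimension of size n have
 * satisfied the quoted prime-factor and maximum-size restrictions? */
static bool passed_legacy_limits(size_t n, bool double_precision) {
    size_t max_size = double_precision ? 2048 : 4096;
    return n >= 1 && n <= max_size && largest_prime_factor(n) <= 127;
}

/* Hypothetical check for the R2C/D2Z/C2R/Z2D rule: the fastest changing
 * dimension had to be even. */
static bool passed_legacy_real_limit(size_t fastest_dim) {
    return fastest_dim % 2 == 0;
}
```

Under these assumptions, a single-precision dimension of 8192, a dimension of 131 (a prime above 127), or a real transform whose fastest changing dimension is 255 would each have been rejected before these restrictions were lifted.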
Deprecations
- N/A
Known / resolved issues
- cufftXtMemcpy with CUFFT_COPY_DEVICE_TO_DEVICE was returning wrong results for 2D and 3D transforms in all previous versions of cuFFT. This has been fixed.