Requirements and Functionality

Requirements

Note

For additional details about cuFFTDx requirements, please refer to the Requirements section of the official cuFFTDx documentation.

Hardware Requirements

The cuFFTDx + cuFFT LTO EA package supports the following platforms and architectures:

CPU Architecture: x86_64 or aarch64.
Supported NVIDIA GPU Architectures: SM70 (Volta) to SM90 (Hopper), except SM72.
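
The GPU architecture range above can be expressed as a small predicate. The following is a minimal sketch, not part of the package: the function name `is_supported_sm` is hypothetical, and it assumes the architecture is encoded as major * 10 + minor (for example, 86 for SM86), which can be derived from the `major` and `minor` fields of `cudaDeviceProp` when the CUDA runtime is available.

```cpp
// Hypothetical helper, not part of cuFFTDx or cuFFT.
// `sm` is the compute capability encoded as major * 10 + minor (e.g. 86 for SM86).
// Mirrors the stated range: SM70 (Volta) through SM90 (Hopper), excluding SM72.
constexpr bool is_supported_sm(int sm) {
    return sm >= 70 && sm <= 90 && sm != 72;
}
```

Under this encoding, `is_supported_sm(80)` is true, while `is_supported_sm(72)` and `is_supported_sm(61)` are false. The check only reflects the range stated here; it does not verify that a given value corresponds to an existing GPU architecture.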

Software Requirements

The cuFFTDx + cuFFT LTO EA package has the following software requirements:

- CUDA Toolkit 12.8 or newer
  - NVCC 12.8.90+
  - (Optional) NVRTC 12.8.93+
  - (Optional) nvJitLink 12.8.93+
- Supported host compilers (C++17 required)
  - GCC 8+
  - Clang 10+
- (Optional) CMake 3.26 or newer

Note

LTOIRs must be compiled with a CUDA compiler (NVCC or NVRTC) that comes from the same CUDA Toolkit as the linker (NVCC or nvJitLink), or from an older one. In other words, the linker must never be older than the compiler that produced the LTOIR.

For example, if the code is compiled with NVCC 12.Y and linked with nvJitLink 12.X, then X must be greater than or equal to Y (X >= Y).

For more details, please refer to the Compatibility section of the nvJitLink documentation.
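
The X >= Y rule from the note above can be sketched as a compile-time check. This is an illustration only: the `ToolkitVersion` type and the `linker_accepts_ltoir` function are hypothetical names, not part of cuFFTDx, cuFFT, or nvJitLink. The note only covers versions within the same major release, so treating differing major versions as incompatible is a conservative assumption made here.

```cpp
// Hypothetical helper type; not part of cuFFTDx, cuFFT, or nvJitLink.
struct ToolkitVersion {
    int major;
    int minor;
};

// Returns true when LTOIR produced by `compiler` can be linked by `linker`
// under the rule in the note: same major release, and linker minor version (X)
// greater than or equal to compiler minor version (Y).
// Assumption: versions with different major numbers are rejected.
constexpr bool linker_accepts_ltoir(ToolkitVersion compiler, ToolkitVersion linker) {
    return linker.major == compiler.major && linker.minor >= compiler.minor;
}
```

For instance, `linker_accepts_ltoir({12, 8}, {12, 9})` is true (NVCC 12.8 with nvJitLink 12.9), whereas `linker_accepts_ltoir({12, 9}, {12, 8})` is false because the linker is older than the compiler.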

Supported Functionality

The cuFFTDx + cuFFT LTO EA package delivers improved performance and removes the need for a workspace for over 1000 additional FFT sizes compared with the production version of cuFFTDx.

The package includes all functionality from the cuFFTDx 1.3.1 release. The supported functions and the FFT size limits are described in the Supported Functionality section of the official cuFFTDx documentation.

Comprehensive List of Supported FFT Sizes

The cuFFTDx + cuFFT LTO EA package supports the following FFT sizes without workspace:

Precision	Supported Sizes
`half`	Same sizes as cuFFTDx 1.3.1 release (LTOIR implementation not currently available)
`float`	2-64 (all sizes), plus specific sizes: 65, 68, 72, 76, 80, 81, 84, 88, 92, 96, 98, 100, … up to 32768 (See Single-precision FFT Sizes)
`double`	2-64 (all sizes), plus specific sizes: 68, 72, 75, 76, 80, 81, 84, 88, 92, 93, 96, 98, … up to 16384 (See Double-precision FFT Sizes)

Hint

The sizes listed above represent the complete set of supported sizes for complex-to-complex FFTs across all supported CUDA architectures. To check whether a specific FFT configuration has an LTOIR implementation available for your target CUDA architecture, use the cuFFT API function cufftDeviceCheckDescription.