Release Notes
This section lists significant changes, new features, performance improvements, and known or resolved issues. Unless noted otherwise, the listed issues should not impact functionality. When functionality is impacted, a workaround is provided where one is available.
1.1.1
A cuFFTDx patch update to accommodate the cuBLASDx EA 0.1.0 release.
New Features
Added 2D and 3D FFT examples.
cuFFTDx now depends on commonDx headers. commonDx includes private tools and types that all Dx libraries use.
Resolved Issues
Runtime assertions on CUDA block dimensions (using assert()) in execute() methods are now disabled by default. In previous versions they were disabled only when NDEBUG (defined by CMake in Release mode) or CUFFTDX_DISABLE_RUNTIME_ASSERTS was defined, or when compiling with NVRTC. Keeping the assertions enabled could incur a performance penalty. To enable them, the user now has to define CUFFTDX_ENABLE_RUNTIME_ASSERTS.
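The opt-in behaviour described above can be illustrated with a small, self-contained sketch. This is not cuFFTDx source code: DEMO_RUNTIME_ASSERT, demo_asserts_enabled, and demo_block_dim_matches are hypothetical names invented for this illustration, and only the CUFFTDX_ENABLE_RUNTIME_ASSERTS macro name comes from these notes.

```cpp
#include <cassert>

// Hypothetical helper macro (not part of cuFFTDx): the check is compiled in
// only when the user opts in by defining CUFFTDX_ENABLE_RUNTIME_ASSERTS.
#if defined(CUFFTDX_ENABLE_RUNTIME_ASSERTS)
#    define DEMO_RUNTIME_ASSERT(cond) assert(cond)
#else
#    define DEMO_RUNTIME_ASSERT(cond) ((void)0)
#endif

// Reports whether the opt-in macro was defined for this translation unit.
constexpr bool demo_asserts_enabled() {
#if defined(CUFFTDX_ENABLE_RUNTIME_ASSERTS)
    return true;
#else
    return false;
#endif
}

// Hypothetical stand-in for a block dimension check. Without the opt-in
// macro the assertion expands to nothing and the function only returns
// the result of the comparison.
inline bool demo_block_dim_matches(unsigned int actual, unsigned int expected) {
    DEMO_RUNTIME_ASSERT(actual == expected);
    return actual == expected;
}
```

In this sketch, compiling with -DCUFFTDX_ENABLE_RUNTIME_ASSERTS turns the check into a regular assert(), which is still subject to NDEBUG as usual. Whether the library's own checks interact with NDEBUG in the same way is not stated in these notes.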
Known Issues
Since CUDA Toolkit 12.2, the NVCC compiler in certain situations reports an incorrect compilation error when the value_type type of an FFT description type is used. The problematic code and possible workarounds are shown below:

    // Any FFT description type
    using FFT = decltype(Size<128>() + Precision<float>() + Type<fft_type::r2c>() + Block() + SM<700>());
    using complex_type = typename FFT::value_type; // compilation error

    // Workaround #1
    using complex_type = typename decltype(FFT())::value_type;

    // Workaround #2 (used in cuFFTDx examples)
    template <typename T>
    using value_type_t = typename T::value_type;
    using complex_type = value_type_t<FFT>;
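Both workarounds can be tried without cuFFTDx installed. The sketch below uses mock_fft and mock_complex, hypothetical stand-ins that only mimic the nested value_type of an FFT description type, and checks that the two workarounds name the same type. It does not reproduce the NVCC error itself.

```cpp
#include <type_traits>

// Hypothetical stand-ins (not cuFFTDx types): mock_fft only mimics the
// nested value_type that a real FFT description type exposes.
struct mock_complex {
    float x;
    float y;
};

struct mock_fft {
    using value_type = mock_complex;
};

// Workaround #1: reach value_type through decltype of a constructed object.
using complex_type_1 = typename decltype(mock_fft())::value_type;

// Workaround #2: go through an alias template.
template <typename T>
using value_type_t = typename T::value_type;
using complex_type_2 = value_type_t<mock_fft>;

// Both spellings resolve to the same type.
static_assert(std::is_same<complex_type_1, complex_type_2>::value,
              "both workarounds must name the same type");
```

The alias template has the practical advantage of being defined once and reused for every description type in a translation unit.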
1.1.0
The first release of the cuFFTDx library with support for the Hopper and Ada architectures.
New Features
Initial support for Orin architecture (SM87).
Initial support for Ada architecture (SM89).
Initial support for Hopper architecture (SM90).
Added cufftdx::is_supported.
Added preliminary support for MSVC.
Improvements to the documentation:
Examples chapter,
Quick Installation Guide chapter,
information about shared memory usage in cuFFTDx, and
updated introduction chapters: First FFT Using cuFFTDx and Your Next Custom FFT Kernels.
Known Issues
Compiling with MSVC as the CUDA host compiler requires enabling the updated __cplusplus macro (/Zc:__cplusplus). To do so, pass -Xcompiler "/Zc:__cplusplus" as an option to NVCC (see NVCC: Options for Passing Specific Phase Options).
When compiling with MSVC as the CUDA host compiler, be aware of the limit on the length of mangled names and of other compiler limits, which in extreme cases can result in calling incorrect instances of kernel templates involving cuFFTDx.
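Whether the option has taken effect can be checked at compile time. The sketch below relies on two documented facts: without /Zc:__cplusplus, MSVC reports __cplusplus as 199711L, and it exposes the language level it actually targets through _MSVC_LANG. The names reported_cplusplus, effective_language_level, and cplusplus_macro_is_accurate are hypothetical and chosen only for this illustration.

```cpp
// Value the compiler reports through the standard __cplusplus macro.
constexpr long reported_cplusplus = __cplusplus;

// MSVC exposes the language level it actually targets through _MSVC_LANG.
// For other compilers this sketch assumes __cplusplus is already accurate.
#if defined(_MSVC_LANG)
constexpr long effective_language_level = _MSVC_LANG;
#else
constexpr long effective_language_level = __cplusplus;
#endif

// True when __cplusplus matches the language level actually in use.
constexpr bool cplusplus_macro_is_accurate() {
    return reported_cplusplus == effective_language_level;
}

// Fails to compile when MSVC is used without /Zc:__cplusplus.
static_assert(cplusplus_macro_is_accurate(),
              "__cplusplus is not accurate; with MSVC pass /Zc:__cplusplus");
```

Placed in a host-side source file, the static_assert turns a missing option into an early, readable compilation error.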
1.0.0
The first general availability (GA) release of the cuFFTDx library.
New Features
Added a new shared API for block FFT execution (see block execution methods).
Added and documented FFT::stride.
Optimized default ElementsPerThread and FFTsPerBlock values for SM80 (targeting A100) and SM70 (targeting V100).
Restored full performance of powers-of-two kernels in cuFFTDx.
Resolved Issues
The ptxas warning "Program uses 32-bit address on line XXX which is conflicting with .address_size 64" should no longer appear.
0.3.1
The last early access (EA) release of the cuFFTDx library.
Known Issues
A ptxas warning about a pointer size conflict may appear when compiling:

    ptxas warning : Program uses 32-bit address on line 'XXX' which is conflicting with .address_size 64

This warning does not impact functionality or performance.