Using cuFFTDx with cuFFT LTO#

Starting from cuFFTDx 1.6.0, cuFFTDx can be extended with additional functionality and performance by reusing optimized code from cuFFT. The new cuFFT device API allows users to obtain blobs of device code from cuFFT. These blobs can be linked (offline, or at runtime) with existing cuFFTDx kernels in order to extract more performance for certain FFT cases. This feature currently only supports adding sizes with improved performance and no workspace requirements, but we plan to extend it in the future. Stay tuned!
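As a rough illustration of the runtime path, the sketch below links an LTO-IR blob obtained from cuFFT together with an NVRTC-compiled cuFFTDx kernel using nvJitLink. The blob names (kernel_ltoir, cufft_ltoir), the target architecture, and the way the blobs are produced are placeholder assumptions; only the nvJitLink calls themselves are standard API, and the details of obtaining the cuFFT blob are covered later in this guide.

    // Minimal sketch of the runtime-link step (NVRTC / nvJitLink).
    // Assumes two LTO-IR blobs are already in memory:
    //   - kernel_ltoir: a cuFFTDx kernel compiled to LTO-IR with NVRTC (-dlto)
    //   - cufft_ltoir:  a device-code blob produced by the cuFFT host library
    // These names and how the blobs are produced are illustrative assumptions.
    #include <cstdio>
    #include <cstdlib>
    #include <vector>
    #include <nvJitLink.h>

    #define NVJITLINK_CHECK(call)                                         \
        do {                                                              \
            nvJitLinkResult r_ = (call);                                  \
            if (r_ != NVJITLINK_SUCCESS) {                                \
                std::printf("nvJitLink error %d at %s:%d\n",              \
                            static_cast<int>(r_), __FILE__, __LINE__);    \
                std::exit(1);                                             \
            }                                                             \
        } while (0)

    // Links the cuFFTDx kernel LTO-IR with the cuFFT LTO-IR blob into a cubin
    // that can then be loaded with the CUDA driver API (cuModuleLoadData).
    std::vector<char> link_lto_blobs(const void* kernel_ltoir, size_t kernel_size,
                                     const void* cufft_ltoir,  size_t cufft_size) {
        // Example options; the architecture must match the GPU you run on.
        const char* options[] = {"-lto", "-arch=sm_80"};

        nvJitLinkHandle handle;
        NVJITLINK_CHECK(nvJitLinkCreate(&handle, 2, options));

        NVJITLINK_CHECK(nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR,
                                         kernel_ltoir, kernel_size, "cufftdx_kernel"));
        NVJITLINK_CHECK(nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR,
                                         cufft_ltoir, cufft_size, "cufft_blob"));

        // Perform link-time optimization across both inputs.
        NVJITLINK_CHECK(nvJitLinkComplete(handle));

        size_t cubin_size = 0;
        NVJITLINK_CHECK(nvJitLinkGetLinkedCubinSize(handle, &cubin_size));
        std::vector<char> cubin(cubin_size);
        NVJITLINK_CHECK(nvJitLinkGetLinkedCubin(handle, cubin.data()));

        NVJITLINK_CHECK(nvJitLinkDestroy(&handle));
        return cubin;
    }

The offline path follows the same idea, except that the cuFFT blob and the cuFFTDx kernel are linked at build time by NVCC with device LTO enabled.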

Highlights#

  • A new way of enhancing your cuFFTDx project via our cuFFT host library.

  • Over 1000 additional sizes supported with improved performance and without workspace requirements, via LTO-enabled code sharing across our libraries.

  • Support for both offline builds (NVCC) and runtime builds (NVRTC / nvJitLink).

  • Additional link-time optimization in cuFFTDx applications.

Note

This feature is experimental and is subject to change. Unlike the cuFFTDx + cuFFT LTO EA, which was delivered as a separate package, it is now included directly in the cuFFTDx production release. Its purpose is to let users explore LTO-augmented cuFFTDx functionality and share feedback, helping us refine the experience before it becomes fully supported in both cuFFTDx and cuFFT. While we are working toward making this feature production-ready, occasional issues may still be encountered.

Please direct any feedback you might have to Melody Shih <melodys@nvidia.com>, or Miguel Ferrer Avila <mferreravila@nvidia.com>.