LTO Examples#
The cuFFTDx + cuFFT LTO EA package provides multiple samples which demonstrate how to use the features enabled by LTO.
Examples

| Group | Subgroup | Example | Description |
|---|---|---|---|
| Introduction Examples | 09_introduction_lto_example | 00_introduction_lto_example | (offline) cuFFTDx LTO introduction |
| | 04_nvrtc_fft | 03_nvrtc_fft_block_lto | (online) cuFFTDx LTO introduction |
| | 10_cufft_device_api_example | 00_cufft_device_api_lto_example | (offline) cuFFT Device API introduction |
| Simple FFT Examples | 01_simple_fft_thread | 02_simple_fft_thread_lto | (offline) Complex-to-complex (C2C) thread FFT using LTO |
| | 02_simple_fft_block_lto | 10_simple_fft_block_c2r_lto | (offline) Complex-to-real block FFT using LTO |
| NVRTC Examples (additional) | | 02_nvrtc_fft_thread_lto | (online) Complex-to-complex thread FFT using LTO |
| FFT Performance | | 02_block_fft_lto_ptx_performance | (offline) Benchmark for C2C block FFT (LTO vs PTX) |
For more information on the “online” and “offline” annotations in the table above, see Augment cuFFTDx with LTO.
Introduction Examples#
00_introduction_lto_example
03_nvrtc_fft_block_lto
00_cufft_device_api_lto_example
These examples are used in the documentation to demonstrate how to adopt LTO features in existing cuFFTDx projects. 00_introduction_lto_example and 03_nvrtc_fft_block_lto are used in the Augment cuFFTDx with LTO section to showcase the steps for offline and online kernel generation, respectively. 00_cufft_device_api_lto_example, used in the Custom LTO Helper section, demonstrates how to write a custom LTO database helper tool using the cuFFT Device API, as an alternative to the LTO Helper.
Simple FFT Examples#
02_simple_fft_thread_lto
10_simple_fft_block_c2r_lto
These are the LTO versions of the 02_simple_fft_thread and 10_simple_fft_block_c2r samples, respectively. See Simple FFT Examples in the official cuFFTDx documentation for more details.
NVRTC Examples#
02_nvrtc_fft_thread_lto
03_nvrtc_fft_block_lto
These are the LTO versions of the existing NVRTC examples, showing how to use cuFFTDx at the thread and block levels with NVRTC and nvJitLink. See NVRTC Examples in the official cuFFTDx documentation for more details.
FFT Performance#
02_block_fft_lto_ptx_performance
This example compares the performance of cuFFTDx device functions that compute an FFT using the inlined-PTX implementation with those using the LTOIR implementation.
Fig. 1 Performance comparison between LTOIR and PTX implementations of a single-precision complex-to-complex forward FFT. Tests were performed on H100 80GB with maximum clocks set.#
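When comparing two implementations such as these, it can help to reduce the measured kernel times to a throughput figure and a speedup ratio. The following is a minimal host-side sketch of that arithmetic, not code taken from 02_block_fft_lto_ptx_performance. The helper names `fft_c2c_gflops` and `lto_speedup` are hypothetical, and the flop count assumes the common 5·N·log2(N) convention for a complex-to-complex FFT of size N; the example itself may report its results differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of cuFFTDx): estimated GFLOP/s for a batch of
// complex-to-complex FFTs, assuming the conventional 5 * N * log2(N) flop
// count per FFT of size N.
double fft_c2c_gflops(unsigned fft_size, unsigned long long batches, double time_ms) {
    assert(fft_size > 1 && time_ms > 0.0);
    const double n             = static_cast<double>(fft_size);
    const double flops_per_fft = 5.0 * n * std::log2(n);
    const double seconds       = time_ms * 1.0e-3;
    return flops_per_fft * static_cast<double>(batches) / seconds / 1.0e9;
}

// Hypothetical helper: speedup of the LTOIR variant relative to the PTX
// variant. A value above 1 means the LTOIR variant took less time.
double lto_speedup(double ptx_time_ms, double lto_time_ms) {
    assert(ptx_time_ms > 0.0 && lto_time_ms > 0.0);
    return ptx_time_ms / lto_time_ms;
}
```

The timings passed to these helpers would come from whatever measurement method is in use, for example CUDA events recorded around each kernel launch. Reporting both the throughput and the ratio keeps results comparable across FFT sizes.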