LTO Examples
The cuFFTDx + cuFFT LTO EA package provides multiple samples which demonstrate how to use the features enabled by LTO.
Examples

| Group | Subgroup | Example | Description |
|---|---|---|---|
| Introduction Examples | 09_introduction_lto_example | 00_introduction_lto_example | (offline) cuFFTDx LTO introduction |
| | 04_nvrtc_fft | 03_nvrtc_fft_block_lto | (online) cuFFTDx LTO introduction |
| | 10_cufft_device_api_example | 00_cufft_device_api_lto_example | (offline) cuFFT Device API introduction |
| Simple FFT Examples | 01_simple_fft_thread | 02_simple_fft_thread_lto | (offline) Complex-to-complex (C2C) thread FFT using LTO |
| | 02_simple_fft_block_lto | 10_simple_fft_block_c2r_lto | (offline) Complex-to-real block FFT using LTO |
| NVRTC Examples (additional) | | 02_nvrtc_fft_thread_lto | (online) Complex-to-complex thread FFT using LTO |
| FFT Performance | | 02_block_fft_lto_ptx_performance | (offline) Benchmark for C2C block FFT (LTO vs PTX) |
For more information on the “online” and “offline” annotations used in the table above, see Augment cuFFTDx with LTO.
Introduction Examples
00_introduction_lto_example, 03_nvrtc_fft_block_lto, 00_cufft_device_api_lto_example
These examples are used in the documentation to demonstrate how to adopt LTO features in existing cuFFTDx projects.
00_introduction_lto_example and 03_nvrtc_fft_block_lto are used in the Augment cuFFTDx with LTO section
to showcase the steps for offline and online kernel generation, respectively.
In 00_cufft_device_api_lto_example, used in the Custom LTO Helper section, we demonstrate how to
write a custom LTO database helper tool using the cuFFT Device API, as an alternative to the LTO Helper.
Simple FFT Examples
02_simple_fft_thread_lto, 10_simple_fft_block_c2r_lto
These are the LTO versions of the 02_simple_fft_thread and 10_simple_fft_block_c2r samples, respectively.
See Simple FFT Examples from the official cuFFTDx documentation for more details.
NVRTC Examples
02_nvrtc_fft_thread_lto, 03_nvrtc_fft_block_lto
These are the LTO versions of the existing NVRTC examples, showing how to use cuFFTDx at the thread and block level with NVRTC and nvJitLink. See NVRTC Examples from the official cuFFTDx documentation for more details.
FFT Performance
02_block_fft_lto_ptx_performance
This example compares the performance of cuFFTDx device functions that compute an FFT using the inlined-PTX implementation against those using the LTOIR implementation.
Fig. 1 Performance comparison between LTOIR and PTX implementations of a single-precision complex-to-complex forward FFT. Tests were performed on H100 80GB with maximum clocks set.
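A comparison like this is often reduced to two numbers: a throughput estimate for each variant and the speedup of one over the other. The sketch below shows one way to compute them from measured kernel times. It is not taken from the sample: the helper names (`c2c_fft_flops`, `gflops`, `speedup`) are hypothetical, and the 5·N·log2(N) flop estimate is a common convention for complex FFTs, assumed here rather than confirmed as the metric used by 02_block_fft_lto_ptx_performance.

```cpp
#include <cmath>

// Conventional flop estimate for one complex-to-complex FFT of length n.
// Assumption: 5 * n * log2(n), a widely used convention, not a value
// taken from the cuFFTDx sample.
double c2c_fft_flops(double n) {
    return 5.0 * n * std::log2(n);
}

// Estimated throughput in GFLOP/s for `batches` FFTs of length `n`
// that completed in `time_ms` milliseconds.
double gflops(double n, double batches, double time_ms) {
    const double seconds = time_ms * 1e-3;
    return c2c_fft_flops(n) * batches / seconds / 1e9;
}

// Speedup of the LTOIR variant relative to the PTX variant.
// A value above 1 means the LTOIR variant finished faster.
double speedup(double ptx_time_ms, double lto_time_ms) {
    return ptx_time_ms / lto_time_ms;
}
```

For instance, with purely illustrative (not measured) times of 2.0 ms for the PTX variant and 1.0 ms for the LTOIR variant on the same size and batch count, `speedup(2.0, 1.0)` gives a factor of 2, and `gflops` can be called once per variant with its own time.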