LTO Helper
Note
This tool is designed for the offline kernel generation use case. For the online kernel generation use case, please refer to (online) LTO Database Creation.
The LTO Helper tool is provided to simplify the process of creating LTO databases. It generates an LTO database for cuFFTDx operators based on descriptions provided in an input CSV file. The generated database can then be integrated into a cuFFTDx LTO project following the steps in Use Case I: Offline Kernel Generation.
The tool can be found in two locations:
- the cuFFTDx example folder, cufftdx/example/lto_helper, in the cuFFTDx + cuFFT LTO EA package;
- the CUDALibrarySamples repository.
It consists of the following files:
- cufftdx_cufft_lto_helper.cpp: The main implementation of the helper.
- For CMake users:
  - CMakeLists.txt: Build configuration for cufftdx_cufft_lto_helper.cpp.
  - lto_helper.cmake: CMake functions for building and running the helper during the configuration phase.
The sections below list the requirements and explain how to compile, configure, and run the helper.
Requirements
- cuFFT: The cuFFT static library (libcufft_static.a) from the cuFFTDx + cuFFT LTO EA package.
- CUDA Toolkit 12.8 or newer.
- CSV file: An input CSV file containing FFT configuration descriptions. See the “Setting Up the CSV File” section below for the required format and fields.
- (Optional) CMake 3.24 or newer.
Using cufftdx_cufft_lto_helper
Build cufftdx_cufft_lto_helper. The helper can be built using either direct compilation or CMake:
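Direct compilation
g++ -I<cuda_include_path> \
    -I<cufft_include_path> \
    cufftdx_cufft_lto_helper.cpp \
    -L<cufft_lib_path> -lcufft \
    -o cufftdx_cufft_lto_helper
The -lcufft flag is placed after the source file because GNU ld resolves a library's symbols only for inputs listed before that library on the command line.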
For CMake users, one can use the CMakeLists.txt file provided by the LTO Helper:
cd cufftdx/example/lto_helper
mkdir build && cd build
cmake -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
Run cufftdx_cufft_lto_helper.
./cufftdx_cufft_lto_helper <output_dir> <csv_file> ["--CUDA_ARCHITECTURES=XX;YY;..."]
- Arguments:
  - <output_dir> (Required): Directory where the LTO database and artifacts will be generated.
  - <csv_file> (Required): Path to the input CSV file containing FFT configurations.
  - --CUDA_ARCHITECTURES=XX;YY;... (Optional): Semicolon-separated list of target CUDA compute capabilities (e.g., 70;80). Quote this argument when listing more than one architecture, because an unquoted ";" ends the command in a POSIX shell.
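To illustrate the format of the optional architectures argument, here is a minimal C++ sketch that splits a --CUDA_ARCHITECTURES=XX;YY;... value into individual compute capabilities. The function name parse_cuda_architectures is hypothetical; this is not the helper's own argument parser, whose implementation may differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of cufftdx_cufft_lto_helper): splits the value of
// a "--CUDA_ARCHITECTURES=XX;YY;..." argument into individual compute capabilities.
// Returns an empty vector if the argument does not start with the expected prefix.
std::vector<int> parse_cuda_architectures(const std::string& arg) {
    const std::string prefix = "--CUDA_ARCHITECTURES=";
    std::vector<int> archs;
    if (arg.rfind(prefix, 0) != 0) {
        return archs;  // not the architectures flag
    }
    std::istringstream list(arg.substr(prefix.size()));
    std::string item;
    while (std::getline(list, item, ';')) {
        if (!item.empty()) {  // tolerate a trailing ';'
            archs.push_back(std::stoi(item));
        }
    }
    return archs;
}
```

For example, the argument "--CUDA_ARCHITECTURES=70;80" yields {70, 80}.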
For example, you can run the helper as follows:
./cufftdx_cufft_lto_helper output_folder input.csv --CUDA_ARCHITECTURES=70
Expected Output. After running the helper, the following files will be generated in the specified <output_dir> directory:
- lto_database.hpp.inc: A C++ header file containing metadata that describes the LTO support of the FFT configurations.
- *.fatbin: Fatbinaries containing LTOIRs.
- *.ltoir: LTOIRs.
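A quick post-run sanity check can confirm that all three kinds of output were produced. The following C++17 sketch is an illustration only, not part of the LTO Helper: the names lto_outputs_complete and list_output_files are hypothetical, and the directory is searched recursively because the sketch does not assume a particular layout inside <output_dir>.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical check (not part of the LTO Helper): returns true if the list
// contains lto_database.hpp.inc, at least one *.fatbin and at least one *.ltoir.
bool lto_outputs_complete(const std::vector<fs::path>& files) {
    bool has_database = false;
    bool has_fatbin = false;
    bool has_ltoir = false;
    for (const fs::path& file : files) {
        const std::string ext = file.extension().string();
        if (file.filename().string() == "lto_database.hpp.inc") has_database = true;
        if (ext == ".fatbin") has_fatbin = true;
        if (ext == ".ltoir") has_ltoir = true;
    }
    return has_database && has_fatbin && has_ltoir;
}

// Hypothetical helper: collects every regular file below output_dir. The search
// is recursive so that no particular directory layout is assumed.
std::vector<fs::path> list_output_files(const fs::path& output_dir) {
    std::vector<fs::path> files;
    if (!fs::is_directory(output_dir)) {
        return files;  // missing directory: nothing was generated
    }
    for (const auto& entry : fs::recursive_directory_iterator(output_dir)) {
        if (entry.is_regular_file()) {
            files.push_back(entry.path());
        }
    }
    return files;
}
```

After the example run above, lto_outputs_complete(list_output_files("output_folder")) would be expected to return true if generation succeeded. Keeping the check separate from the directory listing makes it easy to test without touching the file system.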
Setting Up the CSV File
The input CSV file defines the FFT configurations to be included in the LTO database. Each row specifies a unique FFT configuration using the following fields:
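| Field | Description | Required / Optional |
|---|---|---|
| size | FFT size (e.g., 1024, 2048) | Required |
| exec_op | Execution type (e.g., Block, Thread) | Required |
| direction | FFT direction (fft_direction::forward or fft_direction::inverse) | Required (*) |
| precision | Precision (e.g., float, double) | Optional |
| type | FFT type (fft_type::c2c, fft_type::r2c or fft_type::c2r) | Optional |
| real_mode | Real mode (e.g., real_mode::normal, real_mode::folded) | Optional |
| elements_per_thread | Number of elements per thread | Optional |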
Note
The direction field is only required when type = fft_type::c2c. For the other FFT types (r2c and c2r), the direction is implicitly determined by the type. For details about the required fields for different FFT configurations, see the Is FFT-complete Execution Trait.
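The rule in this note can be sketched as a small lookup. The C++ function below, implied_direction, is hypothetical and only illustrates the relationship: an R2C transform is a forward transform and a C2R transform is an inverse transform, so only fft_type::c2c needs an explicit direction.

```cpp
#include <optional>
#include <string>

// Hypothetical illustration of the direction rule (not code from the helper):
// returns the direction implied by the FFT type, or std::nullopt when the type
// does not imply one and the CSV "direction" field must be filled in.
std::optional<std::string> implied_direction(const std::string& type) {
    if (type == "fft_type::r2c") return "fft_direction::forward";
    if (type == "fft_type::c2r") return "fft_direction::inverse";
    return std::nullopt;  // e.g. fft_type::c2c: direction must be given explicitly
}
```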
Note
The order of the fields in the CSV file is flexible. Empty fields are allowed and will use default values as specified in the Operator section of the cuFFTDx documentation.
Here is an example CSV file with three different FFT configurations:
size,direction,precision,type,real_mode,exec_op,elements_per_thread
4,fft_direction::forward,float,fft_type::c2c,real_mode::normal,Block,
1024,,,fft_type::c2c,real_mode::normal,Block,32
2048,,double,fft_type::r2c,real_mode::folded,Thread,16
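Because the column order is flexible and empty fields fall back to defaults, a consumer of this file has to match columns by header name. The C++17 sketch below shows one way to do that; FftDescription, split_csv_line and parse_fft_csv are hypothetical names, this is not the parser used by cufftdx_cufft_lto_helper, and it only checks that size and exec_op are present (the c2c direction rule from the note above is not enforced here).

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One CSV row keyed by column name. An empty or missing field is std::nullopt,
// i.e. "fall back to the cuFFTDx default" (hypothetical representation).
using FftDescription = std::map<std::string, std::optional<std::string>>;

// Splits one CSV line on commas. No quoting support: none of the fields in this
// format contain commas. A trailing comma produces a final empty field.
std::vector<std::string> split_csv_line(std::string line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();  // tolerate Windows line endings
    }
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    if (!line.empty() && line.back() == ',') {
        fields.push_back("");
    }
    return fields;
}

// Parses the whole CSV text. Columns are matched by header name, so their order
// does not matter. Throws if a row leaves "size" or "exec_op" empty.
std::vector<FftDescription> parse_fft_csv(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    if (!std::getline(in, line)) {
        return {};  // empty input
    }
    const std::vector<std::string> header = split_csv_line(line);

    std::vector<FftDescription> rows;
    while (std::getline(in, line)) {
        if (line.empty() || line == "\r") {
            continue;  // skip blank lines
        }
        const std::vector<std::string> values = split_csv_line(line);
        FftDescription row;
        for (std::size_t i = 0; i < header.size(); ++i) {
            if (i < values.size() && !values[i].empty()) {
                row[header[i]] = values[i];
            } else {
                row[header[i]] = std::nullopt;  // empty field: use the default
            }
        }
        for (const char* required : {"size", "exec_op"}) {
            if (!row[required]) {
                throw std::runtime_error(std::string("missing required field: ") + required);
            }
        }
        rows.push_back(row);
    }
    return rows;
}
```

Applied to the example file above, this yields three rows; in the second row, direction and precision are std::nullopt, and a file with its columns in a different order parses the same way.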
LTO Database Creation at CMake Configuration Phase
The LTO database must be generated during CMake’s configuration phase, because this is when we need to specify which LTOIR files need to be linked with the user’s code.
To simplify this process, two helper functions are provided in lto_helper.cmake:
- build_cufft_lto_helper(): Builds the LTO Helper tool. Parameters:
  - SRC_DIR: Directory containing the helper source files.
  - OUTPUT_DIR: Directory where the helper executable will be built.
- run_cufft_lto_helper(): Runs the helper to generate LTO artifacts and creates an object library. Parameters:
  - SRC_DIR: Directory containing the helper source files.
  - OUTPUT_DIR: Directory where the LTO database will be generated.
  - OUTPUT_NAME: Base name for the generated library (creates ${OUTPUT_NAME}_lto_lib).
  - DESCS: Path to the input CSV file with FFT configurations.
  - ARCHITECTURES: Target CUDA architectures (comma-separated, e.g., 70,80).
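Note that ARCHITECTURES here is comma-separated, while the helper's command-line option expects a semicolon-separated list. The C++ sketch below only illustrates how one form maps to the other; to_cuda_architectures_flag is a hypothetical name, and this is an assumption about the mapping, not code taken from lto_helper.cmake.

```cpp
#include <sstream>
#include <string>

// Hypothetical conversion (not taken from lto_helper.cmake): turns a
// comma-separated architecture list such as "70,80" into the command-line
// form "--CUDA_ARCHITECTURES=70;80" accepted by cufftdx_cufft_lto_helper.
std::string to_cuda_architectures_flag(const std::string& comma_separated) {
    std::istringstream in(comma_separated);
    std::string arch;
    std::string joined;
    while (std::getline(in, arch, ',')) {
        if (arch.empty()) continue;  // ignore stray commas
        if (!joined.empty()) joined += ';';
        joined += arch;
    }
    return "--CUDA_ARCHITECTURES=" + joined;
}
```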
Example
include(/path/to/lto_helper/lto_helper.cmake)
# Generate LTO database
run_cufft_lto_helper(
SRC_DIR ${CMAKE_SOURCE_DIR}/lto_helper
OUTPUT_DIR ${CMAKE_BINARY_DIR}/lto_helper
OUTPUT_NAME my_fft
DESCS ${CMAKE_SOURCE_DIR}/input.csv
ARCHITECTURES 70,80
)
# Link the generated LTO library to your executable
add_executable(my_executable main.cpp)
target_link_libraries(my_executable PRIVATE my_fft_lto_lib)
# Include the LTO database header
target_include_directories(my_executable
PRIVATE
${CMAKE_BINARY_DIR}/lto_helper/my_fft_artifacts
)
Note
The build_cufft_lto_helper() function is automatically called by run_cufft_lto_helper(), so you typically don’t need to call it directly.
Note
When cross-compiling, you can specify a custom toolchain file for building the LTO Helper using CUFFTDX_LTO_TOOLCHAIN_FILE. This is important because the LTO Helper is built and run during CMake’s configuration phase, so it must be compiled for the native (host) machine, not the target machine.