LTO Helper

Note

This tool is designed for the offline kernel generation use case. For the online kernel generation use case, please refer to (online) LTO Database Creation.

The LTO Helper tool simplifies the creation of LTO databases. It generates an LTO database for cuFFTDx operators from descriptions provided in an input CSV file. The generated database can then be integrated into a cuFFTDx LTO project by following the steps in Use Case I: Offline Kernel Generation.

The tool is available in two locations:

  • The cuFFTDx example folder cufftdx/example/lto_helper in the cuFFTDx + cuFFT LTO EA package.

  • The CUDALibrarySamples repository.

It consists of the following files:

  • cufftdx_cufft_lto_helper.cpp: The main implementation of the helper.

  • For CMake users:
    • CMakeLists.txt: Build configuration for cufftdx_cufft_lto_helper.cpp.

    • lto_helper.cmake: CMake functions for building and running the helper during the CMake configuration phase.

The sections below list the requirements and explain how to compile, configure, and run the helper.

Requirements

  • cuFFT: The cuFFT static library (libcufft_static.a) from the cuFFTDx + cuFFT LTO EA package.

  • CUDA Toolkit 12.8 or newer

  • CSV File: An input CSV file containing FFT configuration descriptions. See the "Setting Up the CSV File" section below for the required format and fields.

  • CMake 3.24 or newer (optional; needed only when building or running the helper through CMake)

Using cufftdx_cufft_lto_helper

  • Build cufftdx_cufft_lto_helper. The helper can be built using either direct compilation or CMake:

    • Direct compilation

      g++ -I<cuda_include_path> \
          -I<cufft_include_path> \
          cufftdx_cufft_lto_helper.cpp \
          -L<cufft_lib_path> -lcufft \
          -o cufftdx_cufft_lto_helper
      
    • With CMake, use the CMakeLists.txt file provided with the LTO Helper:

      cd cufftdx/example/lto_helper
      mkdir build && cd build
      cmake -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_BUILD_TYPE=Release ..
      cmake --build .
      
  • Run cufftdx_cufft_lto_helper.

    ./cufftdx_cufft_lto_helper <output_dir> <csv_file> [--CUDA_ARCHITECTURES=XX;YY;...]
    
    Arguments:
    • <output_dir> (Required): Directory where the LTO database and artifacts will be generated

    • <csv_file> (Required): Path to the input CSV file containing FFT configurations

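    • --CUDA_ARCHITECTURES=XX;YY;... (Optional): Semicolon-separated list of target CUDA compute capabilities (e.g., 70;80). Because the shell treats ; as a command separator, quote the argument when passing more than one architecture, for example "--CUDA_ARCHITECTURES=70;80".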

    For example, you can run the helper as follows:

    ./cufftdx_cufft_lto_helper output_folder input.csv --CUDA_ARCHITECTURES=70
    
  • Expected Output. After running the helper, the following files will be generated in the specified <output_dir> directory:

    • lto_database.hpp.inc: A C++ header file containing metadata describing LTO support of the FFT configurations.

    • *.fatbin: Fatbinaries containing LTOIRs.

    • *.ltoir: LTOIRs.
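To make the argument conventions above concrete, here is a minimal C++17 sketch of parsing this command line. It expects two positional arguments plus an optional --CUDA_ARCHITECTURES= flag whose value is split on semicolons. HelperArgs, split_architectures, and parse_helper_args are hypothetical names used only for illustration. The actual parsing in cufftdx_cufft_lto_helper.cpp may differ.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical view of the helper's command line (not the real helper's types).
struct HelperArgs {
    std::string output_dir;          // <output_dir>, required
    std::string csv_file;            // <csv_file>, required
    std::vector<int> architectures;  // from --CUDA_ARCHITECTURES=XX;YY;..., optional
};

// Splits a list such as "70;80" into {70, 80}; empty items are skipped.
std::vector<int> split_architectures(const std::string& list, char sep) {
    std::vector<int> result;
    std::stringstream stream(list);
    std::string item;
    while (std::getline(stream, item, sep)) {
        if (!item.empty()) {
            result.push_back(std::stoi(item));
        }
    }
    return result;
}

// Returns std::nullopt unless exactly two positional arguments are present.
std::optional<HelperArgs> parse_helper_args(const std::vector<std::string>& args) {
    const std::string flag = "--CUDA_ARCHITECTURES=";
    HelperArgs parsed;
    std::vector<std::string> positional;
    for (const auto& arg : args) {
        if (arg.rfind(flag, 0) == 0) {  // argument starts with the flag
            parsed.architectures = split_architectures(arg.substr(flag.size()), ';');
        } else {
            positional.push_back(arg);
        }
    }
    if (positional.size() != 2) {
        return std::nullopt;
    }
    parsed.output_dir = positional[0];
    parsed.csv_file = positional[1];
    return parsed;
}
```

For the example invocation above, the arguments output_folder, input.csv, and --CUDA_ARCHITECTURES=70 would yield output_dir "output_folder", csv_file "input.csv", and a single architecture, 70.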

Setting Up the CSV File

The input CSV file defines the FFT configurations to be included in the LTO database. Each row specifies a unique FFT configuration using the following fields:

| Field               | Description                                                    | Required / Optional |
|---------------------|----------------------------------------------------------------|---------------------|
| size                | FFT size (e.g., 1024, 2048)                                    | Required            |
| exec_op             | Execution type (Block, Thread)                                 | Required            |
| direction           | FFT direction (fft_direction::forward, fft_direction::inverse) | Required (*)        |
| precision           | Precision (float, double)                                      | Optional            |
| type                | FFT type (fft_type::c2c, fft_type::r2c, fft_type::c2r)         | Optional            |
| real_mode           | Real mode (real_mode::normal, real_mode::folded)               | Optional            |
| elements_per_thread | Number of elements per thread                                  | Optional            |

Note

(*) The direction field is required only when type is fft_type::c2c. For the other FFT types (r2c and c2r), the direction is implied by the type. For details about the fields required by different FFT configurations, see the Is FFT-complete Execution Trait section.

Note

The order of the fields in the CSV file is flexible. Empty fields are allowed and will use default values as specified in the Operator section of the cuFFTDx documentation.

Here is an example CSV file with three different FFT configurations:

size,direction,precision,type,real_mode,exec_op,elements_per_thread
4,fft_direction::forward,float,fft_type::c2c,real_mode::normal,Block,
1024,fft_direction::inverse,,fft_type::c2c,real_mode::normal,Block,32
2048,,double,fft_type::r2c,real_mode::folded,Thread,16
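The rules above can be checked before running the helper. Columns are matched by header name, empty fields are allowed, and direction is required only for C2C. The following C++17 sketch is a hypothetical pre-flight check and is not part of the LTO Helper. It assumes fields are never quoted. It also treats an empty type as fft_type::c2c, on the assumption that this is cuFFTDx's default Type.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas, keeping empty fields (including a trailing one).
// Assumes fields are never quoted and contain no commas.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(field);
    }
    if (!line.empty() && line.back() == ',') {
        fields.push_back("");  // std::getline does not report a trailing empty field
    }
    return fields;
}

// Pairs each header name with the value in the same column, so the column order
// in the file does not matter. Missing trailing columns become empty strings.
std::map<std::string, std::string> to_record(const std::vector<std::string>& header,
                                             const std::vector<std::string>& row) {
    std::map<std::string, std::string> record;
    for (std::size_t i = 0; i < header.size(); ++i) {
        record[header[i]] = i < row.size() ? row[i] : std::string{};
    }
    return record;
}

// Returns the names of required fields that are empty or absent.
// Assumption: an empty `type` means fft_type::c2c, so `direction` is then required.
std::vector<std::string> missing_fields(const std::map<std::string, std::string>& record) {
    auto value = [&record](const std::string& key) {
        const auto it = record.find(key);
        return it == record.end() ? std::string{} : it->second;
    };
    std::vector<std::string> missing;
    for (const char* key : {"size", "exec_op"}) {
        if (value(key).empty()) {
            missing.push_back(key);
        }
    }
    const std::string type = value("type");
    if ((type.empty() || type == "fft_type::c2c") && value("direction").empty()) {
        missing.push_back("direction");
    }
    return missing;
}
```

Because to_record keys each value by its header name, a file with the columns in any order validates the same way.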

LTO Database Creation at CMake Configuration Phase

The LTO database must be generated during CMake's configuration phase, because that is when the set of LTOIR files to be linked with the user's code has to be specified.

To simplify this process, two helper functions are provided in lto_helper.cmake:

  1. build_cufft_lto_helper(): Builds the LTO Helper tool. Parameters:

    • SRC_DIR: Directory containing the helper source files.

    • OUTPUT_DIR: Directory where the helper executable will be built.

  2. run_cufft_lto_helper(): Runs the helper to generate LTO artifacts and creates an object library. Parameters:

    • SRC_DIR: Directory containing the helper source files.

    • OUTPUT_DIR: Directory where the LTO database will be generated.

    • OUTPUT_NAME: Base name for the generated library (creates ${OUTPUT_NAME}_lto_lib).

    • DESCS: Path to the input CSV file with FFT configurations.

    • ARCHITECTURES: Target CUDA architectures (comma-separated, e.g., 70,80)
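The ARCHITECTURES parameter takes a comma-separated list, whereas the helper's command-line flag expects semicolons. You may invoke the helper directly from your own script instead. In that case, a sketch like the following converts one form to the other. It also single-quotes the result so that a POSIX shell passes the semicolons through. to_cli_architectures_flag is a hypothetical helper, and this is not necessarily how lto_helper.cmake forwards the value.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: turns a comma-separated list such as "70,80" (the
// ARCHITECTURES form) into the helper's --CUDA_ARCHITECTURES flag. The whole
// argument is single-quoted so a POSIX shell does not treat ';' as a command
// separator. Assumes the list contains only digits and commas.
std::string to_cli_architectures_flag(std::string comma_list) {
    std::replace(comma_list.begin(), comma_list.end(), ',', ';');
    return "'--CUDA_ARCHITECTURES=" + comma_list + "'";
}
```

For example, to_cli_architectures_flag("70,80") returns '--CUDA_ARCHITECTURES=70;80', including the surrounding single quotes.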

Example

include(/path/to/lto_helper/lto_helper.cmake)

# Generate LTO database
run_cufft_lto_helper(
   SRC_DIR ${CMAKE_SOURCE_DIR}/lto_helper
   OUTPUT_DIR ${CMAKE_BINARY_DIR}/lto_helper
   OUTPUT_NAME my_fft
   DESCS ${CMAKE_SOURCE_DIR}/input.csv
   ARCHITECTURES 70,80
)

# Link the generated LTO library to your executable
add_executable(my_executable main.cpp)
target_link_libraries(my_executable PRIVATE my_fft_lto_lib)

# Include the LTO database header
target_include_directories(my_executable
   PRIVATE
      ${CMAKE_BINARY_DIR}/lto_helper/my_fft_artifacts
)
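run_cufft_lto_helper() takes care of turning the generated artifacts into ${OUTPUT_NAME}_lto_lib. If you drive the helper from a different build system, you need the list of generated files yourself. The following minimal C++17 sketch gathers files by extension from a directory such as <output_dir>. collect_artifacts is a hypothetical name. The exact layout of the output directory is an assumption to verify against your own run.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the sorted paths of the regular files directly
// inside `dir` whose extension is `ext` (e.g. ".ltoir" or ".fatbin").
// It does not recurse into subdirectories.
std::vector<fs::path> collect_artifacts(const fs::path& dir, const std::string& ext) {
    std::vector<fs::path> found;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension().string() == ext) {
            found.push_back(entry.path());
        }
    }
    std::sort(found.begin(), found.end());  // directory iteration order is unspecified
    return found;
}
```

Sorting makes the resulting file list deterministic, which keeps generated build rules stable between runs.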

Note

The build_cufft_lto_helper() function is automatically called by run_cufft_lto_helper(), so you typically don’t need to call it directly.

Note

When cross-compiling, you can specify a separate toolchain file for building the LTO Helper through CUFFTDX_LTO_TOOLCHAIN_FILE. This is necessary because the LTO Helper is built and run on the native (build) machine during configuration, not on the target machine.