Custom LTO Helper

cuFFT Device API through Example

This section covers code snippets from 10_cufft_device_api_example/00_cufft_device_api_example, a cuFFTDx example that adds LTO support to an existing non-LTO project, much like the example detailed in the Use Case I: Offline Kernel Generation section. The source and build changes required to add LTO support to 00_cufft_device_api_example are identical to those covered in 09_introduction_lto_example/00_introduction_lto_example. What differs is the database creation: this example builds an example-specific helper that creates the LTO database by calling the cuFFT Device API directly, as an alternative to using the LTO Helper.

Building the LTO database through the cuFFT Device API involves the following:

  1. Create a description handle for each FFT operation that should be included in the database. Use cufftDescriptionSetTraitInt64 to set the traits of a given description handle. A description is valid as long as cufftdx::is_complete_fft_execution<FFT>::value is true for the equivalent cuFFTDx FFT type.

    cufftDescriptionHandle desc_handle;
    cufftDescriptionCreate(&desc_handle);
    cufftDescriptionSetTraitInt64(desc_handle, CUFFT_DESC_TRAIT_EXEC_OP,   static_cast<long long int>(CUFFT_DESC_BLOCK));
    cufftDescriptionSetTraitInt64(desc_handle, CUFFT_DESC_TRAIT_SIZE,      static_cast<long long int>(128));
    cufftDescriptionSetTraitInt64(desc_handle, CUFFT_DESC_TRAIT_DIRECTION, static_cast<long long int>(CUFFT_DESC_FORWARD));
    cufftDescriptionSetTraitInt64(desc_handle, CUFFT_DESC_TRAIT_SM,        static_cast<long long int>(750));
    
  2. Create a device handle from the description handle(s).

    cufftDeviceHandle device_handle;
    cufftDeviceCreate(&device_handle, 1, &desc_handle);
    

    Hint

    cuFFT exposes an API, cufftDeviceCheckDescription, to check whether a given description handle provided to the device handle is valid, supported, or unsupported:

    cufftDeviceCheckDescription(device_handle, desc_handle);
    
  3. Obtain the database header file from the device handle and write it to disk. The header file contains information that informs cuFFTDx which FFT descriptions have LTO support.

    size_t database_str_size = 0;
    cufftDeviceGetDatabaseStrSize(device_handle, &database_str_size);
    char* database_str = (char*)malloc(database_str_size * sizeof(char));
    cufftDeviceGetDatabaseStr(device_handle, database_str_size, database_str);
    
    // Write database_str to disk as "lto_database.hpp.inc" and free all memory
    // associated with database_str
    
  4. Obtain the database fatbin(s) and ltoir(s) from the device handle and write them to disk. The fatbin(s) and ltoir(s), when combined with the cuFFTDx library, can be used to build the device functions that compute FFTs described by the description handle(s).

    size_t num_codes = 0;
    cufftDeviceGetNumLTOIRs(device_handle, &num_codes);
    size_t* code_sizes = (size_t*)malloc(num_codes * sizeof(size_t));
    cufftDeviceGetLTOIRSizes(device_handle, num_codes, code_sizes);
    
    cufftDeviceCodeType* code_types =
       (cufftDeviceCodeType*)malloc(num_codes * sizeof(cufftDeviceCodeType));
    char** code_ptrs = (char**)malloc(num_codes * sizeof(char*));
    for (size_t n = 0; n < num_codes; ++n) {
       code_ptrs[n] = (char*)malloc(code_sizes[n] * sizeof(char));
    }
    
    cufftDeviceGetLTOIRs(device_handle, num_codes, code_ptrs, code_types);
    
    // Write code_ptrs[n] to disk as database_N.fatbin/ltoir and free all memory
    // associated with code_ptrs/code_types
    
  5. Free the device handle.

    cufftDeviceDestroy(device_handle);
    
  6. Compile and run the database creation executable to produce and write the LTO database to disk.

    g++ -I<cufft_include_path> \
        -L<cufft_lib_path> \
        -I<cuda_include_path> \
        cufft_device_api_lto_helper.cpp -o cufft_device_api_lto_helper -lcufft
    

    Once a database has been created, it can be included in and linked with a cuFFTDx LTO project using the steps detailed in Use Case I: Offline Kernel Generation.
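
Steps 3 and 4 above leave the disk writes as comments. The sketch below shows one way that part could look, using only the C++ standard library. The helper names write_blob and code_file_name are hypothetical and not part of cuFFT, and the is_fatbin flag is an assumed stand-in for whatever check the caller performs on code_types[n]. The enumerator names of cufftDeviceCodeType are not shown in this section, so they are deliberately left out.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of cuFFT): writes `size` bytes from `data`
// to `path` in binary mode. Returns true only if every byte was written
// and the file was closed without error.
bool write_blob(const std::string& path, const char* data, std::size_t size) {
    std::FILE* file = std::fopen(path.c_str(), "wb");
    if (file == nullptr) {
        return false;
    }
    const std::size_t written = std::fwrite(data, sizeof(char), size, file);
    const bool closed = (std::fclose(file) == 0);
    return written == size && closed;
}

// Hypothetical helper: builds "database_<n>.fatbin" or "database_<n>.ltoir".
// The caller is assumed to derive `is_fatbin` from code_types[n].
std::string code_file_name(std::size_t n, bool is_fatbin) {
    return "database_" + std::to_string(n) + (is_fatbin ? ".fatbin" : ".ltoir");
}
```

With these helpers, each code buffer from step 4 could be written with write_blob(code_file_name(n, is_fatbin), code_ptrs[n], code_sizes[n]), and the header string from step 3 could be written to "lto_database.hpp.inc" in the same way. Checking the boolean result before freeing the buffers avoids silently producing a truncated database.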