Quick Installation Guide#
The cuFFTDx + cuFFT LTO EA package is distributed as a joint package containing the cuFFTDx 1.4 and cuFFT 11.5 libraries, which are designed to work together.
This section covers the package content, structure, and instructions for using and building with the cuFFTDx + cuFFT LTO EA package.
Note
For general cuFFTDx installation instructions, please refer to the Installation section of the official cuFFTDx documentation.
The cuFFTDx + cuFFT LTO EA package includes two main components:
cuFFTDx 1.4: A header-only library, used by adding the directory containing cufftdx.hpp and commonDx to the include paths of your compilation commands.
cuFFT 11.5: A host library that provides an API to generate LTO databases to be consumed by cuFFTDx.
Once unpacked, the directory structure will look like:
nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/
├── cufft
│   ├── include
│   │   ├── cufft_device.h
│   │   ├── cufft.h
│   │   └── ... etc
│   └── lib
│       ├── cmake
│       ├── libcufft.so
│       ├── libcufft_static.a
│       └── ... etc
└── cufftdx
    ├── doc
    │   └── cufftdx
    ├── example
    │   └── cufftdx
    ├── include
    │   ├── commondx
    │   ├── cufftdx
    │   ├── cufftdx.hpp
    │   └── ... etc
    ├── lib
    │   └── cmake
    └── src
        └── cufftdx
cuFFTDx with LTO in your Project#
The instructions for using cuFFTDx remain almost the same. The only difference is that this cuFFTDx release is not distributed as part of the MathDx package. So, rather than adding <mathdx_include_dir> (the directory containing cufftdx.hpp and commonDx) to your include paths, use <cufftdx_include_dir> (i.e., nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufftdx/include/):
nvcc -std=c++17 -arch sm_XY (...) \
-I<cufftdx_include_dir> \
<your_source_file>.cu -o <your_binary>
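For reference, the following is a minimal sketch of what <your_source_file>.cu could contain when using cuFFTDx on its own. The FFT description (an 8-point, single-precision, forward C2C FFT executed per thread), the kernel name, and the launch configuration are illustrative assumptions rather than part of this package's instructions; the bundled example directory contains complete, tested listings.

#include <cufftdx.hpp>

// Illustrative FFT description: an 8-point, single-precision, forward C2C FFT
// computed entirely within one thread. Size, precision, type, and direction
// are placeholder choices; adjust them to your use case.
using FFT = decltype(cufftdx::Size<8>() + cufftdx::Precision<float>()
                     + cufftdx::Type<cufftdx::fft_type::c2c>()
                     + cufftdx::Direction<cufftdx::fft_direction::forward>()
                     + cufftdx::Thread());

// Each thread loads its own FFT::storage_size complex elements, runs the FFT
// in registers, and writes the result back to global memory.
__global__ void fft_kernel(FFT::value_type* data) {
    FFT::value_type thread_data[FFT::storage_size];
    const unsigned int offset = (blockIdx.x * blockDim.x + threadIdx.x) * FFT::storage_size;
    for (unsigned int i = 0; i < FFT::storage_size; ++i) {
        thread_data[i] = data[offset + i];
    }
    FFT().execute(thread_data);
    for (unsigned int i = 0; i < FFT::storage_size; ++i) {
        data[offset + i] = thread_data[i];
    }
}

int main() {
    const unsigned int num_ffts = 32;  // placeholder batch size
    const unsigned int count    = num_ffts * FFT::storage_size;

    FFT::value_type* d_data = nullptr;
    cudaMalloc(&d_data, count * sizeof(FFT::value_type));
    cudaMemset(d_data, 0, count * sizeof(FFT::value_type));

    // One thread per FFT; a single block is enough for this sketch.
    fft_kernel<<<1, num_ffts>>>(d_data);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}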
Additional requirements are listed in the Requirements section.
Hint
The example directory contains a Makefile demonstrating how to build the provided sample applications. You can review it and adapt it to your project’s needs.
Using cuFFT LTO Database in cuFFTDx#
Use Case I: Offline Kernel Generation#
To enhance your cuFFTDx kernels with cuFFT-generated LTO code, additional compilation steps are required.
Note
For the offline kernel generation use case, cuFFT is only needed to generate the initial LTO database files. Once generated, you can build and run your application without cuFFT.
Create the LTO database using cuFFT:
Use either the provided LTO Helper utility (see Use Case I: Offline Kernel Generation for illustration) or the cuFFT device API (see cuFFT Device API through Example for illustration) to generate the LTO database. The output includes a C++ header file (lto_database.hpp.inc) as well as .ltoir and .fatbin files.
Include the generated C++ header file and compile the cuFFTDx application code into LTOIRs:
nvcc -std=c++17 \
     -dc \
     --generate-code arch=compute_75,code=lto_75 \
     -I<cufftdx_include_dir> \
     -I/path/to/lto_database.hpp.inc \
     <your_source_file>.cu -o <your_source_file>.o
Replace 75 with the target CUDA architecture.
Device link with LTOIRs:
nvcc -dlink \
     -dlto \
     --generate-code arch=compute_75,code=sm_75 \
     database_X.fatbin ... \
     database_X.ltoir ... \
     <your_source_file>.o -o <your_source_file>_dlink.o
Host link the object files to produce the final executable:
g++ -L${CUDA_PATH}/lib64 -lcudart \
    <your_source_file>.o <your_source_file>_dlink.o -o <your_binary>
Note
The lto_database.hpp.inc, .fatbin, and .ltoir files must be generated with descriptions that are consistent with the traits of the cuFFTDx operators in your project and with the target architectures.
If there is a mismatch, cuFFTDx will fall back to using the PTX implementation instead of the LTOIR implementation, causing this assertion to fail:
static_assert(FFT::code == cufftdx::experimental::code_type::ltoir, "Selected implementation code type is not LTOIR.");
Use Case II: Online Kernel Generation#
When building NVRTC-based code, one needs to take the following into account:
Link with the two extra dependencies, cuFFT and nvJitLink.
Define the CUFFTDX_ENABLE_CUFFT_DEPENDENCY macro to use the (online) LTO Database Creation helper function.
g++ -DCUFFTDX_ENABLE_CUFFT_DEPENDENCY \
-L<cufft_lib_path> -I<cufft_include_path> \
-I<cufftdx_include_path> \
-lcufft \
-lnvJitLink \
-lnvrtc \
-L<cuda_lib_path> -lcuda <your_source_file>.cu -o <your_binary>
cuFFTDx with LTO in your CMake Project#
Hint
The example directory contains a CMakeLists.txt demonstrating how to compile the provided sample applications. You can review it and adapt it to your project’s needs.
The cuFFTDx + cuFFT LTO EA package provides configuration files that simplify its integration into a CMake project. To include cuFFTDx 1.4 in your code:
Find cufftdx using find_package:
find_package(cufftdx REQUIRED CONFIG)
You can pass the path to the cuFFTDx package using the PATHS option:
find_package(cufftdx REQUIRED CONFIG PATHS "<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/")
Alternatively, you can set cufftdx_ROOT during the CMake configuration of your project:
cmake -Dcufftdx_ROOT="<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufftdx/" (...)
Link the cufftdx::cufftdx target to your program:
target_link_libraries(YourProgram PRIVATE cufftdx::cufftdx)
Incorporate the steps from Using cuFFT LTO Database in cuFFTDx into your CMake build:
For the offline kernel generation use case:
Run the run_cufft_lto_helper CMake function provided in the lto_helper.cmake script in the example/lto_helper directory to generate the LTO database.
After running, the ${OUTPUT_NAME}_lto_lib library will be available to be linked with your program.
Set both the CUDA_SEPARABLE_COMPILATION and INTERPROCEDURAL_OPTIMIZATION properties to ON to enable link-time optimization and compile your source code into LTOIRs.
include(${cufftdx_ROOT}/cufftdx/example/lto_helper/lto_helper.cmake)

run_cufft_lto_helper(
    SRC_DIR ${cufftdx_ROOT}/cufftdx/example/lto_helper/
    BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}
    OUTPUT_NAME my_fft
    DESCS ${CMAKE_SOURCE_DIR}/input.csv
    ARCHITECTURES 70,80
)

target_link_libraries(YourProgram PRIVATE my_fft_lto_lib)

set_target_properties(YourProgram PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    INTERPROCEDURAL_OPTIMIZATION ON
)
For the online kernel generation use case:
Define the CUFFTDX_ENABLE_CUFFT_DEPENDENCY macro:
target_compile_definitions(YourProgram PRIVATE CUFFTDX_ENABLE_CUFFT_DEPENDENCY)
Find cufft using find_package:
find_package(cufft REQUIRED CONFIG)
You can pass the path to the cuFFT package using the PATHS option:
find_package(cufft REQUIRED CONFIG PATHS "<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufft/")
Alternatively, you can set cufft_ROOT during the CMake configuration of your project:
cmake -Dcufft_ROOT="<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufft/" (...)
Link with the two extra dependencies, cuFFT and nvJitLink:
target_link_libraries(YourProgram PRIVATE
    cufft::cufft_static # or, cufft::cufft
    CUDA::nvJitLink
)
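For orientation, the CMake pieces above can be combined into a single CMakeLists.txt. The sketch below covers the offline kernel generation case under a few assumptions: the project name, source file, input.csv, OUTPUT_NAME, and architecture list are placeholders, cufftdx_ROOT is assumed to be passed on the command line as shown earlier, and a recent CMake is assumed for CUDA interprocedural optimization support.

cmake_minimum_required(VERSION 3.26)            # assumed minimum for CUDA IPO support
project(your_project LANGUAGES CXX CUDA)        # placeholder project name

set(CMAKE_CUDA_ARCHITECTURES 70 80)             # placeholder; match ARCHITECTURES below

# cufftdx_ROOT is expected to be passed via -Dcufftdx_ROOT=... (see above).
find_package(cufftdx REQUIRED CONFIG)

add_executable(YourProgram your_source_file.cu) # placeholder source file
target_link_libraries(YourProgram PRIVATE cufftdx::cufftdx)

# Generate the LTO database with the provided helper and link it in.
include(${cufftdx_ROOT}/cufftdx/example/lto_helper/lto_helper.cmake)
run_cufft_lto_helper(
    SRC_DIR ${cufftdx_ROOT}/cufftdx/example/lto_helper/
    BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}
    OUTPUT_NAME my_fft
    DESCS ${CMAKE_SOURCE_DIR}/input.csv
    ARCHITECTURES 70,80
)
target_link_libraries(YourProgram PRIVATE my_fft_lto_lib)

# Compile sources to LTOIR and enable link-time optimization.
set_target_properties(YourProgram PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    INTERPROCEDURAL_OPTIMIZATION ON
)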
Defined Variables#
cufftdx_FOUND
True if cuFFTDx was found.
cufftdx_INCLUDE_DIRS
cuFFTDx include directories.
cufftdx_VERSION
cuFFTDx version number in X.Y.Z format.
cufftdx_cutlass_INCLUDE_DIR
CUTLASS include directory.
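As a brief illustration of how these variables might be consumed in a CMake project (YourProgram is the placeholder target from the examples above):

find_package(cufftdx REQUIRED CONFIG)
message(STATUS "cuFFTDx found: ${cufftdx_FOUND}, version: ${cufftdx_VERSION}")

# Useful when a target consumes the headers directly instead of linking cufftdx::cufftdx.
target_include_directories(YourProgram PRIVATE
    ${cufftdx_INCLUDE_DIRS}
    ${cufftdx_cutlass_INCLUDE_DIR}
)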