Quick Installation Guide#
The cuFFTDx + cuFFT LTO EA package is distributed as a joint package containing the cuFFTDx 1.4 and cuFFT 11.5 libraries, which are designed to work together.
This section covers the package content, structure, and instructions for using and building with the cuFFTDx + cuFFT LTO EA package.
Note
For general cuFFTDx installation instructions, please refer to the Installation section of the official cuFFTDx documentation.
The cuFFTDx + cuFFT LTO EA package includes two main components:
cuFFTDx 1.4: A header-only library, used by adding the directory containing cufftdx.hpp and commonDx to the include paths of your compilation commands.
cuFFT 11.5: A host library that provides an API to generate LTO databases to be consumed by cuFFTDx.
Once unpacked, the directory structure will look like:
nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/
├── cufft
│   ├── include
│   │   ├── cufft_device.h
│   │   ├── cufft.h
│   │   └── ... etc
│   └── lib
│       ├── cmake
│       ├── libcufft.so
│       ├── libcufft_static.a
│       └── ... etc
└── cufftdx
    ├── doc
    │   └── cufftdx
    ├── example
    │   └── cufftdx
    ├── include
    │   ├── commondx
    │   ├── cufftdx
    │   ├── cufftdx.hpp
    │   └── ... etc
    ├── lib
    │   └── cmake
    └── src
        └── cufftdx
cuFFTDx with LTO in your Project#
The instructions for using cuFFTDx remain almost the same. The only difference is that this cuFFTDx release is not distributed as part of the MathDx package. So, rather than adding <mathdx_include_dir> (the directory containing cufftdx.hpp and commonDx) to your include paths, use <cufftdx_include_dir> (i.e., nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufftdx/include/):
nvcc -std=c++17 -arch sm_XY (...) \
-I<cufftdx_include_dir> \
<your_source_file>.cu -o <your_binary>
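For reference, the following is a minimal sketch of what <your_source_file>.cu could contain when using cuFFTDx on its own. The FFT description (an 8-point, single-precision, forward C2C FFT executed per thread), the kernel name, and the launch configuration are illustrative assumptions rather than part of this package's instructions; the bundled example directory contains complete, tested listings.

#include <cufftdx.hpp>

// Illustrative FFT description: an 8-point, single-precision, forward C2C FFT
// computed entirely within one thread. Size, precision, type, and direction
// are placeholder choices; adjust them to your use case.
using FFT = decltype(cufftdx::Size<8>() + cufftdx::Precision<float>()
                     + cufftdx::Type<cufftdx::fft_type::c2c>()
                     + cufftdx::Direction<cufftdx::fft_direction::forward>()
                     + cufftdx::Thread());

// Each thread loads its own FFT::storage_size complex elements, runs the FFT
// in registers, and writes the result back to global memory.
__global__ void fft_kernel(FFT::value_type* data) {
    FFT::value_type thread_data[FFT::storage_size];
    const unsigned int offset = (blockIdx.x * blockDim.x + threadIdx.x) * FFT::storage_size;
    for (unsigned int i = 0; i < FFT::storage_size; ++i) {
        thread_data[i] = data[offset + i];
    }
    FFT().execute(thread_data);
    for (unsigned int i = 0; i < FFT::storage_size; ++i) {
        data[offset + i] = thread_data[i];
    }
}

int main() {
    const unsigned int num_ffts = 32;  // placeholder batch size
    const unsigned int count    = num_ffts * FFT::storage_size;

    FFT::value_type* d_data = nullptr;
    cudaMalloc(&d_data, count * sizeof(FFT::value_type));
    cudaMemset(d_data, 0, count * sizeof(FFT::value_type));

    // One thread per FFT; a single block is enough for this sketch.
    fft_kernel<<<1, num_ffts>>>(d_data);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}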
Additional requirements are listed in the Requirements section.
Hint
The example directory contains a Makefile demonstrating how to build the provided sample applications. You can review it and adapt it to your project’s needs.
Using cuFFT LTO Database in cuFFTDx#
Use Case I: Offline Kernel Generation#
To enhance your cuFFTDx kernels with cuFFT-generated LTO code, additional compilation steps are required.
Note
For the offline kernel generation use case, cuFFT is only needed to generate the initial LTO database files. Once generated, you can build and run your application without cuFFT.
Create the LTO database using cuFFT:
Use either the provided LTO Helper utility (see Use Case I: Offline Kernel Generation for illustration) or the cuFFT device API (see cuFFT Device API through Example for illustration) to generate the LTO database. The output includes a C++ header file (lto_database.hpp.inc) as well as .ltoir and .fatbin files.
Include the generated C++ header file and compile the cuFFTDx application code into LTOIRs:
nvcc -std=c++17 \
     -dc \
     --generate-code arch=compute_75,code=lto_75 \
     -I<cufftdx_include_dir> \
     -I/path/to/lto_database.hpp.inc \
     <your_source_file>.cu -o <your_source_file>.o
Replace 75 with the target CUDA architecture.
Device link with LTOIRs:
nvcc -dlink \
     -dlto \
     --generate-code arch=compute_75,code=sm_75 \
     database_X.fatbin ... \
     database_X.ltoir ... \
     <your_source_file>.o -o <your_source_file>_dlink.o
Host link the object files to produce the final executable:
g++ -L${CUDA_PATH}/lib64 -lcudart \
    <your_source_file>.o <your_source_file>_dlink.o -o <your_binary>
Note
The lto_database.hpp.inc, .fatbin, and .ltoir files must be generated with descriptions that are consistent with the traits of the cuFFTDx operators in your project and with the target architectures.
If there is a mismatch, cuFFTDx will fall back to using the PTX implementation instead of the LTOIR implementation, causing this assertion to fail:
static_assert(FFT::code == cufftdx::experimental::code_type::ltoir, "Selected implementation code type is not LTOIR.");
Use Case II: Online Kernel Generation#
When building NVRTC-based code, one needs to take the following into account:
Link with the two extra dependencies, cuFFT and nvJitLink.
Define the CUFFTDX_ENABLE_CUFFT_DEPENDENCY macro to use the (online) LTO Database Creation helper function.
g++ -DCUFFTDX_ENABLE_CUFFT_DEPENDENCY \
-L<cufft_lib_path> -I<cufft_include_path> \
-I<cufftdx_include_path> \
-lcufft \
-lnvJitLink \
-lnvrtc \
-L<cuda_lib_path> -lcuda <your_source_file>.cu -o <your_binary>
cuFFTDx with LTO in your CMake Project#
Hint
The example directory contains a CMakeLists.txt demonstrating how to compile the provided sample applications. You can review it and adapt it to your project’s needs.
The cuFFTDx + cuFFT LTO EA package provides configuration files that simplify its integration into a CMake project. To include cuFFTDx 1.4 in your code:
Find cufftdx using find_package:
find_package(cufftdx REQUIRED CONFIG)
You can pass the path to the cuFFTDx package using the PATHS option:
find_package(cufftdx REQUIRED CONFIG PATHS "<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/")
Alternatively, you can set cufftdx_ROOT during the CMake configuration of your project:
cmake -Dcufftdx_ROOT="<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufftdx/" (...)
Link the cufftdx::cufftdx target to your program:
target_link_libraries(YourProgram PRIVATE cufftdx::cufftdx)
Incorporate the steps from Using cuFFT LTO Database in cuFFTDx into your CMake build:
For the offline kernel generation use case:
Run the run_cufft_lto_helper CMake function provided in the lto_helper.cmake script in the example/lto_helper directory to generate the LTO database.
After running, the ${OUTPUT_NAME}_lto_lib library will be available to be linked with your program.
Set both the CUDA_SEPARABLE_COMPILATION and INTERPROCEDURAL_OPTIMIZATION properties to ON to enable link-time optimization and compile your source code into LTOIRs.
include(${cufftdx_ROOT}/cufftdx/example/lto_helper/lto_helper.cmake)

run_cufft_lto_helper(
    SRC_DIR ${cufftdx_ROOT}/cufftdx/example/lto_helper/
    BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}
    OUTPUT_NAME my_fft
    DESCS ${CMAKE_SOURCE_DIR}/input.csv
    ARCHITECTURES 70,80
)

target_link_libraries(YourProgram PRIVATE my_fft_lto_lib)

set_target_properties(YourProgram PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    INTERPROCEDURAL_OPTIMIZATION ON
)
For the online kernel generation use case:
Define the CUFFTDX_ENABLE_CUFFT_DEPENDENCY macro:
target_compile_definitions(YourProgram PRIVATE CUFFTDX_ENABLE_CUFFT_DEPENDENCY)
Find cufft using find_package:
find_package(cufft REQUIRED CONFIG)
You can pass the path to the cuFFT package using the PATHS option:
find_package(cufft REQUIRED CONFIG PATHS "<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufft/")
Alternatively, you can set cufft_ROOT during the CMake configuration of your project:
cmake -Dcufft_ROOT="<your_directory>/nvidia-cufftdx-1.4.0-cufft-11.5.0_<PLATFORM>/cufft/" (...)
Link with the two extra dependencies, cuFFT and nvJitLink:
target_link_libraries(YourProgram PRIVATE
    cufft::cufft_static # or, cufft::cufft
    CUDA::nvJitLink
)
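For orientation, the CMake pieces above can be combined into a single CMakeLists.txt. The sketch below covers the offline kernel generation case under a few assumptions: the project name, source file, input.csv, OUTPUT_NAME, and architecture list are placeholders, cufftdx_ROOT is assumed to be passed on the command line as shown earlier, and a recent CMake is assumed for CUDA interprocedural optimization support.

cmake_minimum_required(VERSION 3.26)            # assumed minimum for CUDA IPO support
project(your_project LANGUAGES CXX CUDA)        # placeholder project name

set(CMAKE_CUDA_ARCHITECTURES 70 80)             # placeholder; match ARCHITECTURES below

# cufftdx_ROOT is expected to be passed via -Dcufftdx_ROOT=... (see above).
find_package(cufftdx REQUIRED CONFIG)

add_executable(YourProgram your_source_file.cu) # placeholder source file
target_link_libraries(YourProgram PRIVATE cufftdx::cufftdx)

# Generate the LTO database with the provided helper and link it in.
include(${cufftdx_ROOT}/cufftdx/example/lto_helper/lto_helper.cmake)
run_cufft_lto_helper(
    SRC_DIR ${cufftdx_ROOT}/cufftdx/example/lto_helper/
    BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}
    OUTPUT_NAME my_fft
    DESCS ${CMAKE_SOURCE_DIR}/input.csv
    ARCHITECTURES 70,80
)
target_link_libraries(YourProgram PRIVATE my_fft_lto_lib)

# Compile sources to LTOIR and enable link-time optimization.
set_target_properties(YourProgram PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    INTERPROCEDURAL_OPTIMIZATION ON
)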
Defined Variables#
cufftdx_FOUND
True if cuFFTDx was found.
cufftdx_INCLUDE_DIRS
cuFFTDx include directories.
cufftdx_VERSION
cuFFTDx version number in X.Y.Z format.
cufftdx_cutlass_INCLUDE_DIR
CUTLASS include directory.
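As a brief illustration of how these variables might be consumed in a CMake project (YourProgram is the placeholder target from the examples above):

find_package(cufftdx REQUIRED CONFIG)
message(STATUS "cuFFTDx found: ${cufftdx_FOUND}, version: ${cufftdx_VERSION}")

# Useful when a target consumes the headers directly instead of linking cufftdx::cufftdx.
target_include_directories(YourProgram PRIVATE
    ${cufftdx_INCLUDE_DIRS}
    ${cufftdx_cutlass_INCLUDE_DIR}
)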