Quick Installation Guide#
The cuFFTDx library is distributed as part of the MathDx package. To download the most recent release of the MathDx package, including cuFFTDx, go to https://developer.nvidia.com/cufftdx-downloads.
Note
The MathDx package contains:
- cuBLASDx for selected linear algebra functions like General Matrix Multiplication (GEMM),
- cuFFTDx for FFT calculations,
- cuSolverDx for selected dense matrix factorization and solve routines,
- cuRANDDx for random number generation,
- nvCOMPDx for compression/decompression of data.
MathDx libraries are designed to work together in a single project. Note that when multiple device extension libraries are used in one project, all of them must come from the same MathDx release. Examples of such fusion are included in the package.
cuFFTDx In Your Project#
cuFFTDx is a header-only library; to use it, you only need to add the directory containing cufftdx.hpp and commonDx to the include paths of your compilation commands. All other requirements are listed in the Requirements section. The easiest way is to use the MathDx include directory:
nvcc -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> \
<your_source_file>.cu -o <your_binary>
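For illustration, below is a minimal sketch of a kernel built on this header. The 128-point size, the sm_80 target, and the fft_kernel name are arbitrary choices for this example, and the loads/stores are elided:

#include <cufftdx.hpp>

// Describe a 128-point, single-precision, forward complex-to-complex FFT
// executed collectively by one CUDA block, compiled for sm_80 (match SM<>
// to the -arch value used on the command line).
using FFT = decltype(cufftdx::Size<128>()
                     + cufftdx::Precision<float>()
                     + cufftdx::Type<cufftdx::fft_type::c2c>()
                     + cufftdx::Direction<cufftdx::fft_direction::forward>()
                     + cufftdx::Block()
                     + cufftdx::SM<800>());

__global__ void fft_kernel(typename FFT::value_type* data) {
    // Each thread holds its share of the FFT elements in registers.
    typename FFT::value_type thread_data[FFT::storage_size];
    // ... load this thread's elements from data into thread_data ...
    extern __shared__ __align__(alignof(float4)) char shared_mem[];
    FFT().execute(thread_data, shared_mem);
    // ... store thread_data back to data ...
}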
When you unpack the MathDx YY.MM package tarball into <your_directory>, the cufftdx.hpp file will be available at the following location:
<your_directory>/nvidia/mathdx/yy.mm/include/
You can review the Makefile shipped alongside the cuFFTDx examples to see how they are compiled.
Note
Since version 1.2.1, cuFFTDx has an indirect dependency on the CUTLASS library.
If no other Dx library is used, the dependency can be disabled by defining the CUFFTDX_DISABLE_CUTLASS_DEPENDENCY macro.
Defining CUFFTDX_DISABLE_CUTLASS_DEPENDENCY while other Dx libraries are in use can lead to compilation errors. Since cuFFTDx 1.3.0, it is recommended to disable the dependency when using CUDA toolkits older than 11.4, due to version incompatibilities.
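For example, a standalone cuFFTDx compilation with the dependency disabled mirrors the earlier command with the macro added:

nvcc -std=c++17 -arch sm_XY -DCUFFTDX_DISABLE_CUTLASS_DEPENDENCY (...) \
    -I<mathdx_include_dir> <your_source_file>.cu -o <your_binary>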
Note
Since version 1.3.0, cuFFTDx provides a way to optimize the final binary size of user projects. This works by linking all executables against a shared object with pre-calculated constants required for FFT calculations, thus sharing them between translation units. It is especially recommended when an executable includes a large number of translation units (.cu source files) with cuFFTDx kernels.
To compile with this optimization, add -DCUFFTDX_USE_SEPARATE_TWIDDLES -rdc=true to the compilation command and also compile <mathdx_dir>/src/cufftdx/lut.cu:
nvcc -DCUFFTDX_USE_SEPARATE_TWIDDLES -std=c++17 -arch sm_XY \
    (...) -rdc=true -I<mathdx_include_dir> \
    <cufftdx_include_dir>/liblut/lut.cu \
    <your_source_file>.cu \
    -o <your_binary>
The Makefile included alongside the cuFFTDx examples contains these additional compilation flags. The following command will build the examples with this optimization.
make all_twiddles
Using cuFFT LTO Database with cuFFTDx#
Use Case I: Offline Kernel Generation#
Enhancing cuFFTDx with cuFFT-generated code requires additional compilation steps.
Note
For the offline kernel generation use case, cuFFT is only needed to generate the initial LTO database files. Once generated, you can build and run your application without cuFFT.
1. Create the LTO database using cuFFT, using either:

   - the provided LTO Helper utility (see Use Case I: Offline Kernel Generation for illustration), or
   - the cuFFT device API (see cuFFT Device API through Example for illustration).

   The output includes a C++ header file (lto_database.hpp.inc) as well as .ltoir and .fatbin files.

2. Include the generated C++ header file and compile the cuFFTDx application code into LTOIRs:

   nvcc -std=c++17 \
        -dc \
        --generate-code arch=compute_75,code=lto_75 \
        -I<cufftdx_include_dir> \
        -I/path/to/directory/containing/lto_database.hpp.inc \
        <your_source_file>.cu -o <your_source_file>.o

   Replace 75 with the target CUDA architecture.

3. Device link with LTOIRs:

   nvcc -dlink \
        -dlto \
        --generate-code arch=compute_75,code=sm_75 \
        database_X.fatbin ... \
        database_X.ltoir ... \
        <your_source_file>.o -o <your_source_file>_dlink.o

4. Host link the object files to produce the final executable:

   g++ -L${CUDA_PATH}/lib64 -lcudart \
       <your_source_file>.o <your_source_file>_dlink.o -o <your_binary>
Note
The lto_database.hpp.inc, .fatbin and .ltoir files must be generated with descriptions that are consistent with the traits of the cuFFTDx operators
in your project and the target architectures.
If there is a mismatch, cuFFTDx will fall back to using the PTX implementation instead of the LTOIR implementation, causing this assertion to fail:
static_assert(
FFT::code == cufftdx::experimental::code_type::ltoir,
"Selected implementation code type is not LTOIR.");
Use Case II: Online Kernel Generation#
When building code that uses NVRTC, one needs to take the following into account:
- Link with the two extra dependencies, cuFFT and nvJitLink.
- Define the CUFFT_ENABLE_EXPERIMENTAL_API macro for using the cuFFT device API.
- Define the CUFFTDX_ENABLE_CUFFT_DEPENDENCY macro for using the (online) LTO database creation helper function.
g++ -DCUFFTDX_ENABLE_CUFFT_DEPENDENCY \
-DCUFFT_ENABLE_EXPERIMENTAL_API \
-L<cufft_lib_path> -I<cufft_include_path> \
-I<cufftdx_include_path> \
-lcufft \
-lnvJitLink \
-lnvrtc \
-L<cuda_lib_path> -lcuda <your_source_file>.cu -o <your_binary>
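As a sketch, assuming the macros are consumed only by the cuFFT and cuFFTDx headers, they can also be defined in source before the relevant includes instead of being passed with -D:

// Alternative to the -D flags in the command above (assumption: the macros
// only need to be visible before the headers are included).
#define CUFFT_ENABLE_EXPERIMENTAL_API
#define CUFFTDX_ENABLE_CUFFT_DEPENDENCY
#include <cufftdx.hpp>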
cuFFTDx In Your CMake Project#
The MathDx package provides configuration files that simplify using cuFFTDx in other CMake projects. After
finding mathdx using find_package, users have to link mathdx::cufftdx to their target. This propagates the include directory
cufftdx_INCLUDE_DIRS, the mathdx::commondx dependency, and the C++17 requirement to their target.
find_package(mathdx REQUIRED COMPONENTS cufftdx CONFIG)
target_link_libraries(YourProgram mathdx::cufftdx)
You can pass the path to the MathDx package using the PATHS option:
find_package(mathdx REQUIRED COMPONENTS cufftdx CONFIG
PATHS "<your_directory>/nvidia/mathdx/yy.mm/")
Alternatively, you can set mathdx_ROOT during CMake configuration of your project:
cmake -Dmathdx_ROOT="<your_directory>/nvidia/mathdx/yy.mm/" (...)
Note
Since version 1.2.1, cuFFTDx has an indirect dependency on the CUTLASS library.
If no other Dx library is used, the dependency can be disabled by adding set(mathdx_cufftdx_DISABLE_CUTLASS TRUE) before find_package(...).
Disabling the CUTLASS dependency while other Dx libraries are in use can lead to compilation errors.
Note
Since version 1.3.0, a new target is provided that allows optimizing the final binary size when including cuFFTDx.
This works by linking all executables against a shared object with pre-calculated constants required for FFT calculations, thus sharing them between translation units.
It is especially recommended when an executable includes a large number of translation units (.cu source files) with cuFFTDx kernels.
You have to set the architectures for compilation of that target by adding set(cufftdx_SEPARATE_TWIDDLES_CUDA_ARCHITECTURES <your cuda architectures>), for example set(cufftdx_SEPARATE_TWIDDLES_CUDA_ARCHITECTURES 90-real), or by setting the default variable CMAKE_CUDA_ARCHITECTURES.
It must be set before find_package(...) is performed.
Also, as a separate object file is created for the pre-calculated values, your executable target should enable CUDA separable compilation:
set_property(TARGET YourProgram PROPERTY CUDA_SEPARABLE_COMPILATION ON).
set(cufftdx_SEPARATE_TWIDDLES_CUDA_ARCHITECTURES YourChosenArchitecture)
find_package(mathdx REQUIRED COMPONENTS cufftdx CONFIG)
target_link_libraries(YourProgram mathdx::cufftdx_separate_twiddles_lut)
set_property(TARGET YourProgram PROPERTY CUDA_SEPARABLE_COMPILATION ON)
Using cuFFT LTO Database with cuFFTDx in CMake#
The following shows how to incorporate the steps from Using cuFFT LTO Database with cuFFTDx using CMake.
For the offline kernel generation use case:
1. Run the run_cufft_lto_helper CMake function provided in the lto_helper.cmake script in the example/lto_helper directory to generate the LTO database. After running, the ${OUTPUT_NAME}_lto_lib library will be available to be linked with your program.
2. Set both the CUDA_SEPARABLE_COMPILATION and INTERPROCEDURAL_OPTIMIZATION properties to ON to enable link-time optimization and compile your source code into LTOIRs.
include(${cufftdx_ROOT}/cufftdx/example/lto_helper/lto_helper.cmake)
run_cufft_lto_helper(
SRC_DIR ${cufftdx_ROOT}/cufftdx/example/lto_helper/
BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}
OUTPUT_NAME my_fft
DESCS ${CMAKE_SOURCE_DIR}/input.csv
ARCHITECTURES 70,80
)
target_link_libraries(YourProgram PRIVATE my_fft_lto_lib)
set_target_properties(YourProgram
PROPERTIES
CUDA_SEPARABLE_COMPILATION ON
INTERPROCEDURAL_OPTIMIZATION ON
)
For the online kernel generation use case:
1. Define the CUFFT_ENABLE_EXPERIMENTAL_API and CUFFTDX_ENABLE_CUFFT_DEPENDENCY macros:

   target_compile_definitions(YourProgram PRIVATE CUFFTDX_ENABLE_CUFFT_DEPENDENCY)
   target_compile_definitions(YourProgram PRIVATE CUFFT_ENABLE_EXPERIMENTAL_API)

2. Add Findcufft.cmake (shipped with cuFFTDx) to your project's CMake module path and use find_package to locate cufft:

   list(APPEND CMAKE_MODULE_PATH "${cufftdx_cufft_MODULE_PATH}") # Path to Findcufft.cmake
   find_package(cufft REQUIRED)

   You can set cufft_ROOT during CMake configuration of your project:

   cmake -Dcufft_ROOT="<your_ctk_directory>/" (...)

3. Link with the two extra dependencies, cuFFT and nvJitLink:

   target_link_libraries(YourProgram PRIVATE
       cufft::cufft_static # or, cufft::cufft
       CUDA::nvJitLink
   )
Defined Variables#
mathdx_cufftdx_FOUND, cufftdx_FOUND: True if cuFFTDx was found.
cufftdx_INCLUDE_DIRS: cuFFTDx include directories.
mathdx_INCLUDE_DIRS: MathDx include directories.
mathdx_cutlass_INCLUDE_DIR: CUTLASS include directory.
mathdx_VERSION: MathDx package version number in X.Y.Z format.
cuFFTDx_VERSION: cuFFTDx version number in X.Y.Z format.
MathDx/cuFFTDx version matrix
| MathDx  | cuFFTDx |
|---------|---------|
| 22.02   | 1.0.0   |
| 22.11   | 1.1.0   |
| 24.01   | 1.1.1   |
| 24.04   | 1.2.0   |
| 24.08   | 1.2.1   |
| 25.01   | 1.3.0   |
| 25.01.1 | 1.3.1   |
| -       | 1.4.0   |
| 25.06   | 1.5.0   |
| 25.06.1 | 1.5.1   |
| 25.12   | 1.6.0   |