Quick Installation Guide#

The cuFFTDx library is distributed as part of the MathDx package. To download the most recent MathDx release, which includes cuFFTDx, go to https://developer.nvidia.com/cufftdx-downloads.

Note

The MathDx package contains:

  • cuBLASDx for selected linear algebra functions like General Matrix Multiplication (GEMM),

  • cuFFTDx for FFT calculations,

  • cuSolverDx for selected dense matrix factorization and solve routines,

  • cuRANDDx for random number generation,

  • nvCOMPDx for compression/decompression of data.

MathDx libraries are designed to work together in a single project.

Note that when multiple device extension libraries are used in a single project, all of them must come from the same MathDx release. Examples of such fusion are included in the package.

cuFFTDx In Your Project#

cuFFTDx is a header-only library; to use it, users only need to add the directory containing cufftdx.hpp and commonDx to the include paths of their compilation commands. All other requirements are listed in the Requirements section. The easiest way is to use the MathDx include directory:

nvcc -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> \
     <your_source_file>.cu -o <your_binary>

When you unpack the MathDx YY.MM package tarball into <your_directory>, the cufftdx.hpp file will be available at the following location:

  • <your_directory>/nvidia/mathdx/yy.mm/include/

You can review the Makefile shipped alongside the cuFFTDx examples to see how they are compiled.
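For a quick check that the include path works, below is a minimal, self-contained sketch of a block-wide FFT. It is not one of the shipped examples: the 128-point, single-precision, complex-to-complex forward configuration, the sm_80 target (SM<800> must match the -arch value passed to nvcc), and the names block_fft_kernel and complex_type are illustrative choices; only the cufftdx operators and traits come from the library.

#include <cufftdx.hpp>

// FFT descriptor: 128-point, single-precision, complex-to-complex, forward FFT,
// executed collectively by a thread block and compiled for sm_80.
using FFT = decltype(cufftdx::Size<128>()
                   + cufftdx::Precision<float>()
                   + cufftdx::Type<cufftdx::fft_type::c2c>()
                   + cufftdx::Direction<cufftdx::fft_direction::forward>()
                   + cufftdx::Block()
                   + cufftdx::SM<800>());

using complex_type = FFT::value_type;

__global__ void block_fft_kernel(complex_type* data) {
    // Registers for this thread's slice of the FFT and shared memory for the exchange phases.
    complex_type thread_data[FFT::storage_size];
    extern __shared__ __align__(alignof(float4)) complex_type shared_mem[];

    // Each block computes FFT::ffts_per_block FFTs; threadIdx.y selects the local one.
    const unsigned int offset =
        cufftdx::size_of<FFT>::value * (blockIdx.x * FFT::ffts_per_block + threadIdx.y);

    // Load this thread's elements from global memory into registers.
    for (unsigned int i = 0; i < FFT::elements_per_thread; ++i) {
        thread_data[i] = data[offset + threadIdx.x + i * FFT::stride];
    }

    // Collective, block-wide FFT.
    FFT().execute(thread_data, shared_mem);

    // Store the results back to global memory.
    for (unsigned int i = 0; i < FFT::elements_per_thread; ++i) {
        data[offset + threadIdx.x + i * FFT::stride] = thread_data[i];
    }
}

int main() {
    complex_type* data = nullptr;
    const size_t count = cufftdx::size_of<FFT>::value * FFT::ffts_per_block;
    cudaMallocManaged(&data, count * sizeof(complex_type));
    // Input is left unpopulated; this is only a compile-and-launch smoke test.
    block_fft_kernel<<<1, FFT::block_dim, FFT::shared_memory_size>>>(data);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}

Compiling this file with the nvcc command shown above (using -arch sm_80) should produce a working binary; the shipped examples demonstrate complete input and output handling.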

Note

Since version 1.2.1, cuFFTDx has an indirect dependency on the CUTLASS library. If no other Dx library is used, the dependency can be disabled by defining the CUFFTDX_DISABLE_CUTLASS_DEPENDENCY macro. Defining CUFFTDX_DISABLE_CUTLASS_DEPENDENCY while other Dx libraries are used can lead to compilation errors. Since cuFFTDx 1.3.0, due to version incompatibilities, it is recommended to disable the dependency when using CUDA Toolkit versions earlier than 11.4.
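For illustration, a minimal source-level sketch, assuming the macro only needs to be visible before cufftdx.hpp is preprocessed; passing -DCUFFTDX_DISABLE_CUTLASS_DEPENDENCY on the nvcc command line has the same effect for the whole translation unit:

// Sketch: disable the indirect CUTLASS dependency.
// Use only when no other Dx library is present in the project.
#define CUFFTDX_DISABLE_CUTLASS_DEPENDENCY
#include <cufftdx.hpp>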

Note

Since version 1.3.0, cuFFTDx provides a way to optimize the final binary size of user projects. This works by linking all executables against a shared object that contains the pre-calculated constants required for FFT calculations, so they are shared between translation units. It is especially recommended when an executable includes a large number of translation units (.cu source files) with cuFFTDx kernels. To compile with this optimization, add -DCUFFTDX_USE_SEPARATE_TWIDDLES -rdc=true to the compilation command and compile the lut.cu source file shipped with cuFFTDx (shown in the command below) alongside your sources.

nvcc -DCUFFTDX_USE_SEPARATE_TWIDDLES -std=c++17 -arch sm_XY \
     (...) -rdc=true -I<mathdx_include_dir> \
     <cufftdx_include_dir>/liblut/lut.cu \
     <your_source_file>.cu \
     -o <your_binary>

The Makefile included alongside the cuFFTDx examples contains these additional compilation flags. The following command builds the examples with this optimization.

make all_twiddles

Using cuFFT LTO Database with cuFFTDx#

Use Case I: Offline Kernel Generation#

To enhance your cuFFTDx-based application with cuFFT-generated code, additional compilation steps are required.

Note

For the offline kernel generation use case, cuFFT is only needed to generate the initial LTO database files. Once generated, you can build and run your application without cuFFT.

  1. Create the LTO database using cuFFT:

    Use either the LTO helper tool shipped with the cuFFTDx examples (see the example/lto_helper directory) or cuFFT directly to generate the LTO database. The output includes a C++ header file (lto_database.hpp.inc) as well as .ltoir and .fatbin files.

  2. Include the generated C++ header file and compile cuFFTDx application code into LTOIRs:

    nvcc -std=c++17 \
         -dc \
         --generate-code arch=compute_75,code=lto_75 \
         -I<cufftdx_include_dir> \
         -I/path/to/directory/containing/lto_database.hpp.inc \
         <your_source_file>.cu -o <your_source_file>.o
    

    Replace 75 with the target CUDA architecture.

  3. Device link with LTOIRs:

    nvcc -dlink \
         -dlto \
         --generate-code arch=compute_75,code=sm_75 \
         database_X.fatbin ... \
         database_X.ltoir ... \
         <your_source_file>.o -o <your_source_file>_dlink.o
    
  4. Host link the object files to produce the final executable:

    g++ -L${CUDA_PATH}/lib64 -lcudart \
        <your_source_file>.o <your_source_file>_dlink.o -o <your_binary>
    

Note

The lto_database.hpp.inc, .fatbin and .ltoir files must be generated with descriptions that are consistent with the traits of the cuFFTDx operators in your project and the target architectures.

If there is a mismatch, cuFFTDx will fall back to using the PTX implementation instead of the LTOIR implementation, causing this assertion to fail:

static_assert(
    FFT::code == cufftdx::experimental::code_type::ltoir,
    "Selected implementation code type is not LTOIR.");

Use Case II: Online Kernel Generation#

When building NVRTC-based code, take the following into account:

  • Link with the two extra dependencies, cuFFT and nvJitLink.

  • Define the CUFFT_ENABLE_EXPERIMENTAL_API macro to use the cuFFT Device API.

  • Define the CUFFTDX_ENABLE_CUFFT_DEPENDENCY macro to use the (online) LTO database creation helper function.

g++ -DCUFFTDX_ENABLE_CUFFT_DEPENDENCY \
    -DCUFFT_ENABLE_EXPERIMENTAL_API \
    -L<cufft_lib_path> -I<cufft_include_path> \
    -I<cufftdx_include_path> \
    -lcufft \
    -lnvJitLink \
    -lnvrtc \
    -L<cuda_lib_path> -lcuda <your_source_file>.cu -o <your_binary>

cuFFTDx In Your CMake Project#

The MathDx package provides configuration files that simplify using cuFFTDx in other CMake projects. After finding mathdx via find_package, users only need to link their target against mathdx::cufftdx. This propagates the cufftdx_INCLUDE_DIRS include directories, the mathdx::commondx dependency, and the C++17 requirement to the target.

find_package(mathdx REQUIRED COMPONENTS cufftdx CONFIG)
target_link_libraries(YourProgram mathdx::cufftdx)

You can pass the path to the MathDx package using the PATHS option:

find_package(mathdx REQUIRED COMPONENTS cufftdx CONFIG
             PATHS "<your_directory>/nvidia/mathdx/yy.mm/")

Alternatively, you can set mathdx_ROOT during the CMake configuration of your project:

cmake -Dmathdx_ROOT="<your_directory>/nvidia/mathdx/yy.mm/" (...)

Note

Since version 1.2.1, cuFFTDx has an indirect dependency on the CUTLASS library. If no other Dx library is used, the dependency can be disabled by adding set(mathdx_cufftdx_DISABLE_CUTLASS TRUE) before find_package(...). Disabling the CUTLASS dependency while other Dx libraries are used can lead to compilation errors.

Note

Since version 1.3.0, a new target is provided that allows optimizing the final binary size when including cuFFTDx. This works by linking all executables against a shared object that contains the pre-calculated constants required for FFT calculations, so they are shared between translation units. It is especially recommended when an executable includes a large number of translation units (.cu source files) with cuFFTDx kernels. You have to set the architectures for compilation of that target by adding set(cufftdx_SEPARATE_TWIDDLES_CUDA_ARCHITECTURES <your cuda architectures>) or by setting the default variable CMAKE_CUDA_ARCHITECTURES, for example, set(cufftdx_SEPARATE_TWIDDLES_CUDA_ARCHITECTURES 90-real). It must be set before find_package(...) is called. Also, as a separate object file is created for the pre-calculated values, your executable target should enable CUDA separable compilation: set_property(TARGET YourProgram PROPERTY CUDA_SEPARABLE_COMPILATION ON).

set(cufftdx_SEPARATE_TWIDDLES_CUDA_ARCHITECTURES YourChosenArchitecture)
find_package(mathdx REQUIRED COMPONENTS cufftdx CONFIG)
target_link_libraries(YourProgram mathdx::cufftdx_separate_twiddles_lut)
set_property(TARGET YourProgram PROPERTY CUDA_SEPARABLE_COMPILATION ON)

Using cuFFT LTO Database with cuFFTDx in CMake#

The following shows how to incorporate the steps from Using cuFFT LTO Database with cuFFTDx into CMake.

For the offline kernel generation use case:

  • Call the run_cufft_lto_helper CMake function, provided in the lto_helper.cmake script in the example/lto_helper directory, to generate the LTO database.

    • After it runs, the ${OUTPUT_NAME}_lto_lib library is available to link with your program.

  • Set both the CUDA_SEPARABLE_COMPILATION and INTERPROCEDURAL_OPTIMIZATION target properties to ON to enable link-time optimization and compile your source code into LTOIRs.

include(${cufftdx_ROOT}/cufftdx/example/lto_helper/lto_helper.cmake)

run_cufft_lto_helper(
  SRC_DIR ${cufftdx_ROOT}/cufftdx/example/lto_helper/
  BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}
  OUTPUT_NAME my_fft
  DESCS ${CMAKE_SOURCE_DIR}/input.csv
  ARCHITECTURES 70,80
)

target_link_libraries(YourProgram PRIVATE my_fft_lto_lib)

set_target_properties(YourProgram
  PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    INTERPROCEDURAL_OPTIMIZATION ON
)

For the online kernel generation use case:

  • Define the CUFFT_ENABLE_EXPERIMENTAL_API and CUFFTDX_ENABLE_CUFFT_DEPENDENCY macros.

    target_compile_definitions(YourProgram PRIVATE CUFFTDX_ENABLE_CUFFT_DEPENDENCY)
    target_compile_definitions(YourProgram PRIVATE CUFFT_ENABLE_EXPERIMENTAL_API)
    
  • Add Findcufft.cmake (shipped with cuFFTDx) to your project’s CMake module path and use find_package to locate cufft:

    list(APPEND CMAKE_MODULE_PATH "${cufftdx_cufft_MODULE_PATH}") # Path to Findcufft.cmake
    find_package(cufft REQUIRED)
    

    You can set cufft_ROOT during the CMake configuration of your project:

    cmake -Dcufft_ROOT="<your_ctk_directory>/" (...)
    
  • Link with the two extra dependencies, cuFFT and nvJitLink.

    target_link_libraries(YourProgram
      PRIVATE
        cufft::cufft_static # or, cufft::cufft
        CUDA::nvJitLink
    )
    

Defined Variables#

mathdx_cufftdx_FOUND, cufftdx_FOUND
    True if cuFFTDx was found.

cufftdx_INCLUDE_DIRS
    cuFFTDx include directories.

mathdx_INCLUDE_DIRS
    MathDx include directories.

mathdx_cutlass_INCLUDE_DIR
    CUTLASS include directory.

mathdx_VERSION
    MathDx package version number in X.Y.Z format.

cuFFTDx_VERSION
    cuFFTDx version number in X.Y.Z format.

MathDx/cuFFTDx version matrix

  MathDx     cuFFTDx
  22.02      1.0.0
  22.11      1.1.0
  24.01      1.1.1
  24.04      1.2.0
  24.08      1.2.1
  25.01      1.3.0
  25.01.1    1.3.1
  -          1.4.0
  25.06      1.5.0
  25.06.1    1.5.1
  25.12      1.6.0