Installation Guide

All device extension libraries ship in a single MathDx package (tarball or zip), with a separate package for each supported CUDA Toolkit major version. The packages can be downloaded from the download page of any Dx library, since all libraries host the same package. Installation instructions for each library are in the corresponding library’s documentation.

cuBLASDx: Download | Installation
cuFFTDx: Download | Installation
cuSolverDx: Download | Installation
cuRANDDx: Download | Installation
nvCOMPDx: Download | Installation

Dx Library In Your Project

Most of the device extension libraries are header-only. To use them, add the MathDx include directory to the compilation command, together with the include paths of the dependencies, CUTLASS and commonDx, which ship with the MathDx package.

Library	Header-only	Header	LTO Library
cuBLASDx	No/Yes	cublasdx.hpp	libcublasdx.fatbin
cuFFTDx	Yes	cufftdx.hpp	No (See note below)
cuSolverDx	No	cusolverdx.hpp	libcusolverdx.a libcusolverdx.fatbin
cuRANDDx	Yes	curanddx.hpp	No
nvCOMPDx	No	nvcompdx.hpp	libnvcompdx.a libnvcompdx.fatbin

Note

Since version 0.6.0, cuBLASDx includes an LTO library for TRSM functionality.

Note

cuBLASDx provides a header-only library target (mathdx::cublasdx_no_lto) that does not require linking to the LTO library.

Note

Starting from version 1.6.0, cuFFTDx provides an experimental feature that extends functionality and improves performance by reusing optimized code from the cuFFT library. See the cuFFTDx documentation for more details.

After unpacking the MathDx YY.MM package into <your_directory>, the Dx header files are available at the following location:

<your_directory>/nvidia/mathdx/YY.MM/include/

The commonDx include directory is the same as for the Dx headers:

<your_directory>/nvidia/mathdx/YY.MM/include

The CUTLASS include directory is:

<your_directory>/nvidia/mathdx/YY.MM/external/cutlass/include

Examples

The simplest way to use MathDx is to add all required include directories to the NVCC compilation command:

nvcc -std=c++17 -arch=sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary>
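
As a rough sketch, the two include flags in the command above can be assembled from the directory the package was unpacked into. The helper name mathdx_include_flags and the example arguments (/opt and YY.MM) are hypothetical and only illustrate the directory layout described in this section:

```shell
#!/bin/sh
# Hypothetical helper: print the -I flags for a MathDx package.
#   $1 = directory the package was unpacked into
#   $2 = package version in the YY.MM form used in the paths above
mathdx_include_flags() {
    root="$1/nvidia/mathdx/$2"
    # The Dx headers and commonDx share one include directory;
    # CUTLASS lives under external/cutlass.
    printf '%s\n' "-I$root/include -I$root/external/cutlass/include"
}

# Placeholder arguments; replace them with your own directory and version.
mathdx_include_flags "/opt" "YY.MM"
```

The printed flags can be pasted into the nvcc command line or passed through command substitution. Unquoted command substitution splits on whitespace, so this sketch assumes the unpack directory contains no spaces.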

Linking LTO

cuBLASDx (for TRSM), cuSolverDx, and nvCOMPDx require linking to the corresponding LTO (Link Time Optimization) libraries. cuSolverDx and nvCOMPDx are provided in two forms: static library lib<library_name>.a and fatbin lib<library_name>.fatbin. cuBLASDx provides only libcublasdx.fatbin.

lib<library_name>.fatbin, in contrast to lib<library_name>.a, contains only device code and thus is host platform agnostic. For example, it can be safely used on the AARCH64 platform, whereas lib<library_name>.a can only be used for x86_64 Linux builds.

Important

When using LTO libraries, please observe the following requirements:

* The NVCC / NVRTC used to compile must be from the same or older CUDA Toolkit than the NVCC / nvJitLink used to perform the linking stage.
* Both the compiler and the linker must be from the same CUDA Toolkit major version.
* MathDx provides a package for every supported CUDA Toolkit major version.

Example

For example, cuSolverDx 0.4.0 from the MathDx 26.03.0 package supports CUDA Toolkit 13.0 or newer. It is therefore required to use NVCC / NVRTC 13.0 or newer to compile, and NVCC / nvJitLink 13.0 or newer to perform the linking stage.
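
The two toolchain requirements can also be expressed as a small check. The following shell sketch is illustrative only: the function name lto_toolchain_ok is hypothetical, it expects versions in major.minor form, it ignores patch levels, and it does not query any tool for its version.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a compiler/linker version pair satisfies
# the LTO toolchain requirements.
#   $1 = CUDA Toolkit version of the compiler (NVCC / NVRTC), e.g. 13.0
#   $2 = CUDA Toolkit version of the linker (NVCC / nvJitLink), e.g. 13.1
lto_toolchain_ok() {
    # Split "major.minor[.patch]" into major and minor parts.
    c_major=${1%%.*}; c_minor=${1#*.}; c_minor=${c_minor%%.*}
    l_major=${2%%.*}; l_minor=${2#*.}; l_minor=${l_minor%%.*}
    # Compiler and linker must share the same major version.
    [ "$c_major" -eq "$l_major" ] || return 1
    # The compiler must be the same version as the linker, or older.
    [ "$c_minor" -le "$l_minor" ]
}

lto_toolchain_ok 13.0 13.1 && echo "13.0 -> 13.1: ok"
lto_toolchain_ok 13.1 13.0 || echo "13.1 -> 13.0: compiler is newer than linker"
lto_toolchain_ok 12.9 13.0 || echo "12.9 -> 13.0: major versions differ"
```

The minimum version supported by a given Dx library (13.0 in the example above) is a separate constraint that this sketch does not check.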

When compiling with NVCC, it is necessary to link to the corresponding LTO library file:

# When using lib<library_name>.a
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> -l<library_name>

# When using lib<library_name>.fatbin
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> lib<library_name>.fatbin

The -dlto option at link time instructs the linker to retrieve the LTO IR from the library object and perform optimization on the resulting IR for code generation.

When using NVRTC and nvJitLink for runtime kernel compilation and linking, either the fatbin file lib<library_name>.fatbin or the static library lib<library_name>.a can be used. The fatbin must be used on platforms other than x86_64 Linux.
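
The platform rule can be captured in a few lines of shell. This is a sketch under stated assumptions: lto_library_file is a hypothetical helper, it prefers the static library wherever this section allows it (an arbitrary choice, since the fatbin works on every host platform), and it treats cuBLASDx as fatbin-only.

```shell
#!/bin/sh
# Hypothetical helper: print the LTO library file name to link against.
#   $1 = library name (cublasdx, cusolverdx or nvcompdx)
#   $2 = kernel name as printed by `uname -s`
#   $3 = machine name as printed by `uname -m`
lto_library_file() {
    if [ "$1" != "cublasdx" ] && [ "$2" = "Linux" ] && [ "$3" = "x86_64" ]; then
        # The static library is only usable for x86_64 Linux builds.
        printf '%s\n' "lib$1.a"
    else
        # The fatbin contains device code only, so it is host platform agnostic.
        printf '%s\n' "lib$1.fatbin"
    fi
}

# Pick the file for the machine running this script.
lto_library_file cusolverdx "$(uname -s)" "$(uname -m)"
```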

Dx Library In Your CMake Project

The MathDx package provides a configuration file that simplifies using Dx libraries in other CMake projects. After finding mathdx using find_package, users must link mathdx::<library_name> to their target (see MathDx CMake targets). This propagates the include directory <library_name>_INCLUDE_DIRS, required dependencies, and the C++17 requirement to their target.

For example, linking cuBLASDx and cuRANDDx to the YourProgram CMake target can be done as follows:

find_package(mathdx REQUIRED COMPONENTS cublasdx curanddx CONFIG)
target_link_libraries(YourProgram mathdx::cublasdx mathdx::curanddx)

You can pass the path to the MathDx package using the PATHS option:

find_package(mathdx REQUIRED COMPONENTS <libraries> CONFIG PATHS "<your_directory>/nvidia/mathdx/YY.MM/")

Alternatively, you can set mathdx_ROOT during cmake configuration of your project:

cmake -Dmathdx_ROOT="<your_directory>/nvidia/mathdx/YY.MM/" (...)

Linking LTO

To enable LTO in CMake for a target, set the INTERPROCEDURAL_OPTIMIZATION property to true. To allow separate compilation of device code, set CUDA_SEPARABLE_COMPILATION to true. Both properties are required when linking device extension libraries that use LTO, whether the target relies on the static library lib<library_name>.a or on the fatbin library lib<library_name>.fatbin.

mathdx::<library_name>_fatbin targets can be used instead of mathdx::<library_name>. They rely on fatbin library lib<library_name>.fatbin.

Note

cuBLASDx provides the cublasdx::cublasdx target that relies on the fatbin library libcublasdx.fatbin when CUDA Toolkit 13.2.0 or later is used; otherwise it is a header-only library target. There is also a cublasdx::cublasdx_fatbin target that always includes the fatbin library. cuBLASDx does not provide a static library target (libcublasdx.a). Check the cuBLASDx documentation for more details.

Example

For example, linking cuSolverDx to the YourProgram CMake target can be done as follows:

# find cuSolverDx
find_package(mathdx REQUIRED COMPONENTS cusolverdx CONFIG)
# enable LTO in your target
set_target_properties(YourProgram
      PROPERTIES
          CUDA_SEPARABLE_COMPILATION ON
          INTERPROCEDURAL_OPTIMIZATION ON)
# link against mathdx::cusolverdx
target_link_libraries(YourProgram mathdx::cusolverdx)
# or, alternatively, against mathdx::cusolverdx_fatbin
target_link_libraries(YourProgram mathdx::cusolverdx_fatbin)

Targets

For all details about the targets, please refer to the documentation of the corresponding library.

Library	CMake Target(s)
cuBLASDx	`mathdx::cublasdx` `mathdx::cublasdx_fatbin` `mathdx::cublasdx_no_lto`
cuFFTDx	`mathdx::cufftdx` `mathdx::cufftdx_separate_twiddles_lut`
cuRANDDx	`mathdx::curanddx`
cuSolverDx	`mathdx::cusolverdx` `mathdx::cusolverdx_fatbin`
nvCOMPDx	`mathdx::nvcompdx` `mathdx::nvcompdx_fatbin`

Defined Variables

For all details about the defined variables, please refer to the documentation of the corresponding library.

mathdx_FOUND: True if MathDx was found.
mathdx_VERSION: MathDx package version number in X.Y.Z format.
mathdx_INCLUDE_DIRS: MathDx include directories.
mathdx_<library_name>_FOUND, <library_name>_FOUND: True if <library_name> was found. Example: mathdx_cublasdx_FOUND, cublasdx_FOUND.
<library_name>_VERSION: <library_name> version number in X.Y.Z format. Example: cublasdx_VERSION.
<library_name>_INCLUDE_DIRS: <library_name> include directories. Example: cublasdx_INCLUDE_DIRS.

Additionally, the following variables are defined for libraries with LTO:

<library_name>_LIBRARIES: <library_name> library target, which uses static library lib<library_name>.a. This variable does not exist for cuBLASDx, which only provides the fatbin library. Example: cusolverdx_LIBRARIES.
<library_name>_FATBIN: Path to <library_name> fatbin library lib<library_name>.fatbin. Example: cusolverdx_FATBIN.