Installation Guide#

All device extension libraries are shipped in a single package (tarball or zip). Each supported CUDA Toolkit major version has its own package. The package can be downloaded from the download page of any Dx library (all pages host the same package). Installation details for each library can be found in the documentation of the corresponding library.

cuBLASDx: Download | Installation
cuFFTDx: Download | Installation
cuSolverDx: Download | Installation
cuRANDDx: Download | Installation
nvCOMPDx: Download | Installation

Dx Library In Your Project#

Most of the device extension libraries are header-only. To use them, add the MathDx include directory to your compilation commands. You must also add the include paths of the dependencies, CUTLASS and commonDx, which are shipped with the MathDx package.

Library	Header-only	Header	LTO Library
cuBLASDx	Yes	cublasdx.hpp	No
cuFFTDx	Yes	cufftdx.hpp	No (See note below)
cuSolverDx	No	cusolverdx.hpp	libcusolverdx.a libcusolverdx.fatbin
cuRANDDx	Yes	curanddx.hpp	No
nvCOMPDx	No	nvcompdx.hpp	libnvcompdx.a libnvcompdx.fatbin

Note

Starting from version 1.6.0, cuFFTDx provides an experimental feature that extends functionality and improves performance by reusing optimized code from the cuFFT library. See the cuFFTDx documentation for more details.

After unpacking the MathDx YY.MM package into <your_directory>, the Dx header files are available at the following location:

<your_directory>/nvidia/mathdx/YY.MM/include/

The commonDx include directory is the same as the one for the Dx headers:

<your_directory>/nvidia/mathdx/YY.MM/include

The CUTLASS include directory is:

<your_directory>/nvidia/mathdx/YY.MM/external/cutlass/include

Examples

The simplest way to use MathDx is to add all required include directories to the NVCC compilation command:

nvcc -std=c++17 -arch=sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary>
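
As a minimal sketch, the command above can be assembled from the location of the unpacked package. The names `MATHDX_ROOT`, `MATHDX_INCLUDE_DIR`, `CUTLASS_INCLUDE_DIR`, and `mathdx_nvcc_cmd` are hypothetical and not part of MathDx. The default path, `sm_80`, and the file names are placeholders. The function only prints the command, so it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical helper: prints an nvcc command line for a header-only Dx library.
# MATHDX_ROOT is assumed to point at <your_directory>/nvidia/mathdx/YY.MM.
MATHDX_ROOT="${MATHDX_ROOT:-$HOME/nvidia/mathdx/YY.MM}"
MATHDX_INCLUDE_DIR="${MATHDX_ROOT}/include"                    # Dx and commonDx headers
CUTLASS_INCLUDE_DIR="${MATHDX_ROOT}/external/cutlass/include"  # CUTLASS headers shipped with MathDx

mathdx_nvcc_cmd() {
    # $1 = GPU architecture (placeholder, e.g. sm_80)
    # $2 = source file, $3 = output binary
    printf 'nvcc -std=c++17 -arch=%s -I%s -I%s %s -o %s\n' \
        "$1" "$MATHDX_INCLUDE_DIR" "$CUTLASS_INCLUDE_DIR" "$2" "$3"
}

# Print the command for a placeholder source file.
mathdx_nvcc_cmd sm_80 example.cu example
```

The printed line does not quote its arguments, so this sketch assumes paths without spaces.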

Linking LTO#

cuSolverDx and nvCOMPDx require linking to their corresponding LTO (Link Time Optimization) libraries. Each LTO library is provided in two forms: a static library lib<library_name>.a and a fatbin lib<library_name>.fatbin.

lib<library_name>.fatbin, in contrast to lib<library_name>.a, contains only device code and thus is host platform agnostic. For example, it can be safely used on the AARCH64 platform, whereas lib<library_name>.a can only be used for x86_64 Linux builds.

Important

The fatbin file (lib<library_name>.fatbin) can only be used with NVCC 12.8 or newer. This limitation doesn’t apply to NVRTC and nvJitLink.

Important

When using LTO libraries, please observe the following requirements:
* The NVCC / NVRTC used to compile the code must be from a CUDA toolkit that is either the same version or older than the NVCC / nvJitLink used to perform the linking stage.
* Both the compiler and the linker must be from the same CUDA toolkit major version.
* MathDx provides a package for every supported CUDA toolkit major version.

Example

For example, cuSolverDx 0.3.0 from the MathDx 25.12.0 package for CUDA Toolkit 12 supports CUDA Toolkit 12.6 or newer (within major version 12). It is therefore required to use NVCC / NVRTC 12.6 or newer to compile the code, and NVCC / nvJitLink 12.6 or newer to perform the linking stage.
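
The version rules above can be checked mechanically before a build. The following is a rough sketch, not a MathDx tool: `version_le` and `lto_toolchain_ok` are hypothetical helper names, and versions are assumed to be passed as `major.minor` strings that you have already extracted from your toolchain.

```shell
#!/bin/sh
# Hypothetical helpers; not part of MathDx or the CUDA Toolkit.

version_le() {
    # True when version $1 <= version $2, comparing major and minor numerically.
    a_major=${1%%.*}; b_major=${2%%.*}
    a_rest=${1#*.};   b_rest=${2#*.}
    a_minor=${a_rest%%.*}; b_minor=${b_rest%%.*}
    [ "$a_major" -lt "$b_major" ] && return 0
    [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -le "$b_minor" ]
}

lto_toolchain_ok() {
    # $1 = toolkit version of the compiler (NVCC / NVRTC)
    # $2 = toolkit version of the linker (NVCC / nvJitLink)
    # $3 = minimum toolkit version supported by the library (12.6 in the example above)
    compile_ver=$1; link_ver=$2; min_ver=$3
    # Compiler and linker must share the same CUDA toolkit major version.
    [ "${compile_ver%%.*}" = "${link_ver%%.*}" ] || return 1
    # The compiler must be the same version as, or older than, the linker.
    version_le "$compile_ver" "$link_ver" || return 1
    # The compiler must meet the minimum version supported by the library.
    version_le "$min_ver" "$compile_ver" || return 1
}

if lto_toolchain_ok 12.6 12.8 12.6; then
    echo "toolchain combination accepted"
else
    echo "toolchain combination rejected"
fi
```

Because the compiler may not be newer than the linker, checking the compiler against the minimum version also covers the linker.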

When compiling with NVCC it’s necessary to link to the corresponding LTO library file:

# When using lib<library_name>.a
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> -l<library_name>

# When using lib<library_name>.fatbin
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> lib<library_name>.fatbin

The -dlto option at link time instructs the linker to retrieve the LTO IR from the library object and perform optimization on the resulting IR for code generation.

When using NVRTC and nvJitLink for runtime kernel compilation and linking, it’s possible to use either the fatbin file lib<library_name>.fatbin or the static library lib<library_name>.a. The fatbin has to be used on platforms other than x86_64 Linux.
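
The platform rule in this section can be sketched as a small shell helper. `pick_lto_artifact` is a hypothetical name, and the sketch assumes that `uname -m` reports `x86_64` and `uname -s` reports `Linux` on the hosts where the static library is usable.

```shell
#!/bin/sh
# Hypothetical helper: picks which LTO artifact to link for a Dx library.
# $1 = library name (cusolverdx or nvcompdx)
# $2 = host architecture (defaults to `uname -m`)
# $3 = host operating system (defaults to `uname -s`)
pick_lto_artifact() {
    lib=$1
    arch=${2:-$(uname -m)}
    os=${3:-$(uname -s)}
    if [ "$arch" = "x86_64" ] && [ "$os" = "Linux" ]; then
        # The static library can only be used for x86_64 Linux builds.
        echo "lib${lib}.a"
    else
        # The fatbin contains only device code, so it is host platform agnostic.
        echo "lib${lib}.fatbin"
    fi
}

# Print the artifact for the current host.
pick_lto_artifact cusolverdx
```

On x86_64 Linux both forms are valid. This sketch prefers the static library there because linking the fatbin with NVCC requires NVCC 12.8 or newer.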

Dx Library In Your CMake Project#

The MathDx package provides a configuration file that simplifies using Dx libraries in other CMake projects. After finding mathdx with find_package, link mathdx::<library_name> to your target (see MathDx CMake targets). This propagates the include directory <library_name>_INCLUDE_DIRS, the required dependencies, and the C++17 requirement to your target.

For example, cuBLASDx and cuRANDDx can be linked to the YourProgram CMake target as follows:

find_package(mathdx REQUIRED COMPONENTS cublasdx curanddx CONFIG)
target_link_libraries(YourProgram mathdx::cublasdx mathdx::curanddx)

You can pass the path to the MathDx package using the PATHS option:

find_package(mathdx REQUIRED COMPONENTS <libraries> CONFIG PATHS "<your_directory>/nvidia/mathdx/YY.MM/")

Alternatively, you can set mathdx_ROOT during the CMake configuration of your project:

cmake -Dmathdx_ROOT="<your_directory>/nvidia/mathdx/YY.MM/" (...)

Linking LTO#

To enable LTO in CMake for a target, set INTERPROCEDURAL_OPTIMIZATION to true. To allow separate compilation of device code, set CUDA_SEPARABLE_COMPILATION to true. Both properties are required when linking device extension libraries that use LTO, whether the target relies on the static library lib<library_name>.a or on the fatbin library lib<library_name>.fatbin.

When CMake detects NVCC CUDA compiler 12.8 or newer, MathDx exposes mathdx::<library_name>_fatbin targets. They can be used instead of mathdx::<library_name>. They rely on fatbin library lib<library_name>.fatbin.

Important

mathdx::<library_name>_fatbin is only available when CMake detects that NVCC CUDA compiler 12.8 or newer is used. You can check the NVCC version in a CMake script using the CMAKE_CUDA_COMPILER_VERSION variable.

Example

For example, cuSolverDx can be linked to the YourProgram CMake target as follows:

# find cuSolverDx
find_package(mathdx REQUIRED COMPONENTS cusolverdx CONFIG)
# enable LTO in your target
set_target_properties(YourProgram
      PROPERTIES
          CUDA_SEPARABLE_COMPILATION ON
          INTERPROCEDURAL_OPTIMIZATION ON)
# link against mathdx::cusolverdx
target_link_libraries(YourProgram mathdx::cusolverdx)
# or, alternatively, link against mathdx::cusolverdx_fatbin instead (use only one of the two):
# target_link_libraries(YourProgram mathdx::cusolverdx_fatbin)

Targets#

For all details about the targets, please refer to the documentation of the corresponding library.

Library	CMake Target(s)
cuBLASDx	`mathdx::cublasdx`
cuFFTDx	`mathdx::cufftdx` `mathdx::cufftdx_separate_twiddles_lut`
cuRANDDx	`mathdx::curanddx`
cuSolverDx	`mathdx::cusolverdx` `mathdx::cusolverdx_fatbin` (available only for CTK 12.8+)
nvCOMPDx	`mathdx::nvcompdx` `mathdx::nvcompdx_fatbin` (available only for CTK 12.8+)

Defined Variables#

For all details about the defined variables, please refer to the documentation of the corresponding library.

mathdx_FOUND: True if MathDx was found.
mathdx_VERSION: MathDx package version number in X.Y.Z format.
mathdx_INCLUDE_DIRS: MathDx include directories.
mathdx_<library_name>_FOUND, <library_name>_FOUND: True if <library_name> was found. Example: mathdx_cublasdx_FOUND, cublasdx_FOUND.
<library_name>_VERSION: <library_name> version number in X.Y.Z format. Example: cublasdx_VERSION.
<library_name>_INCLUDE_DIRS: <library_name> include directories. Example: cublasdx_INCLUDE_DIRS.

Additionally, the following variables are defined for libraries with LTO:

<library_name>_LIBRARIES: <library_name> library target, which uses static library lib<library_name>.a. Example: cusolverdx_LIBRARIES.
<library_name>_FATBIN: Path to <library_name> fatbin library lib<library_name>.fatbin. Example: cusolverdx_FATBIN.