Installation Guide#
All device extension libraries are shipped in a single package (tarball/zip). Every supported CUDA Toolkit major version has its own separate package. They can be downloaded from the download page of any Dx library, as all libraries host the same package. Installation instructions for each library are in the corresponding library’s documentation.
- cuBLASDx
- cuFFTDx
- cuSolverDx
- cuRANDDx
- nvCOMPDx
Dx Library In Your Project#
Most of the device extension libraries are header-only. To use them, add the MathDx include directory to the compilation command. You must also add the include paths of the dependencies, CUTLASS and commonDx, which are shipped with the MathDx package.
| Library | Header-only | Header | LTO Library |
|---|---|---|---|
| cuBLASDx | No/Yes | cublasdx.hpp | libcublasdx.fatbin |
| cuFFTDx | Yes | cufftdx.hpp | No (See note below) |
| cuSolverDx | No | cusolverdx.hpp | libcusolverdx.a, libcusolverdx.fatbin |
| cuRANDDx | Yes | curanddx.hpp | No |
| nvCOMPDx | No | nvcompdx.hpp | libnvcompdx.a, libnvcompdx.fatbin |
Note
Since version 0.6.0, cuBLASDx includes an LTO library for TRSM functionality.
Note
cuBLASDx provides a header-only library target (mathdx::cublasdx_no_lto) that does not require linking to the LTO library.
Note
Starting from version 1.6.0, cuFFTDx provides an experimental feature that extends functionality and improves performance by reusing optimized code from the cuFFT library. See the cuFFTDx documentation for more details.
After unpacking the MathDx YY.MM package into <your_directory>, the Dx header files are available at the following location:
<your_directory>/nvidia/mathdx/YY.MM/include/
The commonDx include directory is the same as for the Dx headers:
<your_directory>/nvidia/mathdx/YY.MM/include
The CUTLASS include directory is:
<your_directory>/nvidia/mathdx/YY.MM/external/cutlass/include
Examples
The simplest way to use MathDx is to add all required include directories to the NVCC compilation command:
nvcc -std=c++17 -arch=sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary>
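As a minimal sketch of the command above, the include flags can be derived from the unpack directory. The variable names (MATHDX_ROOT, SM_ARCH, SRC, OUT), the /opt default location, the sm_80 architecture, and the file names are assumptions made for illustration and are not defined by the MathDx package. The script only prints the command, so it can be reviewed before it is run.

```shell
# Sketch: build the NVCC command line from the MathDx unpack directory.
# MATHDX_ROOT, SM_ARCH, SRC and OUT are hypothetical names chosen for this example.
MATHDX_ROOT="${MATHDX_ROOT:-/opt/nvidia/mathdx/YY.MM}"   # <your_directory>/nvidia/mathdx/YY.MM
SM_ARCH="${SM_ARCH:-sm_80}"                              # replace with your GPU architecture
SRC="${SRC:-example.cu}"
OUT="${OUT:-example}"

# Dx and commonDx headers share one directory; CUTLASS ships under external/.
MATHDX_INCLUDE_DIR="${MATHDX_ROOT}/include"
CUTLASS_INCLUDE_DIR="${MATHDX_ROOT}/external/cutlass/include"

NVCC_CMD="nvcc -std=c++17 -arch=${SM_ARCH} -I${MATHDX_INCLUDE_DIR} -I${CUTLASS_INCLUDE_DIR} ${SRC} -o ${OUT}"

# Print rather than execute, so the command can be inspected first.
echo "${NVCC_CMD}"
```

Keeping the root in one variable means a package upgrade only requires changing MATHDX_ROOT; both include directories follow from it.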
Linking LTO#
cuBLASDx (for TRSM), cuSolverDx, and nvCOMPDx require linking to the corresponding LTO
(Link Time Optimization) libraries.
cuSolverDx and nvCOMPDx are provided in two forms: static library lib<library_name>.a and fatbin lib<library_name>.fatbin.
cuBLASDx provides only libcublasdx.fatbin.
lib<library_name>.fatbin, in contrast to lib<library_name>.a,
contains only device code and thus is host platform agnostic.
For example, it can be safely used on the AARCH64 platform,
whereas lib<library_name>.a can only be used for x86_64 Linux builds.
Important
When using LTO libraries, please observe the following requirements:
- The NVCC / NVRTC used to compile must be from the same or an older CUDA Toolkit than the NVCC / nvJitLink used to perform the linking stage.
- Both the compiler and the linker must be from the same CUDA Toolkit major version.
- MathDx provides a package for every supported CUDA Toolkit major version.
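The first two requirements above can be expressed as a small check. This is only a sketch: lto_toolkits_compatible is a hypothetical helper, not part of MathDx or the CUDA Toolkit, and it assumes versions are given as "MAJOR.MINOR" strings (any patch level is ignored).

```shell
# Sketch: check the LTO toolkit rules for a compile/link toolkit pair.
# lto_toolkits_compatible is a hypothetical helper; inputs are "MAJOR.MINOR" strings.
lto_toolkits_compatible() {
    compile_ver="$1"   # toolkit of the NVCC / NVRTC that compiles
    link_ver="$2"      # toolkit of the NVCC / nvJitLink that links

    compile_major="${compile_ver%%.*}"
    link_major="${link_ver%%.*}"
    # Minor part: text after the first dot, cut at any further dot (patch level dropped).
    compile_minor="${compile_ver#*.}"; compile_minor="${compile_minor%%.*}"
    link_minor="${link_ver#*.}";       link_minor="${link_minor%%.*}"

    # Compiler and linker must share the same major version.
    [ "$compile_major" -eq "$link_major" ] || return 1
    # The compiler must be the same as or older than the linker.
    [ "$compile_minor" -le "$link_minor" ]
}

if lto_toolkits_compatible "12.4" "12.6"; then
    echo "compatible"
else
    echo "incompatible"
fi
```

A check like this fits naturally in a build script before the link step, where both toolkit versions are already known.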
When compiling with NVCC, it is necessary to link to the corresponding LTO library file:
# When using lib<library_name>.a
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> -l<library_name>
# When using lib<library_name>.fatbin
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> lib<library_name>.fatbin
The -dlto option at link time instructs the linker to retrieve the LTO IR from the library object and perform optimization
on the resulting IR for code generation.
When using NVRTC and nvJitLink for runtime kernel compilation and linking, either the fatbin file
lib<library_name>.fatbin or the static library lib<library_name>.a can be used.
The fatbin must be used on platforms other than x86_64 Linux.
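The platform rule above can be sketched as a helper that picks the library form for a given host. lto_library_form is a hypothetical name, and preferring the static library on x86_64 Linux is a choice made for this example only, since either form works there. For cuBLASDx the result does not apply, because only libcublasdx.fatbin is provided.

```shell
# Sketch: pick which LTO library form to use for the build host.
# lto_library_form is a hypothetical helper, not part of MathDx.
lto_library_form() {
    host_os="$1"     # e.g. the output of `uname -s`
    host_arch="$2"   # e.g. the output of `uname -m`
    if [ "$host_os" = "Linux" ] && [ "$host_arch" = "x86_64" ]; then
        # Either form works here; this sketch prefers the static library.
        echo "static"
    else
        # The fatbin contains only device code, so it is host platform agnostic.
        echo "fatbin"
    fi
}

# Report the form for the machine running this script.
lto_library_form "$(uname -s)" "$(uname -m)"
```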
Dx Library In Your CMake Project#
The MathDx package provides a configuration file that simplifies using Dx libraries in other CMake projects.
After finding mathdx using find_package,
users must link mathdx::<library_name> to their target (see MathDx CMake targets).
This propagates the include directory <library_name>_INCLUDE_DIRS, required dependencies,
and the C++17 requirement to their target.
For example, linking cuBLASDx and cuRANDDx to the YourProgram CMake target can be done as follows:
find_package(mathdx REQUIRED COMPONENTS cublasdx curanddx CONFIG)
target_link_libraries(YourProgram mathdx::cublasdx mathdx::curanddx)
You can pass the path to the MathDx package using the PATHS option:
find_package(mathdx REQUIRED COMPONENTS <libraries> CONFIG PATHS "<your_directory>/nvidia/mathdx/YY.MM/")
Alternatively, you can set mathdx_ROOT during cmake configuration of your project:
cmake -Dmathdx_ROOT="<your_directory>/nvidia/mathdx/YY.MM/" (...)
Linking LTO#
To enable LTO in CMake for a target, set INTERPROCEDURAL_OPTIMIZATION to true,
and to allow separate compilation of device code set CUDA_SEPARABLE_COMPILATION to true.
Both properties are required when linking device extension libraries that use LTO, both for targets
relying on static library lib<library_name>.a and for targets relying on fatbin library lib<library_name>.fatbin.
mathdx::<library_name>_fatbin targets can be used instead of mathdx::<library_name>.
They rely on fatbin library lib<library_name>.fatbin.
lib<library_name>.fatbin, in contrast to lib<library_name>.a,
contains only device code and thus is host platform agnostic.
For example, it can be safely used on the AARCH64 platform,
whereas lib<library_name>.a can only be used for x86_64 Linux builds.
Note
cuBLASDx provides the cublasdx::cublasdx target that relies on the fatbin library libcublasdx.fatbin when
CUDA Toolkit 13.2.0 or later is used; otherwise it is a header-only library target.
There is also a cublasdx::cublasdx_fatbin target that always includes the fatbin library.
cuBLASDx does not provide a static library target (libcublasdx.a).
Check the cuBLASDx documentation for more details.
Important
When using LTO libraries, please observe the following requirements:
- The NVCC / NVRTC used to compile must be from the same or an older CUDA Toolkit than the NVCC / nvJitLink used to perform the linking stage.
- Both the compiler and the linker must be from the same CUDA Toolkit major version.
- MathDx provides a package for every supported CUDA Toolkit major version.
For example, linking cuSolverDx to the YourProgram CMake target can be done as follows:
# find cuSolverDx
find_package(mathdx REQUIRED COMPONENTS cusolverdx CONFIG)
# enable LTO in your target
set_target_properties(YourProgram
    PROPERTIES
        CUDA_SEPARABLE_COMPILATION ON
        INTERPROCEDURAL_OPTIMIZATION ON)
# link against mathdx::cusolverdx
target_link_libraries(YourProgram mathdx::cusolverdx)
# or, alternatively, against mathdx::cusolverdx_fatbin
target_link_libraries(YourProgram mathdx::cusolverdx_fatbin)
Targets#
For all details about the targets, please refer to the documentation of the corresponding library.
| Library | CMake Target(s) |
|---|---|
| cuBLASDx | mathdx::cublasdx, mathdx::cublasdx_fatbin, mathdx::cublasdx_no_lto |
| cuFFTDx | mathdx::cufftdx, mathdx::cufftdx_separate_twiddles_lut |
| cuRANDDx | mathdx::curanddx |
| cuSolverDx | mathdx::cusolverdx, mathdx::cusolverdx_fatbin |
| nvCOMPDx | mathdx::nvcompdx, mathdx::nvcompdx_fatbin |
Defined Variables#
For all details about the defined variables, please refer to the documentation of the corresponding library.
- mathdx_FOUND: True if MathDx was found.
- mathdx_VERSION: MathDx package version number in X.Y.Z format.
- mathdx_INCLUDE_DIRS: MathDx include directories.
- mathdx_<library_name>_FOUND, <library_name>_FOUND: True if <library_name> was found. Example: mathdx_cublasdx_FOUND, cublasdx_FOUND.
- <library_name>_VERSION: <library_name> version number in X.Y.Z format. Example: cublasdx_VERSION.
- <library_name>_INCLUDE_DIRS: <library_name> include directories. Example: cublasdx_INCLUDE_DIRS.
Additionally, the following variables are defined for libraries with LTO:
- <library_name>_LIBRARIES: <library_name> library target, which uses the static library lib<library_name>.a. This variable does not exist for cuBLASDx, which only provides the fatbin library. Example: cusolverdx_LIBRARIES.
- <library_name>_FATBIN: Path to the <library_name> fatbin library lib<library_name>.fatbin. Example: cusolverdx_FATBIN.