Installation Guide
All device extension libraries are shipped in a single package (tarball/zip). Each supported CUDA Toolkit major version has its own separate package. The package can be downloaded from the download page of any Dx library; all of them host the same package. More details about the installation of each library can be found in the documentation of the corresponding library.
- cuBLASDx
- cuFFTDx
- cuSolverDx
- cuRANDDx
- nvCOMPDx
Dx Library In Your Project
Most of the device extension libraries are header-only. To use them, add the MathDx include directory to your compilation commands. It’s also necessary to add the include paths of the dependencies, CUTLASS and commonDx, which are shipped with the MathDx package.
| Library | Header-only | Header | LTO Library |
|---|---|---|---|
| cuBLASDx | Yes | cublasdx.hpp | No |
| cuFFTDx | Yes | cufftdx.hpp | No (See note below) |
| cuSolverDx | No | cusolverdx.hpp | libcusolverdx.a, libcusolverdx.fatbin |
| cuRANDDx | Yes | curanddx.hpp | No |
| nvCOMPDx | No | nvcompdx.hpp | libnvcompdx.a, libnvcompdx.fatbin |
Note
Starting from version 1.6.0, cuFFTDx provides an experimental feature that extends functionality and improves performance by reusing optimized code from the cuFFT library. See the cuFFTDx documentation for more details.
After unpacking the MathDx YY.MM package into <your_directory>, the Dx header files are available at the following location:
<your_directory>/nvidia/mathdx/YY.MM/include/
The commonDx include directory is the same as the one for the Dx headers:
<your_directory>/nvidia/mathdx/YY.MM/include
The CUTLASS include directory is:
<your_directory>/nvidia/mathdx/YY.MM/external/cutlass/include
Examples
The simplest way to use MathDx is to add all required include directories to the NVCC compilation command:
nvcc -std=c++17 -arch=sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary>
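The include paths above can be derived from a single unpack location. The following is a minimal sketch, not an official script: the variable names (MATHDX_DIR, MATHDX_RELEASE, and so on), the default unpack location, the sm_80 architecture, and the my_kernel file names are all assumptions made for illustration. It only assembles and prints the command, so it can be reviewed before being run.

```shell
# Hypothetical unpack location and release tag; replace both with your own values.
MATHDX_DIR="${MATHDX_DIR:-$HOME/mathdx}"
MATHDX_RELEASE="${MATHDX_RELEASE:-YY.MM}"

# Directory layout as described in this guide.
MATHDX_ROOT="$MATHDX_DIR/nvidia/mathdx/$MATHDX_RELEASE"
MATHDX_INCLUDE_DIR="$MATHDX_ROOT/include"                    # Dx and commonDx headers
CUTLASS_INCLUDE_DIR="$MATHDX_ROOT/external/cutlass/include"  # bundled CUTLASS headers

# Assemble the NVCC command; sm_80 and the file names are placeholders.
NVCC_CMD="nvcc -std=c++17 -arch=sm_80 -I$MATHDX_INCLUDE_DIR -I$CUTLASS_INCLUDE_DIR my_kernel.cu -o my_kernel"

# Print the command so it can be reviewed before running it.
echo "$NVCC_CMD"
```

Keeping the root in one variable means only MATHDX_DIR and MATHDX_RELEASE change when moving to a different MathDx release.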
Linking LTO
cuSolverDx and nvCOMPDx require linking against their corresponding LTO
(Link Time Optimization) libraries.
Each of these libraries is shipped in two forms: a static library lib<library_name>.a and a fatbin lib<library_name>.fatbin.
lib<library_name>.fatbin, in contrast to lib<library_name>.a,
contains only device code and thus is host platform agnostic.
For example, it can be safely used on the AARCH64 platform,
whereas lib<library_name>.a can only be used for x86_64 Linux builds.
Important
Fatbin file (lib<library_name>.fatbin) can only be used with NVCC 12.8 or newer. This limitation doesn’t apply to NVRTC and nvJitLink.
Important
When using LTO libraries, please observe the following requirements:
- The NVCC / NVRTC used to compile the code must be from a CUDA toolkit that is either the same version or older than the NVCC / nvJitLink used to perform the linking stage.
- Both the compiler and the linker must be from the same CUDA toolkit major version.
- MathDx provides a package for every supported CUDA toolkit major version.
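The two version requirements can be checked mechanically before a build. The sketch below is an illustration only: the function name lto_toolkits_compatible is hypothetical, it expects versions written as MAJOR.MINOR (any patch component is ignored), and it treats "same version or older" as a comparison of the minor number within the same major version.

```shell
# Succeeds (exit status 0) when the compile-time toolkit may be combined
# with the link-time toolkit under the two rules listed in this guide.
lto_toolkits_compatible() {
    compile_ver=$1   # toolkit of the NVCC / NVRTC that compiles the code
    link_ver=$2      # toolkit of the NVCC / nvJitLink that performs the link

    # Split MAJOR.MINOR[.PATCH] using POSIX parameter expansion.
    c_major=${compile_ver%%.*}; c_rest=${compile_ver#*.}; c_minor=${c_rest%%.*}
    l_major=${link_ver%%.*};    l_rest=${link_ver#*.};    l_minor=${l_rest%%.*}

    # Rule: compiler and linker must share the same major version.
    [ "$c_major" -eq "$l_major" ] || return 1
    # Rule: the compiler must not be newer than the linker.
    [ "$c_minor" -le "$l_minor" ]
}

# Example: code compiled with 12.6 and linked with 12.8.
if lto_toolkits_compatible 12.6 12.8; then
    echo "12.6 (compile) with 12.8 (link): compatible"
fi
```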
When compiling with NVCC it’s necessary to link to the corresponding LTO library file:
# When using lib<library_name>.a
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> -l<library_name>
# When using lib<library_name>.fatbin
nvcc -dlto -std=c++17 -arch sm_XY (...) -I<mathdx_include_dir> -I<cutlass_include_dir> <your_source_file>.cu -o <your_binary> lib<library_name>.fatbin
The -dlto option at link time instructs the linker to retrieve the LTO IR from the library object and perform optimization
on the resulting IR for code generation.
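Which of the two forms to pass to NVCC follows from the host platform and the NVCC version. The following sketch encodes the rules stated in this section; the function name choose_lto_form and its argument order are assumptions, and on x86_64 Linux it simply prefers the static library even when the fatbin would also work.

```shell
# Prints "static" or "fatbin"; returns non-zero when neither form can be used with NVCC.
choose_lto_form() {
    host_os=$1     # e.g. the output of `uname -s`
    host_arch=$2   # e.g. the output of `uname -m`
    nvcc_ver=$3    # NVCC version written as MAJOR.MINOR, e.g. 12.8

    major=${nvcc_ver%%.*}; rest=${nvcc_ver#*.}; minor=${rest%%.*}

    # The static library can only be used for x86_64 Linux builds.
    if [ "$host_os" = "Linux" ] && [ "$host_arch" = "x86_64" ]; then
        echo static
        return 0
    fi
    # On other hosts only the fatbin applies, and NVCC accepts it from 12.8 on.
    if [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 8 ]; }; then
        echo fatbin
        return 0
    fi
    return 1
}

# Example: query the current host, assuming NVCC 12.8.
choose_lto_form "$(uname -s)" "$(uname -m)" 12.8
```

The NVCC version check applies only to NVCC builds; as noted above, the limitation does not apply to NVRTC and nvJitLink.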
When using NVRTC and nvJitLink for runtime kernel compilation and linking it’s possible to use either
fatbin file lib<library_name>.fatbin or lib<library_name>.a.
Fatbin has to be used for platforms other than x86_64 Linux.
Dx Library In Your CMake Project
The MathDx package provides a configuration file that simplifies using Dx libraries in other CMake projects.
After finding mathdx using find_package,
users have to link mathdx::<library_name> to their target (see MathDx CMake targets).
This propagates the include directory <library_name>_INCLUDE_DIRS, required dependencies,
and the C++17 requirement to their target.
For example, cuBLASDx and cuRANDDx can be linked to the YourProgram CMake target as follows:
find_package(mathdx REQUIRED COMPONENTS cublasdx curanddx CONFIG)
target_link_libraries(YourProgram mathdx::cublasdx mathdx::curanddx)
You can pass the path to the MathDx package using the PATHS option:
find_package(mathdx REQUIRED COMPONENTS <libraries> CONFIG PATHS "<your_directory>/nvidia/mathdx/YY.MM/")
Alternatively, you can set mathdx_ROOT during the CMake configuration of your project:
cmake -Dmathdx_ROOT="<your_directory>/nvidia/mathdx/YY.MM/" (...)
Linking LTO
To enable LTO in CMake for a target, set INTERPROCEDURAL_OPTIMIZATION to true,
and to allow separate compilation of device code, set CUDA_SEPARABLE_COMPILATION to true.
Both properties are required when linking device extension libraries that use LTO, both for targets
relying on the static library lib<library_name>.a and for targets relying on the fatbin library lib<library_name>.fatbin.
When CMake detects NVCC CUDA compiler 12.8 or newer, MathDx exposes mathdx::<library_name>_fatbin targets.
They can be used instead of mathdx::<library_name>. They rely on fatbin library lib<library_name>.fatbin.
lib<library_name>.fatbin, in contrast to lib<library_name>.a,
contains only device code and thus is host platform agnostic.
For example, it can be safely used on the AARCH64 platform,
whereas lib<library_name>.a can only be used for x86_64 Linux builds.
Important
mathdx::<library_name>_fatbin is only available when CMake detects that NVCC CUDA compiler 12.8 or newer is used.
You can check NVCC version in CMake script using CMAKE_CUDA_COMPILER_VERSION variable.
Important
When using LTO libraries, please observe the following requirements:
- The NVCC / NVRTC used to compile the code must be from a CUDA toolkit that is either the same version or older than the NVCC / nvJitLink used to perform the linking stage.
- Both the compiler and the linker must be from the same CUDA toolkit major version.
- MathDx provides a package for every supported CUDA toolkit major version.
For example, cuSolverDx can be linked to the YourProgram CMake target as follows:
# find cuSolverDx
find_package(mathdx REQUIRED COMPONENTS cusolverdx CONFIG)
# enable LTO in your target
set_target_properties(YourProgram
    PROPERTIES
        CUDA_SEPARABLE_COMPILATION ON
        INTERPROCEDURAL_OPTIMIZATION ON)
# link against mathdx::cusolverdx
target_link_libraries(YourProgram mathdx::cusolverdx)
# or, alternatively, against mathdx::cusolverdx_fatbin (use one of the two, not both)
# target_link_libraries(YourProgram mathdx::cusolverdx_fatbin)
Targets
For all details about the targets, please refer to the documentation of the corresponding library.
| Library | CMake Target(s) |
|---|---|
| cuBLASDx | mathdx::cublasdx |
| cuFFTDx | mathdx::cufftdx, mathdx::cufftdx_separate_twiddles_lut |
| cuRANDDx | mathdx::curanddx |
| cuSolverDx | mathdx::cusolverdx, mathdx::cusolverdx_fatbin (available only for CTK 12.8+) |
| nvCOMPDx | mathdx::nvcompdx, mathdx::nvcompdx_fatbin (available only for CTK 12.8+) |
Defined Variables
For all details about the defined variables, please refer to the documentation of the corresponding library.
- mathdx_FOUND: True if MathDx was found.
- mathdx_VERSION: MathDx package version number in X.Y.Z format.
- mathdx_INCLUDE_DIRS: MathDx include directories.
- mathdx_<library_name>_FOUND, <library_name>_FOUND: True if <library_name> was found. Example: mathdx_cublasdx_FOUND, cublasdx_FOUND.
- <library_name>_VERSION: <library_name> version number in X.Y.Z format. Example: cublasdx_VERSION.
- <library_name>_INCLUDE_DIRS: <library_name> include directories. Example: cublasdx_INCLUDE_DIRS.
Additionally, the following variables are defined for libraries with LTO:
- <library_name>_LIBRARIES: <library_name> library target, which uses the static library lib<library_name>.a. Example: cusolverdx_LIBRARIES.
- <library_name>_FATBIN: Path to the <library_name> fatbin library lib<library_name>.fatbin. Example: cusolverdx_FATBIN.