NVIDIA cuSolverDx Documentation
The cuSolver Device Extensions (cuSolverDx) library enables selected dense matrix factorization and solve routines from cuSolver to be performed inside CUDA kernels. Fusing these routines with other operations can decrease latency and improve the overall performance of your application.
cuSolverDx is part of the MathDx package, which also includes cuBLASDx for basic linear algebra subroutines (BLAS), cuFFTDx for fast Fourier transform (FFT) calculations, and cuRANDDx for random number generation. All the device extension libraries are designed to work together. When using multiple device extension libraries in a single project, they should all come from the same MathDx release.
Highlights
The cuSolverDx library currently provides:
- High-performance batched Cholesky factorization, LU factorization without pivoting, and the corresponding solve functions, all of which can be embedded in a CUDA kernel.
- Customizable options for composing these functions for different use cases, including size, precision, type, fill mode, storage layout, and target CUDA architecture.
- The ability to fuse cuSolverDx functions with other operations to reduce round trips to global memory.
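To make concrete what the batched Cholesky and LU-without-pivoting factor-and-solve routines compute, here is a minimal host-side C++ reference sketch. It does not use or reproduce the cuSolverDx API: the functions `cholesky_lower`, `cholesky_solve_lower`, `lu_no_pivot`, `lu_solve_no_pivot`, and `cholesky_solve_batched` are hypothetical names invented for this illustration. It assumes double precision, column-major storage with leading dimension `lda`, the lower fill mode for Cholesky, and a batch of matrices stored back to back. A reference like this can be handy for checking a device kernel's output on small inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference helpers, NOT part of cuSolverDx.
// Matrices are n x n, column-major: element (i, j) is a[i + j * lda].

// In-place Cholesky factorization A = L * L^T (lower fill mode).
// Reads and overwrites only the lower triangle with L.
// Returns 0 on success, or j + 1 if the leading minor of order j + 1
// is not positive definite.
int cholesky_lower(int n, double* a, int lda) {
    for (int j = 0; j < n; ++j) {
        double d = a[j + j * lda];
        for (int k = 0; k < j; ++k) d -= a[j + k * lda] * a[j + k * lda];
        if (d <= 0.0) return j + 1;
        d = std::sqrt(d);
        a[j + j * lda] = d;
        for (int i = j + 1; i < n; ++i) {
            double s = a[i + j * lda];
            for (int k = 0; k < j; ++k) s -= a[i + k * lda] * a[j + k * lda];
            a[i + j * lda] = s / d;
        }
    }
    return 0;
}

// Solves A x = b using the factor L from cholesky_lower; b is overwritten with x.
void cholesky_solve_lower(int n, const double* a, int lda, double* b) {
    // Forward substitution: L y = b.
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k) s -= a[i + k * lda] * b[k];
        b[i] = s / a[i + i * lda];
    }
    // Backward substitution: L^T x = y, where L^T(i, k) = L(k, i).
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int k = i + 1; k < n; ++k) s -= a[k + i * lda] * b[k];
        b[i] = s / a[i + i * lda];
    }
}

// In-place LU factorization without pivoting: A = L * U, where L is unit
// lower triangular (its unit diagonal is not stored).
// Returns 0 on success, or j + 1 if a zero pivot is met at step j.
int lu_no_pivot(int n, double* a, int lda) {
    for (int j = 0; j < n; ++j) {
        const double p = a[j + j * lda];
        if (p == 0.0) return j + 1;
        for (int i = j + 1; i < n; ++i) a[i + j * lda] /= p;  // column of L
        for (int c = j + 1; c < n; ++c)                       // trailing update
            for (int i = j + 1; i < n; ++i)
                a[i + c * lda] -= a[i + j * lda] * a[j + c * lda];
    }
    return 0;
}

// Solves A x = b using the output of lu_no_pivot; b is overwritten with x.
void lu_solve_no_pivot(int n, const double* a, int lda, double* b) {
    for (int i = 0; i < n; ++i)  // L y = b (unit diagonal)
        for (int k = 0; k < i; ++k) b[i] -= a[i + k * lda] * b[k];
    for (int i = n - 1; i >= 0; --i) {  // U x = y
        for (int k = i + 1; k < n; ++k) b[i] -= a[i + k * lda] * b[k];
        b[i] /= a[i + i * lda];
    }
}

// Factors and solves a batch of independent SPD systems A_i x_i = b_i.
// Matrices are stored with stride lda * n, right-hand sides with stride n.
// info[i] receives the factorization status of matrix i.
void cholesky_solve_batched(int n, int lda, int batch, double* a, double* b, int* info) {
    assert(lda >= n);
    for (int i = 0; i < batch; ++i) {
        double* ai = a + static_cast<std::size_t>(i) * lda * n;
        double* bi = b + static_cast<std::size_t>(i) * n;
        info[i] = cholesky_lower(n, ai, lda);
        if (info[i] == 0) cholesky_solve_lower(n, ai, lda, bi);
    }
}
```

Each batch entry is independent, which is what makes batched factorizations a good fit for the GPU; one natural mapping assigns each entry to its own thread block, though this sketch runs serially on the host and does not show that. Note that LU without pivoting fails on a zero pivot even for nonsingular matrices such as `[[0, 1], [1, 0]]`, so it is best suited to inputs known to be safe without pivoting, such as diagonally dominant matrices.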