NVIDIA cuSolverDx Documentation

The cuSolver Device Extensions (cuSolverDx) library enables selected dense matrix factorization and solve routines from cuSolver to be executed inside CUDA kernels. Fusing these operations with other computations can decrease latency and improve the overall performance of your application.

cuSolverDx is part of the MathDx package, which also includes cuBLASDx for basic linear algebra subroutines (BLAS), cuFFTDx for FFT calculations, and cuRANDDx for random number generation. All the device extension libraries are designed to work together. When a single project uses multiple device extension libraries, they should all come from the same MathDx release.

Highlights

The cuSolverDx library currently provides:

High-performance batched Cholesky, LU (with and without partial pivoting), and QR factorization and solve functions that can be embedded in a CUDA kernel.
Customizable options to compose these functions for different use cases, including size, precision, type, fill mode, storage layout, and targeted CUDA architecture.
Ability to fuse cuSolverDx functions with other operations to reduce round trips to global memory.
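To make the fusion idea concrete, the following is a minimal CPU-side sketch in plain C++. It does not use the cuSolverDx API; the names `cholesky_lower`, `cholesky_solve`, and `fused_shift_and_solve` are hypothetical and exist only for this illustration. For each system in a batch, it applies a pre-processing step (here, a diagonal shift chosen as a stand-in for any surrounding computation), a Cholesky factorization, and the two triangular solves in a single pass over that system's data. In a CUDA kernel, the analogous pattern would keep the matrix in on-chip memory between these steps instead of writing it to global memory and reading it back.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not cuSolverDx API): in-place Cholesky factorization
// A = L * L^T of one n x n symmetric positive definite matrix stored
// column-major. Only the lower triangle is read; it is overwritten with L.
// Returns false if a non-positive pivot is encountered.
bool cholesky_lower(double* a, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j + j * n];
        for (std::size_t k = 0; k < j; ++k) d -= a[j + k * n] * a[j + k * n];
        if (d <= 0.0) return false;
        d = std::sqrt(d);
        a[j + j * n] = d;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i + j * n];
            for (std::size_t k = 0; k < j; ++k) s -= a[i + k * n] * a[j + k * n];
            a[i + j * n] = s / d;
        }
    }
    return true;
}

// Hypothetical helper: solves L * L^T * x = b, overwriting b with x.
void cholesky_solve(const double* l, std::size_t n, double* b) {
    // Forward substitution: L * y = b.
    for (std::size_t i = 0; i < n; ++i) {
        double s = b[i];
        for (std::size_t k = 0; k < i; ++k) s -= l[i + k * n] * b[k];
        b[i] = s / l[i + i * n];
    }
    // Backward substitution: L^T * x = y.
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= l[k + i * n] * b[k];
        b[i] = s / l[i + i * n];
    }
}

// Fused per-system pipeline: shift the diagonal, factor, and solve while the
// system's data is "hot", rather than running three separate passes over the
// whole batch. Returns false if any system in the batch fails to factor.
bool fused_shift_and_solve(std::vector<double>& a, std::vector<double>& b,
                           std::size_t n, std::size_t batch, double shift) {
    assert(a.size() == n * n * batch && b.size() == n * batch);
    bool ok = true;
    for (std::size_t m = 0; m < batch; ++m) {
        double* am = a.data() + m * n * n;
        double* bm = b.data() + m * n;
        for (std::size_t i = 0; i < n; ++i) am[i + i * n] += shift;  // pre-processing
        if (!cholesky_lower(am, n)) { ok = false; continue; }        // factorization
        cholesky_solve(am, n, bm);                                   // solve
    }
    return ok;
}
```

Calling `fused_shift_and_solve(a, b, n, batch, shift)` with `a` holding `batch` column-major matrices back to back and `b` holding the matching right-hand sides overwrites `b` with the solutions. The one-system-at-a-time loop body is the part that corresponds to work done inside a single kernel; everything about how cuSolverDx itself exposes this is left to the library's API reference.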