Cholesky Factorization for a Large Matrix Using a Blocked Algorithm
blocked_potrf.cu
This advanced example implements Cholesky factorization with a blocked algorithm. It targets matrices that are too large to fit in shared memory, or for which calling cuSolverDx’s unblocked Cholesky API directly would be too slow because of its high register and shared memory usage.
The code uses a left-looking blocked algorithm with an out-of-core implementation, in which a single thread block processes each matrix A in the batch. The factorization of the N x N matrix A proceeds in N / NB steps, each operating on sub-matrices of size NB x NB. Each step consists of a sequence of calls to the unblocked Cholesky factorization, the triangular solver (TRSM), and cuBlasDx’s GEMM. The results are compared with those obtained using the cuSolver host API cusolverDnXpotrf.
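The order of operations in a left-looking blocked factorization can be illustrated with a small host-side reference. This is only a sketch under stated assumptions, not the code in blocked_potrf.cu: it runs on the CPU in plain C++, stores the matrix in column-major order, factors the lower triangle in place, and replaces the cuSolverDx and cuBlasDx calls with naive loops. The names `at` and `blocked_potrf_lower` are hypothetical and do not belong to any library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element (i, j) of a column-major n x n matrix.
inline double& at(std::vector<double>& a, int n, int i, int j) {
    return a[static_cast<std::size_t>(j) * n + i];
}

// Hypothetical reference routine: left-looking blocked Cholesky, A = L * L^T.
// Overwrites the lower triangle of `a` with L; the strict upper triangle is
// neither read nor written. Returns false if a non-positive pivot is found.
bool blocked_potrf_lower(std::vector<double>& a, int n, int nb) {
    for (int j = 0; j < n; j += nb) {
        const int jb = std::min(nb, n - j);  // width of the current block column

        // Step 1 (GEMM-like): subtract the contribution of all block columns
        // to the left of column j from the current block column.
        for (int c = j; c < j + jb; ++c) {
            for (int i = c; i < n; ++i) {
                double s = 0.0;
                for (int p = 0; p < j; ++p) s += at(a, n, i, p) * at(a, n, c, p);
                at(a, n, i, c) -= s;
            }
        }

        // Step 2 (unblocked Cholesky): factor the jb x jb diagonal block.
        for (int c = j; c < j + jb; ++c) {
            double d = at(a, n, c, c);
            for (int p = j; p < c; ++p) d -= at(a, n, c, p) * at(a, n, c, p);
            if (d <= 0.0) return false;  // matrix is not positive definite
            d = std::sqrt(d);
            at(a, n, c, c) = d;
            for (int i = c + 1; i < j + jb; ++i) {
                double s = at(a, n, i, c);
                for (int p = j; p < c; ++p) s -= at(a, n, i, p) * at(a, n, c, p);
                at(a, n, i, c) = s / d;
            }
        }

        // Step 3 (TRSM): solve X * L_jj^T = panel for the rows below the
        // diagonal block, overwriting the panel with X.
        for (int i = j + jb; i < n; ++i) {
            for (int c = j; c < j + jb; ++c) {
                double s = at(a, n, i, c);
                for (int p = j; p < c; ++p) s -= at(a, n, i, p) * at(a, n, c, p);
                at(a, n, i, c) = s / at(a, n, c, c);
            }
        }
    }
    return true;
}
```

In this sketch, each iteration of the outer loop reads the already factored columns to its left and writes only the current block column, which is what makes the algorithm left-looking. A host reference of this kind can also serve as a small-size sanity check alongside a library routine.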