Solve General Tridiagonal System of Equations#
GTSV_NO_PIVOT (General Tridiagonal system SolVe) function solves a system of linear equations for general tridiagonal matrices:
where:
Ais the input batchedN x Ntridiagonal matrix, defined with three vectorsdl,d, anddu, corresponding to the subdiagonal, diagonal, and superdiagonal elements, respectively:
Xis the output batchedN x Ksolution matrix, andBis the input batchedN x Kright-hand side matrix.
cuSolverDx gtsv_no_pivot function does not perform any pivoting and uses a combination of the Parallel Cyclic Reduction (PCR) and Thomas algorithms to solve the tridiagonal system of equations. The Thomas algorithm, based on LU decomposition, is efficient to solve tridiagonal systems but inherantly sequential, while the PCR algorithm is more computationally expensive but parallelizable. The hybrid combination of the two algorithms is used to achieve the best performance for different problem sizes.
cuSolverDx gtsv_no_pivot device functions (see Execution Methods):
__device__ void execute(const data_type* dl, const data_type* d, const data_type* du, data_type* b, status_type* info);
// with the runtime leading dimension
__device__ void execute(const data_type* dl, const data_type* d, const data_type* du, data_type* b, unsigned int ldb, status_type* info);
dl is a batched N-1 array containing the subdiagonal elements of A, d is a batched N array containing the diagonal elements, and du is a batched N-1 array containing the superdiagonal elements.
Note
dl, d, and du are unmodified by the functions. This is different from the Lapack gtsv functions (see LAPACK gtsv documentation), which modifies the three arrays to store the factors of the matrix A after the LU factorization.
B is a batched dense N x K right-hand side matrix, the leading dimension of B is ldb >= N if B is in column-major layout, or ldb >= K if B is row-major.
The output matrix X overwrites B with the same arrangement and leading dimension.
The output status parameter, info, is an array of batch size. If LU factorization is successful, every element of info is zero.
If LU factorization failed for any batches, i.e. there was a division by zero, info[batch_id] = i indicates the i-1-th row produced a zero diagonal element.
The function supports:
Bbeing either column- or row-major memory layout, see Arrangement operator.
Note
Because matrix A in the functions is implicitly defined with three vectors, dl, d, and du, the leading dimension, or arrangment of A, if defined, is ignored.
Note
The functions do not support transposed matrix A via TransposeMode operator, however, the equation A^T * X = B can be solved by interchanging the order of the arguments dl and du.