Functionality

The cuSolverDx library provides dense matrix operations (factorizations, linear system solvers, and related routines) that execute directly inside CUDA kernels. Each operation can be configured for the problem at hand, for example by data type, storage layout, target architecture, and number of batches per block.

Key Features

Matrix Factorizations:
- Cholesky decomposition (POTRF)
- LU decomposition with and without partial pivoting (GETRF)
- QR and LQ factorizations (GEQRF, GELQF)

Linear System Solvers:
- Triangular system solves (TRSM)
- Linear system solves using Cholesky factors (POTRS)
- Linear system solves using LU factors (GETRS)
- Least squares problems (GELS)

Matrix Operations:
- Multiplication with Q from QR factorization (UNMQR)
- Multiplication with Q from LQ factorization (UNMLQ)

Performance Optimizations:
- Support for both real and complex data types
- Configurable matrix storage layouts (column-major and row-major)
- Architecture-specific optimizations
- Batched operation support for improved throughput

Integration Capabilities:
- Seamless integration with other CUDA operations
- Support for shared memory usage
- Customizable block dimensions and batches per block
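
To make the factorization and solver entries above more concrete, the following is a minimal host-side C++ sketch of what POTRF followed by POTRS computes for a small symmetric positive definite matrix stored in column-major order. It is a plain reference implementation for illustration only and does not use the cuSolverDx API. The names `idx`, `potrf_lower_ref`, and `potrs_lower_ref` are hypothetical helpers introduced here, and the LAPACK-style return code is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Column-major indexing: element (i, j) of a matrix with leading dimension lda.
// (Hypothetical helper for this sketch, not part of cuSolverDx.)
inline std::size_t idx(std::size_t i, std::size_t j, std::size_t lda) {
    return i + j * lda;
}

// Reference Cholesky factorization (what POTRF computes, lower variant):
// overwrites the lower triangle of the n x n matrix A with L, where A = L * L^T.
// Returns 0 on success, or j + 1 if the leading minor of order j + 1 is not
// positive definite (a LAPACK-style convention assumed for this sketch).
int potrf_lower_ref(std::vector<double>& a, std::size_t n, std::size_t lda) {
    assert(lda >= n && a.size() >= lda * n);
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: subtract squares of the already computed row of L.
        double d = a[idx(j, j, lda)];
        for (std::size_t k = 0; k < j; ++k) {
            d -= a[idx(j, k, lda)] * a[idx(j, k, lda)];
        }
        if (d <= 0.0) {
            return static_cast<int>(j) + 1;
        }
        d = std::sqrt(d);
        a[idx(j, j, lda)] = d;
        // Entries below the diagonal in column j.
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[idx(i, j, lda)];
            for (std::size_t k = 0; k < j; ++k) {
                s -= a[idx(i, k, lda)] * a[idx(j, k, lda)];
            }
            a[idx(i, j, lda)] = s / d;
        }
    }
    return 0;
}

// Reference solve using the Cholesky factor (what POTRS computes, one
// right-hand side): solves A x = b given L from potrf_lower_ref.
// The vector b is overwritten with the solution x.
void potrs_lower_ref(const std::vector<double>& l, std::size_t n,
                     std::size_t lda, std::vector<double>& b) {
    assert(lda >= n && b.size() >= n);
    // Forward substitution: L y = b.
    for (std::size_t i = 0; i < n; ++i) {
        double s = b[i];
        for (std::size_t k = 0; k < i; ++k) {
            s -= l[idx(i, k, lda)] * b[k];
        }
        b[i] = s / l[idx(i, i, lda)];
    }
    // Backward substitution: L^T x = y.
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t k = i + 1; k < n; ++k) {
            s -= l[idx(k, i, lda)] * b[k];
        }
        b[i] = s / l[idx(i, i, lda)];
    }
}
```

For example, factoring the 2 x 2 matrix with columns (4, 2) and (2, 3) and then solving against the right-hand side (8, 8) yields the solution (1, 2). A device-side library performs the same arithmetic cooperatively across the threads of a block, typically on data held in shared memory, but the mathematical result is the same.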

For details on each operation, see the corresponding section of the API Reference.