Release Notes

cuSOLVERMp v0.6.0

Added support for NVIDIA Blackwell GPU architecture.
Dropped support for CUDA 11.x.

cuSOLVERMp v0.5.1

Fixed a bug in cusolverMpSyevd() where the eigenvalues were not broadcast to all processes when the problem fit on a single process.

Known Issues

The stream passed to cusolverMpCreate() cannot be the default stream, NULL or 0 (bug 4337214).

cuSOLVERMp v0.5.0

Improved the performance of cusolverMpStedc().
Added an option to force NCCL usage by setting the environment variable CUSOLVERMP_FORCE_NCCL=1. For now, this applies only to parts of the eigensolver.
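
The variable is normally exported in the launch environment. As a minimal sketch, it can also be set from inside the application. This assumes the library reads the variable after the call below, which these notes do not confirm, so the call is placed before any cuSOLVERMp call. The helper name request_forced_nccl() is hypothetical, and setenv() is POSIX rather than ISO C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: export CUSOLVERMP_FORCE_NCCL=1 for this process.
 * Assumption: cuSOLVERMp reads the variable after this call, so call it
 * before any cuSOLVERMp function. Returns 0 on success, -1 on failure. */
int request_forced_nccl(void)
{
    /* overwrite = 1 replaces any value inherited from the launcher. */
    if (setenv("CUSOLVERMP_FORCE_NCCL", "1", 1) != 0) {
        return -1;
    }

    /* Read the value back to confirm the environment was updated. */
    const char *value = getenv("CUSOLVERMP_FORCE_NCCL");
    return (value != NULL && strcmp(value, "1") == 0) ? 0 : -1;
}
```

Exporting the variable in the job script achieves the same effect without touching the application code.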

cuSOLVERMp v0.4.3

Added support for CUDA 12.1.1.
Fixed a bug that caused processes to hang when the problem is small enough to fit on a single process.

Known Issues

CUDA 12.1.1 is compatible with NCCL up to v2.16.x; later NCCL versions may hang intermittently for certain process grids.
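
An application can guard against this at startup by checking the NCCL version it is running with. The sketch below works on the integer version code, which would typically come from ncclGetVersion(). It assumes the encoding of the NCCL_VERSION macro in nccl.h: major*10000 + minor*100 + patch from NCCL 2.9 onward, and major*1000 + minor*100 + patch for older releases. The helper names are hypothetical, and the block does not call NCCL itself so that it stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Decode an NCCL version code into major/minor/patch.
 * Assumed encoding (NCCL_VERSION in nccl.h):
 *   2.9 and later: major*10000 + minor*100 + patch
 *   older:         major*1000  + minor*100 + patch */
static void decode_nccl_version(int code, int *major, int *minor, int *patch)
{
    assert(code >= 0);
    int major_scale = (code >= 10000) ? 10000 : 1000;
    *major = code / major_scale;
    *minor = (code % major_scale) / 100;
    *patch = code % 100;
}

/* Hypothetical helper: true when the version is 2.16.x or older. */
bool nccl_within_2_16(int code)
{
    int major, minor, patch;
    decode_nccl_version(code, &major, &minor, &patch);
    (void)patch; /* any 2.16 patch level is within the range */
    return major < 2 || (major == 2 && minor <= 16);
}
```

Under that encoding, NCCL 2.16.5 is reported as 21605 and passes the check, while 2.17.3 is reported as 21703 and does not.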

cuSOLVERMp v0.4.2

Fixed a bug in cusolverMpSyevd() that returned an internal error for a matrix whose entries are all zero; the correct behavior is to return zero eigenvalues and unit eigenvectors.
Added support for CUDA 12.1.1.
Note that the code is compatible with NCCL up to v2.16.x.
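
The zero-matrix behavior described above can be checked with a small regression test. The sketch below assumes the eigenvalues and eigenvectors have already been gathered into host memory as real double-precision arrays in column-major layout. The function name and the tolerance parameter are hypothetical and not part of cuSOLVERMp. "Unit eigenvectors" is read here as columns with unit 2-norm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check for the result of an all-zero n x n input:
 * every eigenvalue in w is exactly zero and every column of the
 * column-major eigenvector matrix z (leading dimension ldz) has
 * unit 2-norm within tol. */
bool is_zero_matrix_eigen_result(size_t n, const double *w,
                                 const double *z, size_t ldz, double tol)
{
    assert(ldz >= n && tol >= 0.0);
    for (size_t j = 0; j < n; ++j) {
        if (w[j] != 0.0) {
            return false;
        }
        /* Squared norm of column j; comparing it to 1 avoids sqrt(). */
        double norm_sq = 0.0;
        for (size_t i = 0; i < n; ++i) {
            double v = z[i + j * ldz];
            norm_sq += v * v;
        }
        double err = norm_sq - 1.0;
        if (err < 0.0) {
            err = -err;
        }
        if (err > tol) {
            return false;
        }
    }
    return true;
}
```

For a 2 x 2 zero matrix, zero eigenvalues together with the identity as eigenvector matrix pass this check.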

cuSOLVERMp v0.4.1

Added support for row-major grids in SYEVD.
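
For readers unfamiliar with the two grid orderings, the sketch below shows how a rank maps to a (row, column) position in an nprow x npcol process grid under each convention. This follows the usual ScaLAPACK-style definition and is an illustration only. The type and function names are hypothetical and not part of the cuSOLVERMp API.

```c
#include <assert.h>

/* Hypothetical type: position of a process in a 2D grid. */
typedef struct {
    int row;
    int col;
} grid_coord;

/* Row-major ordering: consecutive ranks fill a grid row first. */
grid_coord rank_to_coord_row_major(int rank, int nprow, int npcol)
{
    assert(nprow > 0 && npcol > 0 && rank >= 0 && rank < nprow * npcol);
    grid_coord c = { rank / npcol, rank % npcol };
    return c;
}

/* Column-major ordering: consecutive ranks fill a grid column first. */
grid_coord rank_to_coord_col_major(int rank, int nprow, int npcol)
{
    assert(nprow > 0 && npcol > 0 && rank >= 0 && rank < nprow * npcol);
    grid_coord c = { rank % nprow, rank / nprow };
    return c;
}
```

In a 2 x 3 grid, rank 4 sits at (1, 1) under row-major ordering and at (0, 2) under column-major ordering.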

cuSOLVERMp v0.4.0

Released with HPC-SDK 23.5.
Added routines for the symmetric (Hermitian) generalized eigensolver:
- cusolverMpSygst() reduces the symmetric (Hermitian) generalized eigenproblem to standard form.
- cusolverMpSygvd() computes all eigenvalues and eigenvectors of a symmetric (Hermitian) generalized eigenproblem.

cuSOLVERMp v0.3.1

Released with HPC-SDK 23.3.
Included minor bug fixes:
- cusolverMpPotrf() now zeros the imaginary part of the diagonal entries in the result.
- cusolverMpStedc() no longer leaks internal memory.

cuSOLVERMp v0.3.0

Released with HPC-SDK 23.1.
Removed the dependency on MPI; the UCC library is now the main communication backend.
Provides the following computational APIs:
- cusolverMpGeqrf_bufferSize(), cusolverMpGeqrf(), cusolverMpOrmqr_bufferSize(), cusolverMpOrmqr(), cusolverMpGels_bufferSize(), cusolverMpGels(), cusolverMpSytrd_bufferSize(), cusolverMpSytrd(), cusolverMpStedc_bufferSize(), cusolverMpStedc(), cusolverMpOrmtr_bufferSize(), cusolverMpOrmtr(), cusolverMpSyevd_bufferSize(), cusolverMpSyevd().
Note that cusolverMpGels() currently supports least-squares solutions with the no-transpose option only.
Note that cusolverMpSytrd(), cusolverMpOrmtr() and cusolverMpSyevd() currently support a lower triangular input matrix only.

cuSOLVERMp v0.2.0

Released with HPC-SDK 21.09.
Added support for ppc64le + SpectrumMPI, targeting ORNL’s Summit supercomputer.
Added Cholesky factorization and solve APIs:
- cusolverMpPotrf_bufferSize(), cusolverMpPotrf(), cusolverMpPotrs_bufferSize(), cusolverMpPotrs().
Note that cusolverMpGetrs() does not offer support for multiple right-hand sides at this point.

cuSOLVERMp v0.1.0

Initial release with HPC-SDK 20.09.
Supports Linux x86_64 and compute capability 8.0.
Provides the following computational APIs:
- cusolverMpGetrf_bufferSize(), cusolverMpGetrf(), cusolverMpGetrs_bufferSize(), cusolverMpGetrs().
Note that cusolverMpGetrs() does not offer support for multiple right-hand sides at this point.