Release Notes#
cuSOLVERMp v0.7.0#
Released: August 12, 2025
Breaking changes#
- cuSOLVERMp has transitioned from using the Communication Abstraction Library (libcal) to using NCCL directly. This is a breaking change and requires changes to cuSOLVERMp initialization in the user application.
See Migrating from CAL to NCCL for steps to transition the application from libcal to NCCL.
The steps to initialize cuSOLVERMp with NCCL are described in NCCL Initialization.
The libcal documentation page is still available at CAL Initialization (Legacy), but it applies only to cuSOLVERMp versions older than 0.7.0.
New features#
Added support for CUDA 13 on devices with Compute Capability 8.0 (Ampere) and above.
- cuSOLVERMp leverages techniques for floating point emulation as described in cuBLAS Floating Point Emulation for improved performance (CUDA 13+ and Compute Capability 10+).
Introduced new APIs: cusolverMpSetMathMode(), cusolverMpGetMathMode(), cusolverMpSetEmulationStrategy(), and cusolverMpGetEmulationStrategy().
FP32 emulation can be enabled by setting the math mode to CUSOLVER_FP32_EMULATED_BF16X9_MATH (see cusolverMpSetMathMode()).
The emulation strategy can be further tuned using the cusolverMpSetEmulationStrategy() API.
The math mode and emulation strategy are propagated to the internal cuBLAS and cuSOLVER handles. The workspace sizes returned by *_bufferSize APIs may depend on the math mode.
The defaults are not changed from the previous version, i.e., emulation is disabled and math mode is set to CUSOLVER_DEFAULT_MATH.
Bugfixes#
Fixed a bug that could cause cusolverMpGeqrf() to fail on non-square process grids.
Fixed a bug that could cause cusolverMpGetrf()/cusolverMpGetrs() to fail on non-square process grids.
Fixed a bug that could cause cuSOLVERMp to crash when logging is enabled.
cuSOLVERMp v0.6.0#
Released: February 13, 2025
Added support for NVIDIA Blackwell GPU architecture.
Dropped support for CUDA 11.x.
cuSOLVERMp v0.5.1#
Released: August 22, 2024
Fixed a bug in cusolverMpSyevd() where the eigenvalues were not broadcast to all processes if the problem fit on a single process.
Known Issues#
The stream passed to cusolverMpCreate() cannot be the default (NULL or 0) stream (bug 4337214).
cuSOLVERMp v0.5.0#
Released: May 2, 2024
Improved the performance of cusolverMpStedc().
Introduced a new option to force NCCL usage by setting the CUSOLVERMP_FORCE_NCCL=1 environment variable. For now, this applies only to parts of the eigensolver.
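The variable is normally exported in the launching shell. If it has to be set from inside the application, a minimal sketch could look like the following. The helper name force_nccl_backend() is hypothetical, and the sketch assumes the library reads the variable when it is initialized, which these notes do not state; under that assumption the call belongs before any cuSOLVERMp handle is created.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdlib.h>

/* Hypothetical helper: sets CUSOLVERMP_FORCE_NCCL=1 for the current process.
 * Assumption: cuSOLVERMp reads the variable at initialization, so call this
 * before creating any cuSOLVERMp handle.
 * Returns 0 on success and -1 on failure, as setenv() does. */
static int force_nccl_backend(void)
{
    /* The last argument (1) overwrites any value already in the environment. */
    return setenv("CUSOLVERMP_FORCE_NCCL", "1", 1);
}
```

Setting the variable in the job script instead keeps the choice out of the application code and makes it easier to switch per run.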
cuSOLVERMp v0.4.3#
Released: February 5, 2024
Added support for CUDA 12.1.1.
Fixed a bug that caused processors to hang when the problem is tiny and fits on a single processor.
Known Issues#
CUDA 12.1.1 is compatible with NCCL up to v2.16.x; higher NCCL versions may hang intermittently for certain processor grids.
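An application can guard against this at startup by checking the NCCL version it runs with. The sketch below is illustrative only: the helper name is hypothetical, and it assumes the integer encoding reported by NCCL's ncclGetVersion() for releases 2.9 and newer, namely major * 10000 + minor * 100 + patch (older releases use smaller codes and therefore also pass the check).

```c
/* Hypothetical helper: returns 1 if an NCCL version code denotes v2.16.x or
 * older, and 0 otherwise.
 * Assumption: the code uses the encoding major * 10000 + minor * 100 + patch,
 * so every 2.16.x release is below 21700. */
static int nccl_version_is_2_16_or_older(int version_code)
{
    return version_code < 21700;
}
```

In a real application the argument would come from ncclGetVersion(), and a failed check could be turned into a warning or an early exit before cuSOLVERMp is initialized.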
cuSOLVERMp v0.4.2#
Released: HPC SDK 23.11
Fixed a bug in cusolverMpSyevd() where an internal error was returned for a matrix filled with zero entries; the correct behavior is to return zero eigenvalues and unit eigenvectors.
Added support for CUDA 12.1.1.
Note that the code is compatible with NCCL up to v2.16.x.
cuSOLVERMp v0.4.1#
Released: HPC SDK 23.7
Added support for row major grid in SYEVD.
cuSOLVERMp v0.4.0#
Released: HPC SDK 23.5
Added routines for the symmetric (Hermitian) generalized eigensolver:
cusolverMpSygst() reduces the symmetric (Hermitian) generalized eigenproblem to standard form.
cusolverMpSygvd() computes all eigenvalues and eigenvectors of the symmetric (Hermitian) generalized eigenproblem.
cuSOLVERMp v0.3.1#
Released: HPC SDK 23.3
Minor bugfixes are included:
cusolverMpPotrf(): the result now has the imaginary parts of the diagonal entries set to zero.
cusolverMpStedc(): fixed an internal memory leak.
cuSOLVERMp v0.3.0#
Released: HPC SDK 23.1
Removed the dependency on MPI; the UCC library is now the main communication backend.
Provide the following computational APIs:
cusolverMpGeqrf_bufferSize(), cusolverMpGeqrf(), cusolverMpOrmqr_bufferSize(), cusolverMpOrmqr(), cusolverMpGels_bufferSize(), cusolverMpGels(), cusolverMpSytrd_bufferSize(), cusolverMpSytrd(), cusolverMpStedc_bufferSize(), cusolverMpStedc(), cusolverMpOrmtr_bufferSize(), cusolverMpOrmtr(), cusolverMpSyevd_bufferSize(), cusolverMpSyevd().
Note that cusolverMpGels() currently supports least-squares solutions with the no-transpose option only.
Note that cusolverMpSytrd(), cusolverMpOrmtr() and cusolverMpSyevd() currently support a lower triangular input matrix only.
cuSOLVERMp v0.2.0#
Released: HPC SDK 22.05
Added support for pp64 + SpectrumMPI, targeting ORNL’s Summit Supercomputer.
Added Cholesky factorization and solve APIs:
Note that cusolverMpGetrs() does not offer support for multiple right-hand sides at this point.
cuSOLVERMp v0.1.0#
Released: HPC SDK 21.11
Initial release.
Support Linux x86_64 and Compute Capability 8.0.
Provide the following computational APIs:
Note that cusolverMpGetrs() does not offer support for multiple right-hand sides at this point.