Release Notes
cuDSS v0.3.0
New Features:
Multi-GPU multi-node (MGMN) mode with prebuilt standalone communication layers for NCCL and OpenMPI, as well as with custom user-defined GPU-aware communication backends
Hybrid host/device memory mode, which keeps the factor values in host memory (RAM) and uses only a smaller device buffer as temporary storage
Extended support to Linux ARM (aarch64) (Ubuntu 22.04, only on Orin (SM 8.7) devices)
Breaking changes:
Removed values CUDSS_STATUS_ARCH_MISMATCH and CUDSS_STATUS_ZERO_PIVOT from the enum cudssStatus_t as these values will not be used
Renamed the main header cuDSS.h as cudss.h to better align with other CUDA math libraries
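Because of the header rename, existing sources that include cuDSS.h will no longer compile on case-sensitive file systems. The following is a minimal sketch of one way to update them, not an official migration tool. The function name migrate_cudss_includes and the directory argument are hypothetical, and the sketch assumes file names without embedded newlines.

```shell
# Sketch: rewrite includes of the old header name (cuDSS.h) to the new one (cudss.h).
# migrate_cudss_includes is a hypothetical helper; its argument is your source tree.
migrate_cudss_includes() {
    src_dir="$1"
    scratch=$(mktemp) || return 1
    # List files that still include cuDSS.h in either <...> or "..." form.
    grep -rlE '#[[:space:]]*include[[:space:]]*[<"]cuDSS\.h[>"]' "$src_dir" |
    while IFS= read -r file; do
        # Write the edited text to a scratch file, then copy it back in place.
        # This keeps the original file's permissions and avoids relying on sed -i.
        sed -E 's/(#[[:space:]]*include[[:space:]]*[<"])cuDSS\.h([>"])/\1cudss.h\2/' "$file" > "$scratch" &&
            cat "$scratch" > "$file" &&
            echo "updated $file"
    done
    rm -f "$scratch"
}

# Example call (path is illustrative):
# migrate_cudss_includes ./src
```

Running the helper under version control makes it easy to review the resulting diff before committing.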
Important bug fixes:
Fixed execution failures when multiple tiny matrices (smaller than 16x16) were solved re-using the same cudssData_t object
Fixed incorrect results when a factorization phase followed a re-factorization phase with the CUDSS_ALG_1 and CUDSS_ALG_2 reordering algorithms
Known issue:
Error messages are seen during cuDSS installation on RPM-based systems (RHEL, Rocky, SLES):
failed to link /usr/lib/#INSTALL_TRIPLET#/libcudss_commlayer_nccl.so -> /etc/alternatives/libcudss_commlayer_nccl.so: No such file or directory
failed to link /usr/lib/#INSTALL_TRIPLET#/libcudss_commlayer_openmpi.so -> /etc/alternatives/libcudss_commlayer_openmpi.so: No such file or directory
The installation completes despite the failure to create a couple of symlinks for cuDSS.
To fix the issue, apply the workaround below ONLY after encountering the errors above. The issue, and thus the workaround, applies only to RPM-based systems (RHEL, Rocky, SLES). The workaround drops and recreates all symlinks managed by the alternatives system for cudss:
update-alternatives --remove cudss /usr/lib64/libcudss/12/libcudss.so.0
/sbin/ldconfig
update-alternatives --install /usr/lib64/libcudss.so.0 cudss /usr/lib64/libcudss/12/libcudss.so.0 120 \
--slave /usr/lib64/libcudss.so libcudss.so /usr/lib64/libcudss/12/libcudss.so \
--slave /usr/lib64/libcudss_commlayer_nccl.so libcudss_commlayer_nccl.so /usr/lib64/libcudss/12/libcudss_commlayer_nccl.so \
--slave /usr/lib64/libcudss_commlayer_openmpi.so libcudss_commlayer_openmpi.so /usr/lib64/libcudss/12/libcudss_commlayer_openmpi.so \
--slave /usr/lib64/libcudss_static.a libcudss_static.a /usr/lib64/libcudss/12/libcudss_static.a \
--slave /usr/lib64/cmake/cudss cudss_cmake /usr/lib64/libcudss/12/cmake/cudss \
--slave /usr/include/cudss.h cudss.h /usr/include/libcudss/12/cudss.h \
--slave /usr/include/cudss_distributed_interface.h cudss_distributed_interface.h /usr/include/libcudss/12/cudss_distributed_interface.h
/sbin/ldconfig
After completing these steps, confirm that all the symlinks exist. Expected output:
# ls -l /usr/lib64/cmake/cudss
... /usr/lib64/cmake/cudss -> /etc/alternatives/cudss_cmake
# ls -l /usr/include/cudss*
... /usr/include/cudss.h -> /etc/alternatives/cudss.h
... /usr/include/cudss_distributed_interface.h -> /etc/alternatives/cudss_distributed_interface.h
# ls -l /usr/lib64/*cudss*
... /usr/lib64/libcudss.so -> /etc/alternatives/libcudss.so
... /usr/lib64/libcudss.so.0 -> /etc/alternatives/cudss
... /usr/lib64/libcudss_commlayer_nccl.so -> /etc/alternatives/libcudss_commlayer_nccl.so
... /usr/lib64/libcudss_commlayer_openmpi.so -> /etc/alternatives/libcudss_commlayer_openmpi.so
... /usr/lib64/libcudss_static.a -> /etc/alternatives/libcudss_static.a
# ls -l /etc/alternatives/*cudss*
... /etc/alternatives/cudss -> /usr/lib64/libcudss/12/libcudss.so.0
... /etc/alternatives/cudss.h -> /usr/include/libcudss/12/cudss.h
... /etc/alternatives/cudss_cmake -> /usr/lib64/libcudss/12/cmake/cudss
... /etc/alternatives/cudss_distributed_interface.h -> /usr/include/libcudss/12/cudss_distributed_interface.h
... /etc/alternatives/libcudss.so -> /usr/lib64/libcudss/12/libcudss.so
... /etc/alternatives/libcudss_commlayer_nccl.so -> /usr/lib64/libcudss/12/libcudss_commlayer_nccl.so
... /etc/alternatives/libcudss_commlayer_openmpi.so -> /usr/lib64/libcudss/12/libcudss_commlayer_openmpi.so
... /etc/alternatives/libcudss_static.a -> /usr/lib64/libcudss/12/libcudss_static.a
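The manual ls checks above can also be scripted. The following is a minimal sketch that tests whether each link path from the listing is a symlink whose target resolves. The function name check_cudss_links and its optional prefix argument are hypothetical; the prefix is only there so the check can be pointed at a staging directory.

```shell
# Sketch: verify that each expected cuDSS symlink exists and resolves.
# check_cudss_links is a hypothetical helper. Its optional argument is a
# root prefix (empty by default, meaning the real file system root).
check_cudss_links() {
    prefix="${1:-}"
    missing=0
    for link in \
        /usr/lib64/cmake/cudss \
        /usr/include/cudss.h \
        /usr/include/cudss_distributed_interface.h \
        /usr/lib64/libcudss.so \
        /usr/lib64/libcudss.so.0 \
        /usr/lib64/libcudss_commlayer_nccl.so \
        /usr/lib64/libcudss_commlayer_openmpi.so \
        /usr/lib64/libcudss_static.a
    do
        link_path="$prefix$link"
        # -L: the path is a symlink; -e: the final target of the link chain exists.
        if [ -L "$link_path" ] && [ -e "$link_path" ]; then
            echo "ok       $link"
        else
            echo "MISSING  $link"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing.
    [ "$missing" -eq 0 ]
}

# Example call on an installed system:
# check_cudss_links || echo "some cuDSS symlinks are missing or dangling"
```

A dangling link (one whose target under /etc/alternatives or /usr/lib64/libcudss/12 is absent) is reported as MISSING, since the -e test follows the whole chain.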
cuDSS v0.2.1
Important bug fixes:
Fixed host memory leaks
Fixed device memory bookkeeping which could cause read violation errors and segmentation faults when cuDSS is called repeatedly with the same cudssHandle_t and cudssData_t objects
Fixed incorrect results of iterative refinement for 1-based input matrices
Fixed incorrect internal temporary buffer size which could cause invalid memory accesses for small matrices
cuDSS v0.2.0
New Features:
Performance improvements for non-symmetric and non-Hermitian matrices for the reordering algorithm CUDSS_ALG_1
Support for user-defined device memory allocators/memory pools
Support for extracting permutations which account for both reordering and pivoting (via new values CUDSS_DATA_PERM_ROW and CUDSS_DATA_PERM_COL in the enum cudssDataParam_t) for reordering algorithms CUDSS_ALG_1 and CUDSS_ALG_2
Extended support to all SM architectures starting with Pascal (SM 5.0)
Extended support to Linux ARM (SBSA) (Ubuntu 20.04, Ubuntu 22.04, RHEL 8, RHEL 9, SLES 15)
Breaking changes:
Replaced value CUDSS_DATA_PERM_REORDER in the enum cudssDataParam_t with CUDSS_DATA_PERM_REORDER_ROW and CUDSS_DATA_PERM_REORDER_COL to separate row and column reordering permutations, which can differ for the non-symmetric reordering algorithm CUDSS_ALG_1
Important bug fixes:
Fixed incorrect solution for Hermitian matrices with non-disabled pivoting
Fixed sporadically incorrect solution on H100 due to shared memory allocation size
Fixed incorrect propagation of pivoting tolerance and epsilon from cudssConfig_t during cudssExecute()
Fixed sporadic hangs on GPUs with a small number of SMs
cuDSS v0.1.0
New Features:
Initial release
Support for single GPU, SM architectures: SM 7.0 and newer
Support for Linux x86-64 (Ubuntu 20.04, Ubuntu 22.04, RHEL 8, RHEL 9, SLES 15)
Support for Windows x86-64 (Windows 10, 11)
Support for single/double real/complex datatype for values and int datatype for indices
Synchronous API
Compatibility notes:
cuDSS requires CUDA 12.0 or above
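As a quick way to check this requirement on a build machine, the sketch below compares a CUDA release string against the 12.0 minimum. The helper name cuda_meets_cudss_minimum is hypothetical. The sketch assumes nvcc is on PATH and that its version banner contains the text "release X.Y"; it reports the toolkit version only, not the driver version.

```shell
# Sketch: succeed if the given CUDA release string (for example "12.2")
# is 12.0 or above. cuda_meets_cudss_minimum is a hypothetical helper.
cuda_meets_cudss_minimum() {
    major="${1%%.*}"               # keep the part before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric input
    esac
    [ "$major" -ge 12 ]
}

# Only runs when nvcc is available; otherwise this block does nothing.
if command -v nvcc >/dev/null 2>&1; then
    # Extract "X.Y" from the banner line containing "release X.Y".
    release=$(nvcc --version | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
    if cuda_meets_cudss_minimum "$release"; then
        echo "CUDA toolkit $release satisfies the 12.0 minimum"
    else
        echo "CUDA toolkit '$release' is older than 12.0 or could not be parsed"
    fi
fi
```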