Release Notes#
This section lists significant changes, new features, performance improvements, and known issues for each release of cuSolverDx. Unless noted otherwise, the listed issues should not impact functionality. When functionality is impacted, a workaround is provided where one is available.
0.2.0#
New Features#
Support for batched or non-batched LU decomposition with partial pivoting.
Support for batched or non-batched linear system solves with multiple right-hand sides using LU factors with partial pivoting.
Support for batched or non-batched QR factorization.
Support for batched or non-batched LQ factorization.
Support for batched or non-batched multiplication of Q from QR factorization.
Support for batched or non-batched multiplication of Q from LQ factorization.
Support for batched or non-batched least squares solves.
Support for batched or non-batched triangular solves.
Support for Blackwell architectures sm_100, sm_101, and sm_120; experimental support for sm_103 and sm_121.
Deprecation of support for NVIDIA Xavier Tegra SoC (SM<720> or sm_72).
Breaking Changes#
cuSolverDx 0.2.0 requires CUDA Toolkit 12.6.3 or newer.
cuSolverDx 0.2.0 requires the FillMode operator to be specified explicitly for Cholesky functions (potrf, potrs, posv); a default value is no longer used when the operator is omitted.
Known Issues#
CUDA 12.8.0 and 12.9.0 could miscompile kernels using the gesv_no_pivot function with high register pressure when all of the following conditions are met:
SM is set to 1200 (SM120), and
Type is set to type::real.
The code corruption may manifest as intermittent incorrect results.
To highlight this issue, cuSolverDx 0.2.0 will overprotectively hard fail if these conditions are met.
If you are using cuSolverDx and this happens to your code, you can:
Define the CUSOLVERDX_IGNORE_NVBUG_5288270_ASSERT macro to ignore these assertions, and verify the correctness of the results.
If the case is indeed affected by the bug, add the -Xptxas -O0 flag to the compilation command. This limits the PTX optimization phase and produces a correct binary, although a potentially slower one.
0.1.0#
The first early access (EA) release of the cuSolverDx library.
New Features#
Support for batched or non-batched Cholesky decomposition.
Support for batched or non-batched linear system solves with multiple right-hand sides using Cholesky factors.
Support for batched or non-batched LU decomposition without pivoting.
Support for batched or non-batched linear system solves with multiple right-hand sides using LU factors without pivoting.
Support for SM70 - SM90 CUDA architectures.
Multiple examples included.
Known Issues#
For CUDA Toolkits older than 12.1 Update 1, when using NVRTC and nvJitLink to perform runtime compilation of kernels that use cuSolverDx, it is required to link against the fatbin file libcusolverdx.fatbin instead of libcusolverdx.a. The path to the fatbin library file is defined in the cusolverdx_FATBIN CMake variable.
The NVCC compiler in CUDA Toolkit 12.2 to 12.4 reports an incorrect compilation error when the value_type type of a solver description type is used. The affected code and its workarounds are presented below. The issue is fixed in CTK 12.5 and newer.

    using Solver = decltype(Size<32>() + Precision<double>() + Type<type::complex>() + Function<function::potrf>() + SM<800>() + Block());
    using type = typename Solver::value_type; // compilation error

    // Workaround #1
    using type = typename decltype(Solver())::value_type;

    // Workaround #2 (used in cuSolverDx examples)
    template <typename T>
    using value_type_t = typename T::value_type;
    using type = value_type_t<Solver>;
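The two workarounds above are plain standard C++ and can be checked in isolation. The following is a minimal sketch that uses a hypothetical stand-in type, `MockSolver`, in place of a real cuSolverDx description, so it compiles without the library. It only illustrates that all three spellings name the same type; it does not reproduce the NVCC error. The names `MockSolver`, `direct_type`, `decltype_type`, `alias_type`, and `workarounds_agree` are assumptions made for this illustration and are not part of cuSolverDx.

```cpp
#include <type_traits>

// Hypothetical stand-in for a cuSolverDx solver description type.
// A real description is built by composing cuSolverDx operators.
struct MockSolver {
    using value_type = double;
};

// Direct spelling: the form that affected NVCC versions reject
// for real solver descriptions.
using direct_type = typename MockSolver::value_type;

// Workaround #1: name the type through decltype of a temporary object.
using decltype_type = typename decltype(MockSolver())::value_type;

// Workaround #2: route the lookup through an alias template.
template <typename T>
using value_type_t = typename T::value_type;
using alias_type = value_type_t<MockSolver>;

// All three spellings must resolve to the same type.
static_assert(std::is_same<direct_type, decltype_type>::value,
              "workaround #1 must match the direct spelling");
static_assert(std::is_same<direct_type, alias_type>::value,
              "workaround #2 must match the direct spelling");

// Illustrative helper so the equivalence can also be queried at run time.
constexpr bool workarounds_agree() {
    return std::is_same<direct_type, decltype_type>::value &&
           std::is_same<direct_type, alias_type>::value;
}
```

Because both workarounds resolve to the same type as the direct spelling, code written against either one should keep working unchanged after moving to CTK 12.5 or newer.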