Release Notes
This section lists significant changes, new features, performance improvements, and known issues. Unless noted otherwise, the listed issues should not impact functionality. When functionality is impacted, a workaround is provided if one is available.
0.1.0
The first early access (EA) release of the cuBLASDx library.
New Features
Support for general matrix multiply (GEMM).
Tensor Core support for fp16, fp64, and complex fp64 calculations.
Support for CUDA architectures from SM70 to SM90.
Multiple examples included.
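For reference, the conventional BLAS definition of a general matrix multiply is given below. This is a sketch that assumes cuBLASDx follows the same form, with M, N and K matching the Size<M, N, K> operator used in the known-issue code and op(·) (identity, transpose, or conjugate transpose) selected by TransposeMode:

```latex
C \leftarrow \alpha \,\operatorname{op}(A)\,\operatorname{op}(B) + \beta\, C,
\qquad
\operatorname{op}(A)\colon M \times K,\quad
\operatorname{op}(B)\colon K \times N,\quad
C\colon M \times N
```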
Known Issues
Starting with CUDA Toolkit 12.2, the NVCC compiler in certain situations reports an incorrect compilation error when the value_type member type of a GEMM description type is used. The problematic code and possible workarounds are shown below:

    // Any GEMM description type (t_mode is a placeholder for any transpose mode)
    using GEMM = decltype(Size<32 /* M */, 32 /* N */, 32 /* K */>()
                          + Precision<double>()
                          + Type<type::real>()
                          + TransposeMode<t_mode /* A */, t_mode /* B */>()
                          + Function<function::MM>()
                          + SM<700>()
                          + Block());

    using type = typename GEMM::value_type; // compilation error

    // Workaround #1
    using type = typename decltype(GEMM())::value_type;

    // Workaround #2 (used in cuBLASDx examples)
    template <typename T>
    using value_type_t = typename T::value_type;

    using type = value_type_t<GEMM>;
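Both workarounds are ordinary C++ idioms rather than cuBLASDx-specific APIs. The sketch below applies them to hypothetical stand-in types. The names size_op, precision_op, description and the operator+ overload are invented for illustration and are not part of cuBLASDx. The sketch checks at compile time that the direct lookup and both workarounds name the same type. It does not reproduce the NVCC error itself, which depends on the real cuBLASDx types. Because every check is a static_assert, compiling the file (for example with g++ -std=c++17 -c) is enough to run them.

```cpp
#include <type_traits>

// Hypothetical stand-ins (NOT the cuBLASDx API): each "operator" is an empty
// type, and adding them produces a description type with a nested value_type.
struct size_op {};
template <typename T>
struct precision_op {};

template <typename T>
struct description {
    using value_type = T;  // mirrors the nested value_type of a GEMM description
};

// Combining operators yields a description type, loosely mimicking how
// Size<...>() + Precision<...>() + ... builds a GEMM description.
template <typename T>
description<T> operator+(size_op, precision_op<T>) {
    return {};
}

// Description type built with decltype, as in the known-issue code.
using GEMM = decltype(size_op() + precision_op<double>());

// Direct member lookup (the form NVCC may reject for real cuBLASDx types).
using type_direct = typename GEMM::value_type;

// Workaround #1: look the member up through decltype of a GEMM value.
using type_w1 = typename decltype(GEMM())::value_type;

// Workaround #2: route the lookup through an alias template.
template <typename T>
using value_type_t = typename T::value_type;
using type_w2 = value_type_t<GEMM>;

// Compile-time checks: all three spellings name the same type.
static_assert(std::is_same<type_direct, double>::value, "direct lookup");
static_assert(std::is_same<type_w1, type_direct>::value, "workaround #1");
static_assert(std::is_same<type_w2, type_direct>::value, "workaround #2");
```

With workaround #2, value_type_t is defined once and can then be applied to any description type, not only to GEMM.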