Release Notes

This section lists significant changes, new features, performance improvements, and known issues. Unless noted otherwise, the listed issues should not impact functionality. When functionality is impacted, we offer a workaround to avoid the issue (if available).

0.1.0

The first early access (EA) release of the cuBLASDx library.

New Features

  • Support for general matrix multiply.

    • Tensor core support for fp16, fp64, and complex fp64 calculations.

  • Support for SM70 - SM90 CUDA architectures.

  • Multiple examples included.

Known Issues

  • Since CUDA Toolkit 12.2, the NVCC compiler in certain situations incorrectly reports a compilation error when the value_type member of a GEMM description type is used. The problematic code and possible workarounds are presented below:

    // Any GEMM description type
    using GEMM = decltype(Size<32 /* M */, 32 /* N */, 32 /* K */>()
                + Precision<double>()
                + Type<type::real>()
                + TransposeMode<t_mode /* A */, t_mode /* B */>()
                + Function<function::MM>()
                + SM<700>()
                + Block());
    using type = typename GEMM::value_type; // compilation error
    
    // Workaround #1
    using type = typename decltype(GEMM())::value_type;
    
    // Workaround #2 (used in cuBLASDx examples)
    template <typename T>
    using value_type_t = typename T::value_type;
    
    using type = value_type_t<GEMM>;
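
The two workarounds can be tried in isolation with the sketch below. It does not include cuBLASDx: `fake_gemm_description` is a hypothetical stand-in type invented for this illustration, and `type_1` and `type_2` are illustrative names. The sketch only shows that both spellings name the same type; it does not reproduce the NVCC error itself.

```cpp
#include <type_traits>

// Hypothetical stand-in for a GEMM description type (NOT part of cuBLASDx).
// It only exposes a value_type member, like the real description does.
template <typename Precision>
struct fake_gemm_description {
    using value_type = Precision;
};

// Alias template from workaround #2.
template <typename T>
using value_type_t = typename T::value_type;

// The description is obtained through decltype(...), mirroring the pattern above.
using GEMM = decltype(fake_gemm_description<double>());

// Workaround #1: wrap the description in decltype(GEMM()) before the member access.
using type_1 = typename decltype(GEMM())::value_type;

// Workaround #2: go through the alias template.
using type_2 = value_type_t<GEMM>;

// Both workarounds yield the same type as the stand-in's value_type.
static_assert(std::is_same<type_1, double>::value, "workaround #1 yields double");
static_assert(std::is_same<type_2, double>::value, "workaround #2 yields double");
```

Workaround #2 may be the more convenient choice when value_type is needed in many places, since the alias template is written once and reused.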