Release Notes

cuBLASMp v0.4.0

cuBLASMp v0.3.1

  • Add option to set the number of SMs to be used for communication (currently relevant only for Atomic GEMM + ReduceScatter).

  • Decrease workspace size requirement in TP overlap GEMMs.

  • Remove extra synchronization in TP overlap GEMMs.

  • Allow C matrix to be null when beta is 0.

  • Fix GEMM implementation for complex types when transA or transB is CUBLAS_OP_T.
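
The entries above on a null C matrix and on CUBLAS_OP_T for complex types can be illustrated with a small reference GEMM. This is only a sketch in plain Python, assuming the standard BLAS convention result = alpha * op(A) * op(B) + beta * C. It is not the cuBLASMp API: `reference_gemm` and `apply_op` are hypothetical names, and the strings "N", "T" and "C" stand in for CUBLAS_OP_N, CUBLAS_OP_T and CUBLAS_OP_C.

```python
def apply_op(matrix, op):
    """Return op(matrix) for a row-major list-of-lists matrix.

    "N" = no transpose, "T" = plain transpose, "C" = conjugate transpose.
    These strings are illustrative stand-ins for the cuBLAS enum values.
    """
    if op == "N":
        return [row[:] for row in matrix]
    if op == "T":
        # Plain transpose: elements are NOT conjugated, even for complex types.
        return [list(col) for col in zip(*matrix)]
    if op == "C":
        # Conjugate transpose: transpose and conjugate every element.
        return [[complex(x).conjugate() for x in col] for col in zip(*matrix)]
    raise ValueError(f"unknown op: {op!r}")


def reference_gemm(alpha, a, b, beta, c=None, trans_a="N", trans_b="N"):
    """Compute alpha * op(a) * op(b) + beta * c (hypothetical helper).

    When beta is 0, c is never read, so it may be None.
    """
    op_a = apply_op(a, trans_a)
    op_b = apply_op(b, trans_b)
    rows, inner, cols = len(op_a), len(op_b), len(op_b[0])
    if any(len(row) != inner for row in op_a):
        raise ValueError("inner dimensions of op(a) and op(b) do not match")
    if beta != 0 and c is None:
        raise ValueError("c is required when beta is non-zero")

    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = sum(op_a[i][k] * op_b[k][j] for k in range(inner))
            value = alpha * acc
            if beta != 0:
                # c is touched only on this branch.
                value += beta * c[i][j]
            out_row.append(value)
        result.append(out_row)
    return result
```

In this sketch, `reference_gemm(1, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0)` succeeds without a `c` argument because the `beta * c` term is skipped entirely. For a complex input, `trans_a="T"` and `trans_a="C"` give different results, which is why the plain-transpose case needs its own handling for complex types.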

cuBLASMp v0.3.0

Breaking changes

cuBLASMp v0.2.1

  • Add mixed-precision and lower-precision support.

  • Bug fixes.

cuBLASMp v0.2.0

cuBLASMp v0.1.2

cuBLASMp v0.1.1

cuBLASMp v0.1.0

  • Early access release.

  • This release focuses on functionality.