Release Notes

cuBLASMp v0.7.0

Released: November 24, 2025

New features

  • Added FP32 emulation support.

  • Added FP4 output support.

  • Added the CUBLASMP_MATMUL_ALGO_TYPE_NO_OVERLAP algorithm type for the cublasMpMatmul API. Algorithms of this type may perform better on multi-node systems without NVLink.

  • Added CUBLASMP_DISABLE_NVSHMEM environment variable to disable NVSHMEM usage at runtime. When set to 1, cuBLASMp will not initialize or use NVSHMEM, which can be useful for debugging or on platforms where NVSHMEM is not supported.
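As an illustration of the CUBLASMP_DISABLE_NVSHMEM variable described above, the following is a minimal sketch of setting it from inside a POSIX application instead of from the launching shell. The helper names request_nvshmem_disabled and nvshmem_disable_requested are hypothetical and not part of cuBLASMp. The sketch assumes the library reads the variable when it initializes, so the call would need to happen before any cuBLASMp initialization; the notes above only state that the variable takes effect at runtime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a cuBLASMp function): sets
 * CUBLASMP_DISABLE_NVSHMEM=1 for the current process, overwriting any
 * existing value. Returns 0 on success and -1 on failure, as setenv() does.
 * Assumption: this must run before cuBLASMp is initialized. */
int request_nvshmem_disabled(void)
{
    return setenv("CUBLASMP_DISABLE_NVSHMEM", "1", 1);
}

/* Hypothetical helper: returns 1 if the variable is currently set to
 * exactly "1" in this process's environment, and 0 otherwise. */
int nvshmem_disable_requested(void)
{
    const char *value = getenv("CUBLASMP_DISABLE_NVSHMEM");
    return value != NULL && strcmp(value, "1") == 0;
}
```

Setting the variable in the job launcher's environment achieves the same thing without code changes; the in-process form is mainly convenient when a debugging switch has to be driven by the application's own options.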

Bug fixes

  • Fixed epilogue-related issues in AllGather+GEMM and GEMM+ReduceScatter.

  • Fixed a bug that prevented NVSHMEM from being finalized.

cuBLASMp v0.6.0

Released: September 9, 2025

Bug fixes

  • Fixed redistribution functions (cublasMpGemr2D, cublasMpTrmr2D) to work with unified memory.

  • Fixed Matmul when the C matrix is nullptr and the D matrix type is BF16 or FP32.

  • Fixed a bug in multicast-based algorithms when used with a communicator different from the one used for initialization.

cuBLASMp v0.5.1

Released: August 11, 2025

  • Added support for CUDA 13 on devices with Compute Capability 8.0 (Ampere) and above.

cuBLASMp v0.5.0

Released: June 16, 2025

Breaking changes

  • cuBLASMp now uses NCCL directly instead of the Communication Abstraction Library (libcal). This is a breaking change: user applications must update their cuBLASMp initialization code.

New features

cuBLASMp v0.4.0

Released: March 10, 2025

cuBLASMp v0.3.1

Released: December 10, 2024

  • Added an option to set the number of SMs used for communication (currently relevant only for Atomic GEMM + ReduceScatter).

  • Decreased the workspace size requirement in TP overlap GEMMs.

  • Removed extra synchronization in TP overlap GEMMs.

  • Allowed the C matrix to be null when beta is 0.

  • Fixed the GEMM implementation for complex types with transA / transB being CUBLAS_OP_T.
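The null C matrix item above can be motivated with a small host-side sketch. Assuming the conventional GEMM form D = alpha * A * B + beta * C, the C term contributes nothing when beta is 0, so an implementation never needs to read C in that case. The function gemm_reference below is a hypothetical reference routine written for illustration (column-major, no transposes, double precision); it is not cuBLASMp code and does not reflect how the library is implemented.

```c
#include <stddef.h>

/* Hypothetical host reference, not a cuBLASMp function.
 * Computes D = alpha * A * B + beta * C for column-major matrices:
 * A is m x k, B is k x n, C and D are m x n.
 * C may be NULL when beta == 0.0 because it is never dereferenced then. */
void gemm_reference(size_t m, size_t n, size_t k, double alpha,
                    const double *A, const double *B,
                    double beta, const double *C, double *D)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            /* Only touch C when beta is nonzero. */
            double c_term = (beta != 0.0) ? beta * C[i + j * m] : 0.0;
            D[i + j * m] = alpha * acc + c_term;
        }
    }
}
```

Skipping the read, rather than multiplying C by zero, also avoids propagating NaN or Inf values from an uninitialized C buffer into D.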

cuBLASMp v0.3.0

Released: November 4, 2024

Breaking changes

cuBLASMp v0.2.1

Released: May 29, 2024

  • Added mixed and lower precision support.

  • Bug fixes.

cuBLASMp v0.2.0

Released: April 4, 2024

cuBLASMp v0.1.2

Released: February 22, 2024

cuBLASMp v0.1.1

Released: January 11, 2024

cuBLASMp v0.1.0

Released: December 11, 2023

  • Early access release.

  • This release focuses on functionality.