cuBLASMp Data Types

Data types

cublasMpHandle_t

The cublasMpHandle_t structure holds the cuBLASMp library context (device properties, system information, and so on).
The handle must be initialized with cublasMpCreate() and destroyed with cublasMpDestroy().

cublasMpGrid_t

The cublasMpGrid_t structure holds the grid dimensions and the communicator associated with the grid of processes.
It must be initialized with cublasMpGridCreate() and destroyed with cublasMpGridDestroy().

cublasMpMatrixDescriptor_t

The cublasMpMatrixDescriptor_t structure captures the shape and distribution characteristics of a distributed matrix.
It must be initialized with cublasMpMatrixDescriptorCreate() and destroyed with cublasMpMatrixDescriptorDestroy().

cublasMpMatmulDescriptor_t

The cublasMpMatmulDescriptor_t structure captures the properties of a distributed matrix-matrix multiplication performed with cublasMpMatmul().
It must be initialized with cublasMpMatmulDescriptorCreate() and destroyed with cublasMpMatmulDescriptorDestroy().

Enumerators

cublasMpStatus_t

The type is used for function status returns. All cuBLASMp library functions return their status, which can have the following values.

CUBLASMP_STATUS_SUCCESS
    The operation completed successfully.

CUBLASMP_STATUS_NOT_INITIALIZED
    The cuBLASMp library was not initialized.

CUBLASMP_STATUS_ALLOCATION_FAILED
    Resource allocation failed inside the cuBLASMp library.

CUBLASMP_STATUS_INVALID_VALUE
    An unsupported value or parameter was passed to the function.

CUBLASMP_STATUS_ARCHITECTURE_MISMATCH
    The function requires a feature absent from the device architecture.

CUBLASMP_STATUS_EXECUTION_FAILED
    The GPU program failed to execute.

CUBLASMP_STATUS_INTERNAL_ERROR
    An internal cuBLASMp operation failed.

CUBLASMP_STATUS_NOT_SUPPORTED
    The functionality requested is not supported.

cublasMpGridLayout_t

Describes the ordering of the grid of processes.

CUBLASMP_GRID_MAPPING_ROW_MAJOR
    The grid of processes will be accessed in row-major ordering.

CUBLASMP_GRID_MAPPING_COL_MAJOR
    The grid of processes will be accessed in column-major ordering.

cublasMpLoggerCallback_t

Function pointer type for cuBLASMp logging callbacks. The callback function is called by the library to report logging information.
The callback can be set using cublasMpLoggerSetCallback().

The callback function takes the following parameters:

logLevel (int)
    The severity level of the log message.

functionName (const char*)
    Name of the function generating the log message.

message (const char*)
    The actual log message text.

cublasMpMatmulDescriptorAttribute_t

Attributes of cublasMpMatmulDescriptor_t that can be set using cublasMpMatmulDescriptorAttributeSet() and queried using cublasMpMatmulDescriptorAttributeGet().

Each attribute is listed below with the type of its value.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_TRANSA (cublasOperation_t)
    Indicates which operation is performed with the dense matrix A.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_TRANSB (cublasOperation_t)
    Indicates which operation is performed with the dense matrix B.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_COMPUTE_TYPE (cublasComputeType_t)
    Indicates the compute type of the matrix multiplication.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_ALGO_TYPE (cublasMpMatmulAlgoType_t)
    Hints the algorithm type to use. If the requested algorithm is not supported, cuBLASMp falls back to the default algorithm.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_COMMUNICATION_SM_COUNT (int32_t)
    Indicates the number of SMs to use for communication.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE (cublasMpMatmulEpilogue_t)
    Specifies the epilogue operation to perform after the matrix multiplication (e.g., activation functions, bias addition).

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_POINTER (void*)
    Bias or bias-gradient vector pointer in device memory. Its role depends on the epilogue:
    - Input vector, with length matching the number of rows of matrix D, when one of the following epilogues is used: CUBLASMP_MATMUL_EPILOGUE_BIAS, CUBLASMP_MATMUL_EPILOGUE_RELU_BIAS, CUBLASMP_MATMUL_EPILOGUE_RELU_AUX_BIAS, CUBLASMP_MATMUL_EPILOGUE_GELU_BIAS, CUBLASMP_MATMUL_EPILOGUE_GELU_AUX_BIAS.
    - Output vector, with length matching the number of rows of matrix D, when one of the following epilogues is used: CUBLASMP_MATMUL_EPILOGUE_DRELU_BGRAD, CUBLASMP_MATMUL_EPILOGUE_DGELU_BGRAD, CUBLASMP_MATMUL_EPILOGUE_BGRADA.
    - Output vector, with length matching the number of columns of matrix D, when the CUBLASMP_MATMUL_EPILOGUE_BGRADB epilogue is used.
    Bias vector elements have the same type as the data type of matrix D.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_BATCH_STRIDE (int64_t)
    Stride in elements between bias vectors in batched operations. Batched matmul is currently not supported, so this attribute must be set to 0 (the default value).

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_DATA_TYPE (cudaDataType_t)
    Data type of the bias vector (e.g., CUDA_R_16F, CUDA_R_32F).

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER (void*)
    Pointer to the epilogue auxiliary buffer:
    - With CUBLASMP_MATMUL_EPILOGUE_RELU_AUX or CUBLASMP_MATMUL_EPILOGUE_RELU_AUX_BIAS: output vector for the ReLU bit-mask in the forward pass.
    - With CUBLASMP_MATMUL_EPILOGUE_DRELU or CUBLASMP_MATMUL_EPILOGUE_DRELU_BGRAD: input vector for the ReLU bit-mask in the backward pass.
    - With CUBLASMP_MATMUL_EPILOGUE_GELU_AUX_BIAS: output of the GELU input matrix in the forward pass.
    - With CUBLASMP_MATMUL_EPILOGUE_DGELU or CUBLASMP_MATMUL_EPILOGUE_DGELU_BGRAD: input of the GELU input matrix in the backward pass.
    For the data type, see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_DATA_TYPE. Requires setting the CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_LD attribute.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_LD (int64_t)
    Leading dimension of the epilogue auxiliary buffer:
    - With CUBLASMP_MATMUL_EPILOGUE_RELU_AUX, CUBLASMP_MATMUL_EPILOGUE_RELU_AUX_BIAS, CUBLASMP_MATMUL_EPILOGUE_DRELU, or CUBLASMP_MATMUL_EPILOGUE_DRELU_BGRAD: the ReLU bit-mask matrix leading dimension in elements (i.e., bits). Must be divisible by 128 and be no less than the number of rows in the output matrix.
    - With CUBLASMP_MATMUL_EPILOGUE_GELU_AUX_BIAS, CUBLASMP_MATMUL_EPILOGUE_DGELU, or CUBLASMP_MATMUL_EPILOGUE_DGELU_BGRAD: the GELU input matrix leading dimension in elements. Must be divisible by 8 and be no less than the number of rows in the output matrix.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_BATCH_STRIDE (int64_t)
    Stride in bytes between auxiliary output buffers in batched epilogue operations. Batched matmul is currently not supported, so this attribute must be set to 0 (the default value).

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_DATA_TYPE (cudaDataType_t)
    The type of the data stored in the buffer set via CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER. If unset, the data type defaults to the output matrix element data type (DType), with some exceptions: ReLU uses a bit-mask. For FP8 kernels with an output type (DType) of CUDA_R_8F_E4M3, the data type can be set to a non-default value if AType and BType are CUDA_R_8F_E4M3, the bias type is CUDA_R_16F, CType is CUDA_R_16BF or CUDA_R_16F, and CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE is set to CUBLASMP_MATMUL_EPILOGUE_GELU_AUX. When CType is CUDA_R_16F, the data type may be set to CUDA_R_16F or CUDA_R_8F_E4M3; when CType is CUDA_R_16BF, the data type may be set to CUDA_R_16BF. Otherwise, the data type should be left unset.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_SCALE_POINTER (void*)
    Device pointer to the scaling factor used to convert results from the compute-type data range to the storage data range in the auxiliary matrix set via CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER. The scaling factor value must have the same type as the compute type. If not specified, the scaling factor is assumed to be 1.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_AMAX_POINTER (void*)
    Device pointer to a memory location that, on completion, is set to the maximum of the absolute values in the buffer set via CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER. The computed value has the same type as the compute type. If not specified, the maximum absolute value is not computed.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_SCALE_MODE (cublasMpMatmulMatrixScale_t)
    Scaling mode that defines how the scaling factor for the auxiliary matrix is interpreted. Default value: CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_POINTER (const void*)
    Device pointer to the scale factor that converts data in matrix A to the compute data type range. The scaling factor must have the same type as the compute type. Matrix scaling is supported only when the compute type is CUBLAS_COMPUTE_32F and the output matrix type is not complex. If not specified, the scaling factor is assumed to be 1.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_B_SCALE_POINTER (const void*)
    Equivalent to CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_POINTER for matrix B.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_C_SCALE_POINTER (const void*)
    Equivalent to CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_POINTER for matrix C.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_D_SCALE_POINTER (const void*)
    Equivalent to CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_POINTER for matrix D.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_MODE (cublasMpMatmulMatrixScale_t)
    Scaling mode that defines how the scaling factor for matrix A is interpreted. Default value: CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_B_SCALE_MODE (cublasMpMatmulMatrixScale_t)
    Scaling mode that defines how the scaling factor for matrix B is interpreted. Default value: CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_C_SCALE_MODE (cublasMpMatmulMatrixScale_t)
    Scaling mode that defines how the scaling factor for matrix C is interpreted. Default value: CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_D_SCALE_MODE (cublasMpMatmulMatrixScale_t)
    Scaling mode that defines how the scaling factor for matrix D is interpreted. Default value: CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32.

CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_AMAX_D_POINTER (void*)
    Device pointer to a memory location that, on completion, is set to the maximum of the absolute values in the output matrix. The computed value has the same type as the compute type. If not specified, the maximum absolute value is not computed.

cublasMpMatmulAlgoType_t

Matrix-matrix multiplication algorithm types. This is treated as a hint; cuBLASMp is not guaranteed to use the requested implementation.

CUBLASMP_MATMUL_ALGO_TYPE_DEFAULT
    Default algorithm.

CUBLASMP_MATMUL_ALGO_TYPE_SPLIT_P2P
    Use split matmul with P2P communication.

CUBLASMP_MATMUL_ALGO_TYPE_SPLIT_MULTICAST
    Use split matmul with multicast communication. Only valid for GEMM+ReduceScatter and GEMM+AllReduce on Compute Capability 9.0 (Hopper) and newer GPUs connected with an NVSwitch.

CUBLASMP_MATMUL_ALGO_TYPE_ATOMIC_MULTICAST [DEPRECATED]
    Use atomic matmul with multicast communication. Only valid for GEMM+ReduceScatter on Compute Capability 9.0 (Hopper) and newer GPUs connected with an NVSwitch.

cublasMpMatmulMatrixScale_t

An enumerated type used to specify the scaling mode that defines how scaling factor pointers are interpreted.

CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32
    Scaling factors are single-precision scalars applied to the whole tensor.

cublasMpMatmulEpilogue_t

An enumerated type used to specify the epilogue operation to be performed after matrix multiplication. Epilogues enable fusion of additional operations such as activation functions, bias addition, and their derivatives for training, providing better performance by avoiding separate kernel launches.

CUBLASMP_MATMUL_EPILOGUE_DEFAULT
    No special postprocessing.
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_ALLREDUCE
    Performs an AllReduce communication operation after the matrix multiplication.
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_RELU
    Apply the ReLU point-wise transform to the results (x := max(x, 0)).
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_RELU_AUX
    Apply the ReLU point-wise transform to the results (x := max(x, 0)). This epilogue mode produces an extra output; see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_BIAS
    Apply (broadcast) bias from the bias vector. The bias vector length must match the number of rows of matrix D, and the vector must be packed (that is, the stride between vector elements is 1). The bias vector is broadcast to all columns and added before applying the final postprocessing.
    Supported matmul algorithms: AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_RELU_BIAS
    Apply bias and then the ReLU transform.
    Supported matmul algorithms: AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_RELU_AUX_BIAS
    Apply bias and then the ReLU transform. This epilogue mode produces an extra output; see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_GELU
    Apply the GELU point-wise transform to the results (x := GELU(x)).
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_GELU_AUX
    Apply the GELU point-wise transform to the results (x := GELU(x)). This epilogue mode outputs the GELU input as a separate matrix (useful for training). See CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_GELU_BIAS
    Apply bias and then the GELU transform.
    Supported matmul algorithms: AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_GELU_AUX_BIAS
    Apply bias and then the GELU transform. This epilogue mode outputs the GELU input as a separate matrix (useful for training). See CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_DGELU
    Apply the GELU gradient to the matmul output and store the GELU gradient in the output matrix. This epilogue mode requires an extra input; see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: AllGather+GEMM, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_DGELU_BGRAD
    Independently apply the GELU gradient and the bias gradient to the matmul output. The GELU gradient is stored in the output matrix, and the bias gradient in the bias buffer (see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_POINTER). This epilogue mode requires an extra input; see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: AllGather+GEMM, GEMM+AllReduce.

CUBLASMP_MATMUL_EPILOGUE_BGRADA
    Apply the bias gradient to the input matrix A. The bias size corresponds to the number of rows of matrix D, and the reduction happens over the GEMM's "k" dimension. The bias gradient is stored in the bias buffer; see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_BGRADB
    Apply the bias gradient to the input matrix B. The bias size corresponds to the number of columns of matrix D, and the reduction happens over the GEMM's "k" dimension. The bias gradient is stored in the bias buffer; see CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_POINTER of cublasMpMatmulDescriptorAttribute_t.
    Supported matmul algorithms: all.

CUBLASMP_MATMUL_EPILOGUE_DRELU
    The DRELU epilogue is currently not supported.
    Supported matmul algorithms: none.

CUBLASMP_MATMUL_EPILOGUE_DRELU_BGRAD
    The DRELU_BGRAD epilogue is currently not supported.
    Supported matmul algorithms: none.