cuBLASMp Data Types#
Data types#
cublasMpHandle_t#
The cublasMpHandle_t structure holds the cuBLASMp library context (device properties, system information, etc.). The handle must be initialized and destroyed using the cublasMpCreate() and cublasMpDestroy() functions, respectively.
cublasMpGrid_t#
The cublasMpGrid_t structure holds information about the grid dimensions and stores the communicator associated with the grid of processes. It must be initialized and destroyed using the cublasMpGridCreate() and cublasMpGridDestroy() functions, respectively.
cublasMpMatrixDescriptor_t#
The cublasMpMatrixDescriptor_t structure captures the shape and characteristics of a distributed matrix. It must be initialized and destroyed using the cublasMpMatrixDescriptorCreate() and cublasMpMatrixDescriptorDestroy() functions, respectively.
cublasMpMatmulDescriptor_t#
The cublasMpMatmulDescriptor_t structure captures the properties of a distributed matrix-matrix multiplication performed using cublasMpMatmul(). It must be initialized and destroyed using the cublasMpMatmulDescriptorCreate() and cublasMpMatmulDescriptorDestroy() functions, respectively.
Enumerators#
cublasMpStatus_t#
This type is used for function status returns. All cuBLASMp library functions return a status, which can take the following values.
| Value | Meaning |
|---|---|
| CUBLASMP_STATUS_SUCCESS | The operation completed successfully. |
| CUBLASMP_STATUS_NOT_INITIALIZED | The cuBLASMp library was not initialized. |
| CUBLASMP_STATUS_ALLOCATION_FAILED | Resource allocation failed inside the cuBLASMp library. |
| CUBLASMP_STATUS_INVALID_VALUE | An unsupported value or parameter was passed to the function. |
| CUBLASMP_STATUS_ARCHITECTURE_MISMATCH | The function requires a feature absent from the device architecture. |
| CUBLASMP_STATUS_EXECUTION_FAILED | The GPU program failed to execute. |
| CUBLASMP_STATUS_INTERNAL_ERROR | An internal cuBLASMp operation failed. |
| CUBLASMP_STATUS_NOT_SUPPORTED | The functionality requested is not supported. |
cublasMpGridLayout_t#
Describes the ordering of the grid of processes.
| Value | Meaning |
|---|---|
| CUBLASMP_GRID_MAPPING_ROW_MAJOR | The grid of processes will be accessed in row-major ordering. |
| CUBLASMP_GRID_MAPPING_COL_MAJOR | The grid of processes will be accessed in column-major ordering. |
cublasMpLoggerCallback_t#
Function pointer type for cuBLASMp logging callbacks. The callback function is called by the library to report logging information. The callback can be set using cublasMpLoggerSetCallback(). The callback function takes the following parameters:
| Parameter | Type | Description |
|---|---|---|
| logLevel | int | The severity level of the log message. |
| functionName | const char* | Name of the function generating the log message. |
| message | const char* | The actual log message text. |
cublasMpMatmulDescriptorAttribute_t#
Attributes of cublasMpMatmulDescriptor_t that can be set using cublasMpMatmulDescriptorAttributeSet() and queried using cublasMpMatmulDescriptorAttributeGet().
| Value (short name) | Meaning | Value (complete name) |
|---|---|---|
| TRANSA | Indicates which operation needs to be performed with the dense matrix A. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_TRANSA |
| TRANSB | Indicates which operation needs to be performed with the dense matrix B. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_TRANSB |
| COMPUTE_TYPE | Indicates the compute type of the matrix multiplication. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_COMPUTE_TYPE |
| ALGO_TYPE | Hints the algorithm type to be used. If not supported, cuBLASMp will fall back to the default algorithm. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_ALGO_TYPE |
| COMMUNICATION_SM_COUNT | Indicates the number of SMs to be used for communication. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_COMMUNICATION_SM_COUNT |
| EPILOGUE | Specifies the epilogue operation to be performed after the matrix multiplication (e.g., activation functions, bias addition). | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE |
| BIAS_POINTER | Bias or bias gradient vector pointer in device memory. Input vector whose length matches the number of rows of matrix D when one of the bias epilogues is used. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_POINTER |
| BIAS_BATCH_STRIDE | Stride in elements between bias vectors in batched operations. Currently, batched matmul is not supported and this parameter has to be set to 0. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_BATCH_STRIDE |
| BIAS_DATA_TYPE | Data type of the bias vector. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_BIAS_DATA_TYPE |
| EPILOGUE_AUX_POINTER | Pointer to the epilogue auxiliary buffer (an extra output for the *_AUX epilogues, an extra input for the gradient epilogues). | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_POINTER |
| EPILOGUE_AUX_LD | Leading dimension of the epilogue auxiliary buffer. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_LD |
| EPILOGUE_AUX_BATCH_STRIDE | Stride in bytes between auxiliary output buffers in batched epilogue operations. Currently, batched matmul is not supported and this parameter has to be set to 0. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_BATCH_STRIDE |
| EPILOGUE_AUX_DATA_TYPE | Data type of the values stored in the epilogue auxiliary buffer. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_DATA_TYPE |
| EPILOGUE_AUX_SCALE_POINTER | Device pointer to the scaling factor value used to convert results from the compute type data range to the storage data range in the auxiliary matrix set via EPILOGUE_AUX_POINTER. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_SCALE_POINTER |
| EPILOGUE_AUX_AMAX_POINTER | Device pointer to the memory location that on completion will be set to the maximum of absolute values in the buffer set via EPILOGUE_AUX_POINTER. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_AMAX_POINTER |
| EPILOGUE_AUX_SCALE_MODE | Scaling mode that defines how the matrix scaling factor for the auxiliary matrix is interpreted. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_EPILOGUE_AUX_SCALE_MODE |
| A_SCALE_POINTER | Device pointer to the scale factor value that converts data in matrix A to the compute data type range. The scaling factor must have the same type as the compute type. Matrix scaling is supported only for certain compute types. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_POINTER |
| B_SCALE_POINTER | Equivalent to A_SCALE_POINTER, but for matrix B. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_B_SCALE_POINTER |
| C_SCALE_POINTER | Equivalent to A_SCALE_POINTER, but for matrix C. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_C_SCALE_POINTER |
| D_SCALE_POINTER | Equivalent to A_SCALE_POINTER, but for matrix D. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_D_SCALE_POINTER |
| A_SCALE_MODE | Scaling mode that defines how the matrix scaling factor for matrix A is interpreted. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_A_SCALE_MODE |
| B_SCALE_MODE | Scaling mode that defines how the matrix scaling factor for matrix B is interpreted. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_B_SCALE_MODE |
| C_SCALE_MODE | Scaling mode that defines how the matrix scaling factor for matrix C is interpreted. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_C_SCALE_MODE |
| D_SCALE_MODE | Scaling mode that defines how the matrix scaling factor for matrix D is interpreted. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_D_SCALE_MODE |
| AMAX_D_POINTER | Device pointer to the memory location that on completion will be set to the maximum of absolute values in the output matrix. The computed value has the same type as the compute type. If not specified, the maximum absolute value is not computed. | CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_AMAX_D_POINTER |
cublasMpMatmulAlgoType_t#
Matrix-matrix multiplication algorithm types. This is treated as a hint, and it is not guaranteed that cuBLASMp will use the requested implementation.
| Value | Meaning |
|---|---|
| CUBLASMP_MATMUL_ALGO_TYPE_DEFAULT | Default algorithm. |
| CUBLASMP_MATMUL_ALGO_TYPE_SPLIT_P2P | Use split matmul with P2P communication. |
| CUBLASMP_MATMUL_ALGO_TYPE_SPLIT_MULTICAST | Use split matmul with multicast communication. Only valid for GEMM+ReduceScatter and GEMM+AllReduce on Compute Capability 9.0 (Hopper) and newer GPUs connected with an NVSwitch. |
| CUBLASMP_MATMUL_ALGO_TYPE_ATOMIC_MULTICAST | Use atomic matmul with multicast communication. Only valid for GEMM+ReduceScatter on Compute Capability 9.0 (Hopper) and newer GPUs connected with an NVSwitch. [DEPRECATED] |
cublasMpMatmulMatrixScale_t#
An enumerated type used to specify the scaling mode, which defines how scaling factor pointers are interpreted.
| Value | Meaning |
|---|---|
| CUBLASMP_MATMUL_MATRIX_SCALE_SCALAR_FP32 | Scaling factors are single-precision scalars applied to the whole tensor. |
cublasMpMatmulEpilogue_t#
An enumerated type used to specify the epilogue operation to be performed after the matrix multiplication. Epilogues enable fusion of additional operations such as activation functions, bias addition, and their derivatives for training, providing better performance by avoiding separate kernel launches.
| Value (short name) | Meaning | Supported Matmul algorithms | Value (complete name) |
|---|---|---|---|
| DEFAULT | No special postprocessing. | all | CUBLASMP_MATMUL_EPILOGUE_DEFAULT |
| ALLREDUCE | Performs an AllReduce communication operation after the matrix multiplication. | all | CUBLASMP_MATMUL_EPILOGUE_ALLREDUCE |
| RELU | Apply ReLU point-wise transform to the results (x := max(x, 0)). | all | CUBLASMP_MATMUL_EPILOGUE_RELU |
| RELU_AUX | Apply ReLU point-wise transform to the results (x := max(x, 0)). This epilogue mode produces an extra output; see EPILOGUE_AUX_POINTER. | all | CUBLASMP_MATMUL_EPILOGUE_RELU_AUX |
| BIAS | Apply (broadcast) bias from the bias vector. The bias vector length must match the number of rows of matrix D, and it must be packed (i.e., the stride between vector elements is 1). The bias vector is broadcast to all columns and added before applying the final postprocessing. | AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_BIAS |
| RELU_BIAS | Apply bias and then the ReLU transform. | AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_RELU_BIAS |
| RELU_AUX_BIAS | Apply bias and then the ReLU transform. This epilogue mode produces an extra output; see EPILOGUE_AUX_POINTER. | AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_RELU_AUX_BIAS |
| GELU | Apply GELU point-wise transform to the results (x := GELU(x)). | all | CUBLASMP_MATMUL_EPILOGUE_GELU |
| GELU_AUX | Apply GELU point-wise transform to the results (x := GELU(x)). This epilogue mode produces an extra output; see EPILOGUE_AUX_POINTER. | all | CUBLASMP_MATMUL_EPILOGUE_GELU_AUX |
| GELU_BIAS | Apply bias and then the GELU transform. | AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_GELU_BIAS |
| GELU_AUX_BIAS | Apply bias and then the GELU transform. This epilogue mode outputs the GELU input as a separate matrix (useful for training); see EPILOGUE_AUX_POINTER. | AllGather+GEMM, GEMM+ReduceScatter, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_GELU_AUX_BIAS |
| DGELU | Apply the GELU gradient to the matmul output. Store the GELU gradient in the output matrix. This epilogue mode requires an extra input; see EPILOGUE_AUX_POINTER. | AllGather+GEMM, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_DGELU |
| DGELU_BGRAD | Independently apply the GELU gradient and the bias gradient to the matmul output. Store the GELU gradient in the output matrix and the bias gradient in the bias buffer (see BIAS_POINTER). | AllGather+GEMM, GEMM+AllReduce | CUBLASMP_MATMUL_EPILOGUE_DGELU_BGRAD |
| BGRADA | Apply the bias gradient to the input matrix A. The bias size corresponds to the number of rows of matrix D. The reduction happens over the GEMM’s “k” dimension. Store the bias gradient in the bias buffer; see BIAS_POINTER. | all | CUBLASMP_MATMUL_EPILOGUE_BGRADA |
| BGRADB | Apply the bias gradient to the input matrix B. The bias size corresponds to the number of columns of matrix D. The reduction happens over the GEMM’s “k” dimension. Store the bias gradient in the bias buffer; see BIAS_POINTER. | all | CUBLASMP_MATMUL_EPILOGUE_BGRADB |
| DRELU | Apply the ReLU gradient to the matmul output. | none | CUBLASMP_MATMUL_EPILOGUE_DRELU |
| DRELU_BGRAD | Apply the ReLU gradient and the bias gradient to the matmul output. | none | CUBLASMP_MATMUL_EPILOGUE_DRELU_BGRAD |