How to Use cuBLASMp

This section explains how to use cuBLASMp in your application.

cuBLASMp Initialization
    Overview
    Contents
Using cuBLASMp for Tensor Parallelism in Distributed Machine Learning
    AllGather+GEMM and GEMM+ReduceScatter in terms of traditional PBLAS
    On Python and cuBLASMp data ordering
    AllGather+GEMM
    GEMM+ReduceScatter
    GEMM+AllReduce
    General assumptions and limitations
cuBLASMp Logging
    CUBLASMP_LOG_LEVEL
    CUBLASMP_LOG_MASK
    CUBLASMP_LOG_FILE
cuBLASMp Data Types
    Data types
    Enumerators
cuBLASMp C API
    Library Management
    Grid Management
    Matrix Management
    Matmul Properties
    Utility
    Logging
    Dense Linear Algebra APIs