How to Use cuBLASMp

This section explains how to use cuBLASMp in your application.

cuBLASMp Initialization
    Overview
    Contents
Using cuBLASMp for Tensor Parallelism in Distributed Machine Learning
    AllGather + Matmul and Matmul + ReduceScatter in Terms of Traditional PBLAS
    On Python and cuBLASMp Data Ordering
    AllGather + Matmul
    Matmul + ReduceScatter
    Matmul + AllReduce
    General Assumptions and Limitations
cuBLASMp Logging
    CUBLASMP_LOG_LEVEL
    CUBLASMP_LOG_MASK
    CUBLASMP_LOG_FILE
cuBLASMp Data Types
    Data Types
    Enumerators
cuBLASMp C API
    Library Management
    Grid Management
    Memory Management
    Matrix Management
    Matmul Properties
    Utility
    Logging
    Dense Linear Algebra APIs