matmul
nvmath.linalg.advanced.matmul(
- a,
- b,
- /,
- c=None,
- *,
- alpha=None,
- beta=None,
- epilog=None,
- epilog_inputs=None,
- qualifiers=None,
- options=None,
- preferences=None,
- algorithm=None,
- stream=None,
- )
- Perform the specified matrix multiplication computation \(F(\alpha a @ b + \beta c)\), where \(F\) is the epilog. This function-form is a wrapper around the stateful Matmul object APIs and is meant for single use (the user needs to perform just one matrix multiplication, for example), in which case there is no possibility of amortizing preparatory costs.

  Detailed information on what’s happening within this function can be obtained by passing in a logging.Logger object to MatmulOptions or by setting the appropriate options in the root logger object, which is used by default:

  >>> import logging
  >>> logging.basicConfig(
  ...     level=logging.INFO,
  ...     format="%(asctime)s %(levelname)-8s %(message)s",
  ...     datefmt="%m-%d %H:%M:%S",
  ... )

  A user can select the desired logging level and, in general, take advantage of all of the functionality offered by the Python logging module.

- Parameters:
- a – A tensor representing the first operand to the matrix multiplication (see Semantics). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
- b – A tensor representing the second operand to the matrix multiplication (see Semantics). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
- c – (Optional) A tensor representing the operand to add to the matrix multiplication result (see Semantics). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  Note: The broadcasting behavior of a 1-D (vector) c deviates from the equivalent NumPy expression. With nvmath-python, c is internally promoted to shape (M, 1) in order to broadcast with a @ b; this matches the behavior of cuBLASLt. With NumPy, a 1-D c behaves as if it has shape (1, N) in the expression a @ b + c.

  Deprecated since version 0.2.1: In order to avoid broadcasting behavior ambiguity, nvmath-python will no longer accept a 1-D (vector) c starting in version 0.3.0. Use a singleton dimension to convert your input array to 2-D.
- alpha – The scale factor for the matrix multiplication term as a real or complex number. The default is \(1.0\). 
- beta – The scale factor for the matrix addition term as a real or complex number. A value for beta must be provided if operand c is specified.
- epilog – Specify an epilog \(F\) as an object of type MatmulEpilog to apply to the result of the matrix multiplication: \(F(\alpha A @ B + \beta C)\). The default is no epilog. See cuBLASLt documentation for the list of available epilogs.
- epilog_inputs – Specify the additional inputs needed for the selected epilog as a dictionary, where the key is the epilog input name and the value is the epilog input. The epilog input must be a tensor with the same package and in the same memory space as the operands (see the constructor for more information on the operands). If the required epilog inputs are not provided, an exception is raised that lists the required epilog inputs. Some epilog inputs are generated by other epilogs. For example, the epilog input for MatmulEpilog.DRELU is generated by matrix multiplication with the same operands using MatmulEpilog.RELU_AUX.
- qualifiers – If desired, specify the matrix qualifiers as a numpy.ndarray of matrix_qualifiers_dtype objects of length 3 corresponding to the operands a, b, and c.
- options – Specify options for the matrix multiplication as a MatmulOptions object. Alternatively, a dict containing the parameters for the MatmulOptions constructor can also be provided. If not specified, the value will be set to the default-constructed MatmulOptions object.
- preferences – This parameter specifies the preferences for planning as a MatmulPlanPreferences object. Alternatively, a dictionary containing the parameters for the MatmulPlanPreferences constructor can also be provided. If not specified, the value will be set to the default-constructed MatmulPlanPreferences object.
- algorithm – An object of type Algorithm can be directly provided to bypass planning, if desired. The algorithm object must be compatible with the matrix multiplication. A typical use for this option is to provide an algorithm that has been serialized (pickled) from a previously planned and autotuned matrix multiplication.
- stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
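
The broadcasting note for a 1-D c can be illustrated with plain NumPy. The sketch below does not call nvmath-python; it only contrasts NumPy's row-wise treatment of a 1-D c with the column-wise (M, 1) promotion described above. The array values and the names row_broadcast, col_broadcast, and c_col are illustrative assumptions.

```python
import numpy as np

M, N, K = 2, 2, 3
a = np.ones((M, K))
b = np.ones((K, N))          # every entry of a @ b equals K = 3
c = np.array([10.0, 20.0])   # 1-D; M == N, so both readings are shape-valid

# NumPy treats a 1-D c as shape (1, N): c[j] is added to column j.
row_broadcast = a @ b + c

# The documented nvmath-python promotion is (M, 1): c[i] is added to row i.
# Adding the singleton dimension explicitly removes the ambiguity.
c_col = c[:, np.newaxis]     # shape (M, 1)
col_broadcast = a @ b + c_col

print(row_broadcast)
print(col_broadcast)
```

Passing an explicitly 2-D operand such as c_col is the form the deprecation note recommends, since its shape states which broadcast is intended.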
 
- Returns:
- The result of the specified matrix multiplication (epilog applied), which remains on the same device and belongs to the same package as the input operands. If an epilog (like nvmath.linalg.advanced.MatmulEpilog.RELU_AUX) that results in extra output is used, a tuple is returned with the first element being the matrix multiplication result (epilog applied) and the second element being the auxiliary output provided by the selected epilog as a dict.
 - Semantics:
- The semantics of the matrix multiplication follow numpy.matmul() semantics, with some restrictions on broadcasting. In addition, the semantics for the fused matrix addition are described below:
- If arguments a and b are matrices, they are multiplied according to the rules of matrix multiplication.
- If argument a is 1-D, it is promoted to a matrix by prefixing 1 to its dimensions. After matrix multiplication, the prefixed 1 is removed from the result’s dimensions.
- If argument b is 1-D, it is promoted to a matrix by appending 1 to its dimensions. After matrix multiplication, the appended 1 is removed from the result’s dimensions.
- If a or b is N-D (N > 2), then the operand is treated as a batch of matrices. If both a and b are N-D, their batch dimensions must match. If exactly one of a or b is N-D, the other operand is broadcast.
- The operand for the matrix addition c may be a vector of length M, a matrix of shape (M, 1) or (M, N), or batched versions of the latter (…, M, 1) or (…, M, N). Here M and N are the dimensions of the result of the matrix multiplication. If a vector is provided or N = 1, the columns of c are broadcast for the addition. If batch dimensions are not present, c is broadcast across batches as needed.
- Similarly, when operating on a batch, auxiliary outputs are 3-D for all epilogs. Therefore, epilogs that return 1-D vectors of length N in non-batched mode return 3-D matrices of size (batch, N, 1) in batched mode. 
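
Because the shape rules above follow numpy.matmul(), the resulting shapes can be checked on the CPU with NumPy alone. This is a minimal sketch and does not call nvmath-python; the variable names and the batch size of 7 are illustrative assumptions, and nvmath-python may reject some batch layouts that NumPy accepts, as noted in the restrictions on broadcasting.

```python
import numpy as np

M, N, K = 4, 3, 5
mat_a = np.ones((M, K))
mat_b = np.ones((K, N))
vec = np.ones(K)
batch_a = np.ones((7, M, K))
batch_b = np.ones((7, K, N))

# Matrix @ matrix: ordinary matrix multiplication.
shape_mm = (mat_a @ mat_b).shape        # (M, N)

# 1-D a: promoted to (1, K); the prefixed 1 is dropped from the result.
shape_va = (vec @ mat_b).shape          # (N,)

# 1-D b: promoted to (K, 1); the appended 1 is dropped from the result.
shape_vb = (mat_a @ vec).shape          # (M,)

# Exactly one N-D operand: the 2-D operand is broadcast over the batch.
shape_batch = (batch_a @ mat_b).shape   # (7, M, N)

# Both operands N-D with matching batch dimensions.
shape_both = (batch_a @ batch_b).shape  # (7, M, N)

print(shape_mm, shape_va, shape_vb, shape_batch, shape_both)
```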
 
 - Examples

   >>> import cupy as cp
   >>> import nvmath

   Create three float32 ndarrays on the GPU:

   >>> M, N, K = 128, 64, 256
   >>> a = cp.random.rand(M, K, dtype=cp.float32)
   >>> b = cp.random.rand(K, N, dtype=cp.float32)
   >>> c = cp.random.rand(M, N, dtype=cp.float32)

   Perform the operation \(\alpha A @ B + \beta C\) using matmul(). The result r is also a CuPy float32 ndarray:

   >>> r = nvmath.linalg.advanced.matmul(a, b, c, alpha=1.23, beta=0.74)

   An epilog can be used as well. Here we perform \(RELU(\alpha A @ B + \beta C)\):

   >>> epilog = nvmath.linalg.advanced.MatmulEpilog.RELU
   >>> r = nvmath.linalg.advanced.matmul(a, b, c, alpha=1.23, beta=0.74, epilog=epilog)

   Options can be provided to customize the operation:

   >>> compute_type = nvmath.linalg.advanced.MatmulComputeType.COMPUTE_32F_FAST_TF32
   >>> o = nvmath.linalg.advanced.MatmulOptions(compute_type=compute_type)
   >>> r = nvmath.linalg.advanced.matmul(a, b, options=o)

   See MatmulOptions for the complete list of available options.

   The package's current stream is used by default, but a stream can be explicitly provided to the Matmul operation. This can be done if the operands are computed on a different stream, for example:

   >>> s = cp.cuda.Stream()
   >>> with s:
   ...     a = cp.random.rand(M, K)
   ...     b = cp.random.rand(K, N)
   >>> r = nvmath.linalg.advanced.matmul(a, b, stream=s)

   The operation above runs on stream s and is ordered with respect to the input computation.

   Create NumPy ndarrays on the CPU:

   >>> import numpy as np
   >>> a = np.random.rand(M, K)
   >>> b = np.random.rand(K, N)

   Provide the NumPy ndarrays to matmul(), with the result also being a NumPy ndarray:

   >>> r = nvmath.linalg.advanced.matmul(a, b)

 - Notes

   This function is a convenience wrapper around Matmul and is specifically meant for single use.
 - Further examples can be found in the nvmath/examples/linalg/advanced/matmul directory.
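
When validating results, it can help to have a CPU reference for the documented computation \(F(\alpha a @ b + \beta c)\). The sketch below is a hypothetical NumPy helper, not part of nvmath-python: the name matmul_reference and its relu flag are assumptions made for illustration, and only a ReLU epilog is modeled.

```python
import numpy as np

def matmul_reference(a, b, c=None, alpha=1.0, beta=None, relu=False):
    """Hypothetical CPU reference for F(alpha * a @ b + beta * c)."""
    result = alpha * (a @ b)
    if c is not None:
        # Mirrors the documented rule: beta is required whenever c is given.
        if beta is None:
            raise ValueError("beta must be provided when c is specified")
        result = result + beta * c
    if relu:
        # ReLU epilog: clamp negative entries to zero.
        result = np.maximum(result, 0)
    return result

a = np.array([[1.0, -2.0], [3.0, 4.0]])
b = np.eye(2)
c = np.array([[0.0, 0.0], [-10.0, 0.0]])

expected = matmul_reference(a, b, c, alpha=2.0, beta=1.0, relu=True)
print(expected)
```

A GPU result could then be compared against such a reference with a tolerance-based check like numpy.allclose, keeping in mind that reduced-precision compute types may need looser tolerances.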