matmul#

nvmath.linalg.matmul(
a: AnyTensor,
b: AnyTensor,
/,
c: AnyTensor | None = None,
*,
alpha: float | complex | None = None,
beta: float | complex | None = None,
qualifiers: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None,
options: MatmulOptions | None = None,
execution: ExecutionCPU | ExecutionCUDA | None = None,
stream: AnyStream | int | None = None,
)[source]#

Perform the specified matrix multiplication computation \(\alpha a @ b + \beta c\). This function-form API is a wrapper around the stateful Matmul object APIs and is meant for single use (when the user needs to perform just one matrix multiplication, for example), in which case there is no possibility of amortizing preparatory costs.

Detailed information on what’s happening within this function can be obtained by passing in a logging.Logger object to MatmulOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )

A user can select the desired logging level and, in general, take advantage of all of the functionality offered by the Python logging module.
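
Alternatively, a dedicated logger can be routed through the options, as mentioned above. A minimal sketch, assuming MatmulOptions accepts the logger as a logger attribute (the attribute name here mirrors other nvmath options objects and is an assumption):

>>> import logging
>>> import nvmath
>>> logger = logging.getLogger("matmul_demo")
>>> logger.setLevel(logging.DEBUG)
>>> options = nvmath.linalg.MatmulOptions(logger=logger)  # attribute name assumed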

Parameters:
  • a – A tensor representing the first operand to the matrix multiplication (see Semantics). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • b – A tensor representing the second operand to the matrix multiplication (see Semantics). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • c – (Optional) A tensor representing the operand to add to the matrix multiplication result (see Semantics). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • alpha – The scale factor for the matrix multiplication term as a real or complex number. The default is \(1.0\).

  • beta – The scale factor for the matrix addition term as a real or complex number. A value for beta must be provided if operand c is specified (see the example after this parameter list).

  • qualifiers – If desired, specify the matrix qualifiers as a numpy.ndarray with dtype matrix_qualifiers_dtype and length at most 3, corresponding to the operands a, b, and c. By default, GeneralMatrixQualifier is assumed for each tensor. See Matrix and Tensor Qualifiers for the motivation behind qualifiers.

  • options – Specify options for the matrix multiplication as a MatmulOptions object. If not specified, the value will be set to the default-constructed MatmulOptions object.

  • execution – Specify execution space options for the Matmul as an ExecutionCUDA or ExecutionCPU object. If not specified, the execution space will be selected to match the operands' storage (in GPU or host memory), and the corresponding ExecutionCUDA or ExecutionCPU object will be default-constructed.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
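
For instance, when operand c is provided, beta must be given as well, while alpha defaults to \(1.0\). A minimal sketch of this, using NumPy operands:

>>> import numpy as np
>>> import nvmath
>>> a, b, c = np.random.rand(4, 3), np.random.rand(3, 2), np.random.rand(4, 2)
>>> r = nvmath.linalg.matmul(a, b, c, beta=0.5)  # computes 1.0 * a @ b + 0.5 * c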

Returns:

The result of the specified matrix multiplication, which remains on the same device and belongs to the same package as the input operands.

Semantics:

The semantics of the matrix multiplication follow numpy.matmul() semantics, with some restrictions:

  • Batching is not supported in this API, but is planned for a future release. See the advanced API (nvmath.linalg.advanced.matmul()) for an API that supports batching; a brief sketch follows this list.

  • Broadcasting c is not supported in this API, but may be supported in the future. See the advanced API (nvmath.linalg.advanced.matmul()) for an API that supports broadcasting c.
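
As a sketch of the batched alternative referenced above, the advanced API accepts batched operands directly (the exact batching rules are documented with nvmath.linalg.advanced.matmul(); the shapes below are an illustrative assumption):

>>> import numpy as np
>>> import nvmath
>>> a = np.random.rand(8, 4, 3)  # a batch of 8 matrices
>>> b = np.random.rand(8, 3, 2)
>>> r = nvmath.linalg.advanced.matmul(a, b)  # r.shape == (8, 4, 2)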

In addition, the semantics of the matrix multiplication and the fused matrix addition are described below (a shape sketch follows the list):

  • If arguments a and b are matrices, they are multiplied according to the rules of matrix multiplication.

  • If argument a is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication, the prepended 1 is removed from the result’s dimensions.

  • If argument b is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication, the appended 1 is removed from the result’s dimensions.

  • The operand for the matrix addition c must have the expected shape of the result of the matrix multiplication.
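
A minimal shape sketch of these promotion rules, using NumPy operands (these follow numpy.matmul() semantics as stated above):

>>> import numpy as np
>>> import nvmath
>>> a = np.random.rand(4, 3)
>>> v = np.random.rand(3)
>>> nvmath.linalg.matmul(a, v).shape  # v promoted to (3, 1); trailing 1 removed
(4,)
>>> w = np.random.rand(4)
>>> nvmath.linalg.matmul(w, a).shape  # w promoted to (1, 4); leading 1 removed
(3,)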

See also

Matmul, MatmulOptions, matrix_qualifiers_dtype, MatrixQualifier

Examples

>>> import cupy as cp
>>> import nvmath

Create three float32 ndarrays on the GPU:

>>> M, N, K = 128, 64, 256
>>> a = cp.random.rand(M, K, dtype=cp.float32)
>>> b = cp.random.rand(K, N, dtype=cp.float32)
>>> c = cp.random.rand(M, N, dtype=cp.float32)

Perform the operation \(\alpha A @ B + \beta C\) using matmul(). The result r is also a CuPy float32 ndarray:

>>> r = nvmath.linalg.matmul(a, b, c, alpha=1.23, beta=0.74)

The operand package’s current stream is used by default, but a stream can be explicitly provided to the matmul operation. This can be done if the operands are computed on a different stream, for example:

>>> s = cp.cuda.Stream()
>>> with s:
...     a = cp.random.rand(M, K)
...     b = cp.random.rand(K, N)
>>> r = nvmath.linalg.matmul(a, b, stream=s)

The operation above runs on stream s and is ordered with respect to the input computation.

Create NumPy ndarrays on the CPU:

>>> import numpy as np
>>> a = np.random.rand(M, K)
>>> b = np.random.rand(K, N)

Provide the NumPy ndarrays to matmul(), with the result also being a NumPy ndarray:

>>> r = nvmath.linalg.matmul(a, b)
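
The execution space can also be requested explicitly via the execution option. A sketch, assuming ExecutionCUDA is importable from nvmath.linalg (the exact import location and the device_id attribute are assumptions based on the signature above and nvmath’s other execution options); with NumPy operands, the computation runs on the GPU while the result is returned as a NumPy ndarray:

>>> r = nvmath.linalg.matmul(
...     a, b, execution=nvmath.linalg.ExecutionCUDA(device_id=0)
... )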

Notes

  • This function is a convenience wrapper around Matmul and is specifically meant for single use. When the same problem specification is executed repeatedly, the stateful API can amortize preparatory costs; a hedged sketch follows.
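
A minimal sketch of the stateful pattern, assuming the generic Matmul follows the plan/execute pattern of nvmath’s other stateful objects (the method names here mirror that pattern and are not confirmed for this class):

>>> with nvmath.linalg.Matmul(a, b) as mm:
...     mm.plan()  # preparatory cost paid once
...     r = mm.execute()  # can be called repeatedly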

Further examples can be found in the nvmath/examples/linalg/generic/matmul directory.