matmul
nvmath.sparse.matmul(a, b, /, c=None, *, alpha=None, beta=None, qualifiers=None, prologs=None, epilog=None, semiring=None, compute_capability=None, options=None, execution: ExecutionCUDA | None = None, stream: AnyStream | int | None = None)
Perform the specified sparse matrix multiplication computation, which is one of \(epilog(\alpha \, op_h(a) \, @ \, op_h(b) + \beta \, c)\) or \(epilog(prolog_a(op_t(a)) \, @ \, prolog_b(op_t(b)) + prolog_c(c))\). The \(op_h\) and \(op_t\) operators optionally specify transpose/hermitian or transpose operations respectively via the qualifiers argument. In addition, the scalar multiplication and addition operators (“semiring”) can be customized by the user, if desired.

Note
The complex conjugate operation is mutually exclusive with prolog since it can be absorbed into the prolog.
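The note above can be illustrated with a small CPU-only sketch. It is not the nvmath implementation: reference_spmm and its COO-like list of (row, col, value) triples are hypothetical names used only for illustration. The sketch evaluates the prolog form of the computation and shows that a complex conjugate on a can be written as a prolog, and that the \(\alpha\) and \(\beta\) factors of the first form are arithmetically equivalent to scaling prologs on a and c.

```python
# CPU-only reference sketch; not how nvmath evaluates the expression.
# a_coo is a hypothetical COO-like list of (row, col, value) triples.

def reference_spmm(a_coo, b, c, *, prolog_a=None, prolog_b=None,
                   prolog_c=None, epilog=None):
    """Write epilog(prolog_a(a) @ prolog_b(b) + prolog_c(c)) into c; return c."""
    def identity(x):
        return x

    prolog_a = prolog_a or identity
    prolog_b = prolog_b or identity
    prolog_c = prolog_c or identity
    epilog = epilog or identity

    n_cols = len(b[0])
    # Accumulate the product separately so prolog_c sees the original c.
    acc = [[0] * n_cols for _ in c]
    for i, k, value in a_coo:
        pa = prolog_a(value)
        for j in range(n_cols):
            acc[i][j] += pa * prolog_b(b[k][j])

    # In-place update: the result overwrites the addend c.
    for i, row in enumerate(c):
        for j in range(n_cols):
            row[j] = epilog(acc[i][j] + prolog_c(row[j]))
    return c


a = [(0, 0, 2 + 1j), (1, 1, 4 + 0j)]  # sparse diagonal matrix
b = [[1, 1], [1, 1]]

# Conjugating a, expressed as a prolog on a.
c = [[0, 0], [0, 0]]
conj = reference_spmm(a, b, c, prolog_a=lambda x: x.conjugate())

# alpha = 2 and beta = 3, expressed as scaling prologs on a and c.
c2 = [[1, 1], [1, 1]]
scaled = reference_spmm(a, b, c2,
                        prolog_a=lambda x: 2 * x,
                        prolog_c=lambda x: 3 * x)
```

The returned object is the same list that was passed in as c, which mirrors the in-place behavior described for this function.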
Note
Currently only in-place sparse matrix multiplication is supported, so operand c must be provided. This restriction will be removed in a future release.

This function-form is a wrapper around the stateful Matmul object APIs and is meant for single use (the user needs to perform just one sparse matrix multiplication, for example), in which case there is no possibility of amortizing preparatory costs.

Detailed information on what’s happening within this function can be obtained by passing in a logging.Logger object to MatmulOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )
A user can select the desired logging level and, in general, take advantage of all of the functionality offered by the Python logging module.

- Parameters:
a – A sparse tensor representing the first operand a in the sparse matrix multiplication (SpMM) from one of the supported sparse packages: SciPy, CuPy, PyTorch, or a universal sparse tensor (UST) object (see semantics). The sparse representation may be in any of the formats supported by the sparse package (CSR, BSC, COO, …), including novel formats defined using the UST DSL.
b – A dense tensor representing the second operand b in the SpMM (see semantics). The currently supported types are numpy.ndarray, cupy.ndarray, torch.Tensor, and nvmath.sparse.ust.Tensor.
c – A dense tensor representing the addend c in the SpMM (see semantics). The currently supported types are numpy.ndarray, cupy.ndarray, torch.Tensor, and nvmath.sparse.ust.Tensor.
alpha – The scale factor for the matrix multiplication term as a real or complex number. The default is \(1.0\).
beta – The scale factor for the addend term in the matrix multiplication as a real or complex number. The default is \(1.0\).
qualifiers – If desired, specify the matrix qualifiers as a numpy.ndarray of matmul_matrix_qualifiers_dtype objects of length 3 corresponding to the operands a, b, and c. See Matrix and Tensor Qualifiers for the motivation behind qualifiers.
prologs – A dict mapping an operand label ("a", "b", "c") to its prolog operation in LTO-IR format (as a bytes object). The prolog is a user-written unary function in Python that returns the transformed value, which has the data type of the operand to which it is applied. This function can be compiled to LTO-IR using the helper compile_matmul_prolog() or your own compiler of choice. If not specified, no prolog will be applied to the operands.
epilog – The epilog operation in LTO-IR format (as a bytes object). The epilog is a user-written unary function in Python that returns the transformed value, which has the data type of the SpMM result. This function can be compiled to LTO-IR using the helper compile_matmul_epilog() or your own compiler of choice. If not specified, no epilog will be applied to the SpMM result.
semiring – A dict mapping the semiring operations ("mul", "add", "atomic_add") to LTO-IR code (as a bytes object). Each semiring operation is a binary function in Python that returns a value. These functions can be compiled to LTO-IR using the helpers compile_matmul_mul(), compile_matmul_add(), or compile_matmul_atomic_add() or your own compiler of choice. If not specified, the standard definitions of these operations from elementary algebra will be used.
compute_capability – The target compute capability, specified as a string ('80', '89', …). The default is the compute capability of the current device.
options – Specify options for the sparse matrix multiplication as a MatmulOptions object. Alternatively, a dict containing the parameters for the MatmulOptions constructor can also be provided. If not specified, the value will be set to the default-constructed MatmulOptions object.
execution – Specify execution space options for the SpMM as an ExecutionCUDA object (the only execution space currently supported). If not specified, an ExecutionCUDA object will be default-constructed.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream for the operand device will be queried from the dense operand b (and c) package.
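To make the semiring parameter more concrete, the following CPU-only sketch shows what replacing the scalar "mul" and "add" operations means mathematically. It does not use nvmath or LTO-IR: semiring_spmm, the (row, col, value) triple layout, and the sample data are all hypothetical, and the way the library combines a custom semiring with the addend c is not specified here, so the sketch simply accumulates into c. The "atomic_add" operation, which concerns concurrent accumulation on the GPU, has no counterpart in this sequential loop.

```python
import math
import operator


def semiring_spmm(a_coo, b, c, *, mul=operator.mul, add=operator.add):
    """Accumulate a (x) b into c in place using the given mul and add."""
    for i, k, value in a_coo:
        for j in range(len(c[i])):
            c[i][j] = add(c[i][j], mul(value, b[k][j]))
    return c


# Sparse edge weights: 0 -> 1 costs 3, 1 -> 2 costs 1, 0 -> 2 costs 10.
edges = [(0, 1, 3.0), (1, 2, 1.0), (0, 2, 10.0)]
# Dense column: a known cost from each node to node 2.
cost = [[10.0], [1.0], [0.0]]

# Standard algebra: ordinary multiply and add, starting from zeros.
standard = semiring_spmm(edges, cost, [[0.0], [0.0], [0.0]])

# Min-plus ("tropical") semiring: mul is +, add is min, and the additive
# identity is infinity, so c starts at inf instead of zero.
min_plus = semiring_spmm(
    edges,
    cost,
    [[math.inf], [math.inf], [math.inf]],
    mul=operator.add,
    add=min,
)
```

With the min-plus choice, each output entry is the cheapest "one edge plus known cost" route, which is why a custom semiring is useful for graph-style computations on sparse operands.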
- Returns:
The result of the sparse matrix multiplication (epilog applied). Currently only in-place SpMM is supported (the result of the computation is written into the addend c).
- Semantics:
The semantics of the matrix multiplication follow numpy.matmul semantics, with some restrictions on broadcasting. In addition, the semantics for the fused matrix addition are described below.

For in-place matrix multiplication (where the result is written into c) the result has the same shape as c.

The operand a must be a sparse matrix or batched sparse matrix. Popular named formats like BSC, BSR, COO, CSR, … are supported in addition to custom formats defined using the UST DSL.

The operands b and c must be “dense” matrices (that is, their layout is strided).

If the operands a and b are matrices, they are multiplied according to the rules of matrix multiplication.

If argument b is 1-D, it is promoted to a matrix by appending 1 to its dimensions. After matrix multiplication, the appended 1 is removed from the result’s dimensions if the operation is not in-place.

If a or b is N-D (N > 2), then the operand is treated as a batch of matrices. If both a and b are N-D, their batch dimensions must match. If exactly one of a or b is N-D, the other operand is broadcast.

The operand for the matrix addition c must be a matrix of shape (M, N), or the batched equivalent (…, M, N). Here M and N are the dimensions of the result of the matrix multiplication. If batch dimensions are not present, c is broadcast across batches as needed. If the operation is in-place, c cannot be broadcast since it must be large enough to hold the result.
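The shape rules above can be sketched as a small pure-Python helper. This is only an illustration of the rules as worded here, not code from nvmath: spmm_result_shape and its inplace flag are hypothetical, and how c is validated when b is 1-D is an assumption (the sketch checks c against the promoted shape).

```python
def spmm_result_shape(a_shape, b_shape, c_shape=None, *, inplace=True):
    """Return the result shape implied by the semantics described above."""
    a_shape, b_shape = tuple(a_shape), tuple(b_shape)
    if len(a_shape) < 2:
        raise ValueError("a must be a matrix or a batch of matrices")
    if len(b_shape) < 1:
        raise ValueError("b must be at least 1-D")

    # A 1-D b is promoted to a matrix by appending 1 to its dimensions.
    b_is_vector = len(b_shape) == 1
    if b_is_vector:
        b_shape += (1,)

    *a_batch, m, k = a_shape
    *b_batch, k_b, n = b_shape
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    # If both operands are batched, their batch dimensions must match;
    # otherwise the unbatched operand is broadcast.
    if a_batch and b_batch and a_batch != b_batch:
        raise ValueError("batch dimensions of a and b must match")

    result = tuple(a_batch or b_batch) + (m, n)

    if c_shape is not None:
        # In-place c must hold the full result; otherwise an unbatched
        # (M, N) addend may be broadcast across batches.
        allowed = {result} if inplace else {result, (m, n)}
        if tuple(c_shape) not in allowed:
            raise ValueError(f"c has shape {tuple(c_shape)}, expected {result}")

    # The appended 1 is removed only when the operation is not in-place.
    if b_is_vector and not inplace:
        result = result[:-1]
    return result


matrix_case = spmm_result_shape((4, 3), (3, 5))
vector_case = spmm_result_shape((4, 3), (3,), inplace=False)
batched_case = spmm_result_shape((4, 3), (7, 3, 5), c_shape=(7, 4, 5))
```

For instance, an unbatched a with a batch of seven b matrices requires an in-place c of shape (7, 4, 5); passing a (4, 5) addend in that case raises an error in this sketch because c could not hold the result.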
Examples
>>> import torch
>>> import nvmath
Prepare sample data.
>>> index_type, dtype = torch.int32, torch.float32
>>> device_id = 0
>>> shape = 2, 2
Create a torch COO tensor, and view it as UST.
>>> indices = torch.tensor([[0, 1], [0, 1]], dtype=index_type)
>>> values = torch.tensor([2.0, 4.0], dtype=dtype)
>>> a = torch.sparse_coo_tensor(indices, values, shape, device=device_id)
>>> a = a.coalesce()
>>> a = nvmath.sparse.ust.Tensor.from_package(a)
Dense ‘b’ and ‘c’, also viewed as UST objects.
>>> b = torch.ones(*shape, dtype=dtype, device=device_id)
>>> b = nvmath.sparse.ust.Tensor.from_package(b)
>>> c = torch.zeros(*shape, dtype=dtype, device=device_id)
>>> c = nvmath.sparse.ust.Tensor.from_package(c)
Solve \(c := a @ b + c\).
>>> r = nvmath.sparse.matmul(a, b, c, beta=1.0)
The result can also be viewed as a torch tensor.
>>> r = nvmath.sparse.ust.Tensor.to_package(r)
Note
This function is a convenience wrapper around Matmul and is specifically meant for single use.

Further examples can be found in the nvmath/examples/sparse/generic/matmul directory.