BinaryContraction#

class nvmath.tensor.BinaryContraction(
expr,
a,
b,
*,
c=None,
out=None,
qualifiers=None,
stream=None,
options=None,
execution=None,
)[source]#

Create a stateful object encapsulating the specified binary tensor contraction \(\alpha a @ b + \beta c\) and the required resources to perform the operation. A stateful object can be used to amortize the cost of preparation (planning in the case of binary tensor contraction) across multiple executions (also see the Stateful APIs section).

The function-form API binary_contraction() is a convenient alternative to using stateful objects for single use (the user needs to perform just one tensor contraction, for example), in which case there is no possibility of amortizing preparatory costs. The function-form APIs are just convenience wrappers around the stateful object APIs.

Using the stateful object typically involves the following steps:

  1. Problem Specification: Initialize the object with a defined operation and options.

  2. Preparation: Use plan() to determine the best algorithmic implementation for this specific binary tensor contraction operation.

  3. Execution: Perform the tensor contraction computation with execute().

  4. Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.

Detailed information on what’s happening in the various phases described above can be obtained by passing in a logging.Logger object to ContractionOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )

A user can select the desired logging level and, in general, take advantage of all of the functionality offered by the Python logging module.
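As an alternative to configuring the root logger, a dedicated logger can be built with the standard logging module. The sketch below only constructs the logger. The logger name is hypothetical, and the commented-out lines that pass it on rely on an assumed `logger` keyword of ContractionOptions, which should be checked against the ContractionOptions reference.

```python
import logging

# A dedicated logger keeps contraction messages separate from the root logger.
logger = logging.getLogger("contraction_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%m-%d %H:%M:%S",
    )
)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

# Assumed usage (needs nvmath and a GPU; the `logger` keyword is an assumption):
# options = nvmath.tensor.ContractionOptions(logger=logger)
# contraction = nvmath.tensor.BinaryContraction("ijk,jkl->il", a, b, options=options)
```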

Parameters:
  • a – A tensor representing the first operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • b – A tensor representing the second operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • c – (Optional) A tensor representing the operand to add to the tensor contraction result (fused operation in cuTensor). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • out – (Optional) The output tensor to store the result of the contraction. Must be a numpy.ndarray, cupy.ndarray, or torch.Tensor object and must be on the same device as the input operands. If not specified, the result will be returned on the same device as the input operands.

    Note: The support of the output tensor in the API is experimental and subject to change in future versions without prior notice.

  • qualifiers – If desired, specify the operators as a numpy.ndarray of dtype tensor_qualifiers_dtype with the same length as the number of operands in the contraction expression plus one (for the operand to be added). All elements must be valid Operator objects. See Matrix and Tensor Qualifiers for the motivation behind qualifiers.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

  • options – Specify options for the tensor contraction as a ContractionOptions object. Alternatively, a dict containing the parameters for the ContractionOptions constructor can also be provided. If not specified, the value will be set to the default-constructed ContractionOptions object.

  • execution – Specify execution space options for the tensor contraction as an ExecutionCUDA object or the string ‘cuda’. Alternatively, a dict containing the ‘name’ key set to ‘cuda’ and the additional parameters for the ExecutionCUDA constructor can also be provided. If not provided, the execution space will be selected to match the operands’ storage if the operands are on the GPU. If the operands are on the CPU and the execution space is not provided, the execution space will be a default-constructed ExecutionCUDA object with device_id = 0.

Examples

>>> import numpy as np
>>> import nvmath

Create two 3-D float64 ndarrays on the CPU:

>>> M, N, K = 32, 32, 32
>>> a = np.random.rand(M, N, K)
>>> b = np.random.rand(N, K, M)

We will define the binary tensor contraction "ijk,jkl->il", which sums over the shared indices j and k.

Create a BinaryContraction object encapsulating the problem specification above:

>>> contraction = nvmath.tensor.BinaryContraction("ijk,jkl->il", a, b)

Options can be provided above to control the behavior of the operation using the options argument (see ContractionOptions).

Next, plan the operation. Optionally, preferences can be specified for planning:

>>> contraction.plan()

Now execute the binary tensor contraction, and obtain the result r1 as a NumPy ndarray.

>>> r1 = contraction.execute()

Finally, free the object’s resources. To avoid having to make this call explicitly, it’s recommended to use the BinaryContraction object as a context manager as shown below, if possible.

>>> contraction.free()

Note that all BinaryContraction methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.

Let’s now look at the same problem with CuPy ndarrays on the GPU.

Create two 3-D float64 CuPy ndarrays on the GPU:

>>> import cupy as cp
>>> a = cp.random.rand(M, N, K)
>>> b = cp.random.rand(N, K, M)

Create a BinaryContraction object encapsulating the problem specification described earlier and use it as a context manager.

>>> with nvmath.tensor.BinaryContraction("ijk,jkl->il", a, b) as contraction:
...     contraction.plan()
...
...     # Execute the operation to get the first result.
...     r1 = contraction.execute()
...
...     # Update operands A and B in-place (see reset_operands() for an
...     # alternative).
...     a[:] = cp.random.rand(M, N, K)
...     b[:] = cp.random.rand(N, K, M)
...
...     # Execute the operation to get the new result.
...     r2 = contraction.execute()

All the resources used by the object are released at the end of the block.

Further examples can be found in the nvmath/examples/tensor/contraction directory.

Methods

__init__(
expr,
a,
b,
*,
c=None,
out=None,
qualifiers=None,
stream=None,
options=None,
execution=None,
)[source]#

Binary & Ternary Contraction

execute(
*,
alpha=1.0,
beta=None,
release_workspace=False,
stream=None,
)[source]#

Execute a prepared tensor contraction.

Parameters:
  • alpha – The scale factor for the tensor contraction term as a real or complex number. The default is \(1.0\).

  • beta – The scale factor for the tensor addition term as a real or complex number. A value for beta must be provided if the operand to be added is specified.

  • release_workspace – A value of True specifies that the stateful object should release workspace memory back to the package memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but incurs a small overhead due to obtaining and releasing workspace memory from and to the package memory pool on every call. The default is False.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Returns:

The result of the specified contraction, which remains on the same device and belongs to the same package as the input operands.
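As a CPU reference for the expression \(\alpha a @ b + \beta c\) that execute() evaluates, the following NumPy sketch computes the same quantity with numpy.einsum. It does not call nvmath. The shapes, the scale factors, and the addend c are illustrative values chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 3, 2
a = rng.random((M, N, K))
b = rng.random((N, K, M))
c = rng.random((M, M))  # addend with the shape of the contraction result
alpha, beta = 2.0, 0.5

# Reference value of alpha * contraction(a, b) + beta * c for "ijk,jkl->il".
expected = alpha * np.einsum("ijk,jkl->il", a, b) + beta * c

# The same contraction written as a matrix product over the fused (j, k) index.
via_matmul = alpha * (a.reshape(M, N * K) @ b.reshape(N * K, M)) + beta * c
```

A reference of this kind can be compared against the result of execute(alpha=..., beta=...) with numpy.allclose when validating a new expression.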

free()[source]#

Free tensor contraction resources.

It is recommended that the contraction object be used as a context manager, but if that is not possible then this method must be called explicitly to ensure that the tensor contraction resources (especially internal library objects) are properly cleaned up.
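When a context manager cannot be used, one way to guarantee the explicit free() call is a try/finally block. The helper below is a hypothetical sketch (its name is not part of nvmath) that uses only the constructor, plan(), execute(), and free() documented on this page. Running it requires nvmath and a CUDA-capable GPU.

```python
def contract_with_explicit_free(expr, a, b):
    """Plan and execute one contraction, releasing resources even on error."""
    # Imported lazily so that defining this helper does not require nvmath.
    import nvmath

    contraction = nvmath.tensor.BinaryContraction(expr, a, b)
    try:
        contraction.plan()
        return contraction.execute()
    finally:
        # Runs whether or not plan() or execute() raised an exception.
        contraction.free()
```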

plan(*, stream=None)[source]#

Plan the tensor contraction. The planning phase can be optionally configured through the property plan_preference (an object of type ContractionPlanPreference).

Parameters:

stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Note

If the plan_preference has been updated, a plan() call is required to apply the changes.

reset_operands(
a=None,
b=None,
*,
c=None,
out=None,
stream=None,
)[source]#

Reset the operands held by this BinaryContraction instance.

This method has two use cases:
  1. It can be used to provide new operands for execution when the original operands are on the CPU.

  2. It can be used to release the internal reference to the previous operands and make their memory available for other use by passing None for all arguments. In this case, this method must be called again to provide the desired operands before another call to execution APIs like execute().

This method is not needed when the operands reside on the GPU and in-place operations are used to update the operand values.

This method will perform various checks on the new operands to make sure:

  • The shapes, strides, and datatypes match those of the old ones.

  • The packages that the operands belong to match those of the old ones.

  • If the input tensors are on the GPU, the device matches that of the old ones.
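The checks listed above can be pictured with a small NumPy helper. This is an illustrative sketch, not the library's implementation: the function name is hypothetical, comparing array types is used as a stand-in for the package check, and the device check is omitted because NumPy arrays carry no device.

```python
import numpy as np


def same_layout(old, new):
    """Return True if `new` matches `old` in array type, shape, strides, and dtype."""
    return (
        type(new) is type(old)
        and new.shape == old.shape
        and new.strides == old.strides
        and new.dtype == old.dtype
    )


a = np.zeros((4, 3), dtype=np.float64)

ok = same_layout(a, np.ones((4, 3)))
wrong_dtype = same_layout(a, np.ones((4, 3), dtype=np.float32))
# A transposed (3, 4) array has the same shape as `a` but different strides.
wrong_strides = same_layout(a, np.ones((3, 4)).T)
```

The last case is the one most easily missed: a transposed view has the right shape and datatype but a different memory layout, so it would not pass a strides check.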

Parameters:
  • a – A tensor representing the first operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • b – A tensor representing the second operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • c – (Optional) A tensor representing the operand to add to the tensor contraction result (fused operation in cuTensor). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • out – (Optional) The output tensor to store the result of the contraction. Must be a numpy.ndarray, cupy.ndarray, or torch.Tensor object and must be on the same device as the input operands. If not specified, the result will be returned on the same device as the input operands.

Note

The support of output tensor in the API is experimental and subject to change in future versions without prior notice.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Examples

>>> import cupy as cp
>>> import nvmath

Create two 2-D float64 ndarrays on the GPU:

>>> M, N, K = 128, 128, 256
>>> a = cp.random.rand(M, K)
>>> b = cp.random.rand(K, N)

Create a binary contraction object and use it as a context manager:

>>> with nvmath.tensor.BinaryContraction("ij,jk->ik", a, b) as contraction:
...     # Plan the operation.
...     algorithms = contraction.plan()
...
...     # Execute the contraction to get the first result.
...     r1 = contraction.execute()
...
...     # Reset the operands to new CuPy ndarrays.
...     a1 = cp.random.rand(M, K)
...     b1 = cp.random.rand(K, N)
...     contraction.reset_operands(a=a1, b=b1)
...
...     # Execute to get the new result corresponding to the updated operands.
...     r2 = contraction.execute()

Note that if only a subset of the operands is reset, the operands that are not reset retain their original values.

With reset_operands(), minimal overhead is achieved as problem specification and planning are only performed once.

For the particular example above, explicitly calling reset_operands() is equivalent to updating the operands in-place, i.e., replacing contraction.reset_operands(a=a1, b=b1) with a[:]=a1 and b[:]=b1. Note that updating the operands in-place should be adopted with caution, as it yields the expected result only under the additional constraint below:

  • The operand is on the GPU (more precisely, the operand memory space should be accessible from the execution space).
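The difference between an in-place update and rebinding a name can be seen with plain arrays. In the sketch below, NumPy stands in for CuPy so that it runs without a GPU, and the variable `held` is a hypothetical stand-in for the reference that the stateful object keeps. It illustrates array semantics only. With nvmath itself, the in-place route still requires the operand to be on the GPU, as stated above.

```python
import numpy as np

a = np.zeros((2, 3))
held = a  # stands in for the reference kept by the stateful object
a1 = np.ones((2, 3))

# In-place update: writes into the existing buffer, so `held` sees the new values.
a[:] = a1
in_place_visible = bool((held == 1.0).all())

# Rebinding: `a` now names a different array, and `held` is left unchanged.
a = a1 * 2
rebinding_visible = bool((held == 2.0).all())
```

Only the slice assignment changes the data that `held` refers to. After rebinding, new operands have to be supplied through reset_operands().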

For more details, please refer to the in-place update example.

Attributes

plan_preference#

An accessor to configure or query the contraction planning phase attributes.

Returns:

A ContractionPlanPreference object, whose attributes can be set (or queried) to configure the planning phase.