TernaryContraction#
class nvmath.tensor.TernaryContraction(
    expr,
    a,
    b,
    c,
    *,
    d=None,
    out=None,
    qualifiers=None,
    stream=None,
    options=None,
    execution=None,
)
Create a stateful object encapsulating the specified ternary tensor contraction \(\alpha \, a @ b @ c + \beta \, d\) and the required resources to perform the operation. A stateful object can be used to amortize the cost of preparation (planning in the case of ternary tensor contraction) across multiple executions (also see the Stateful APIs section).
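For intuition, the numerical semantics can be sketched in plain NumPy with numpy.einsum (the expression string, shapes, and scale factors below are illustrative assumptions, not part of this API):

```python
import numpy as np

# Sketch of what a ternary contraction computes: the three operands are
# contracted according to an einsum-style expression and scaled by alpha,
# and an optional operand d is added with scale beta. The expression and
# shapes here are made-up examples.
alpha, beta = 2.0, 0.5
a = np.random.rand(4, 5)
b = np.random.rand(5, 6)
c = np.random.rand(6, 7)
d = np.random.rand(4, 7)

contracted = np.einsum("ij,jk,kl->il", a, b, c)
result = alpha * contracted + beta * d  # shape (4, 7)
```

When the optional `d` operand is omitted, only the scaled contraction term remains.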
The function-form API ternary_contraction() is a convenient alternative to using stateful objects for single use (the user needs to perform just one tensor contraction, for example), in which case there is no possibility of amortizing preparatory costs. The function-form APIs are just convenience wrappers around the stateful object APIs.

Using the stateful object typically involves the following steps:
Problem Specification: Initialize the object with a defined operation and options.
Preparation: Use plan() to determine the best algorithmic implementation for this specific ternary tensor contraction operation.
Execution: Perform the tensor contraction computation with execute().
Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.
Detailed information on what’s happening in the various phases described above can be obtained by passing in a logging.Logger object to ContractionOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )
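Instead of configuring the root logger, a dedicated logging.Logger can be created and passed to the operation through the options argument; a minimal stdlib-only sketch (the options={"logger": ...} spelling is an assumption based on the dict form of the options argument):

```python
import logging

# Build a dedicated logger for the contraction (stdlib only).
logger = logging.getLogger("nvmath.tensor.contraction")
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
)
logger.addHandler(handler)

# Hypothetical usage (requires nvmath; "logger" as a ContractionOptions
# field name is an assumption):
# contraction = nvmath.tensor.TernaryContraction(
#     expr, a, b, c, options={"logger": logger}
# )
```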
A user can select the desired logging level and, in general, take advantage of all of the functionality offered by the Python logging module.

- Parameters:
a – A tensor representing the first operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

b – A tensor representing the second operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

c – A tensor representing the third operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

d – (Optional) A tensor representing the operand to add to the tensor contraction result (fused operation in cuTENSOR). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

out – (Optional) The output tensor to store the result of the contraction. Must be a numpy.ndarray, cupy.ndarray, or torch.Tensor object and must be on the same device as the input operands. If not specified, the result will be returned on the same device as the input operands. Note: support for the output tensor in this API is experimental and subject to change in future versions without prior notice.

qualifiers – If desired, specify the operators as a numpy.ndarray of dtype tensor_qualifiers_dtype with the same length as the number of operands in the contraction expression plus one (for the operand to be added). All elements must be valid Operator objects. See Matrix and Tensor Qualifiers for the motivation behind qualifiers.

stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used. See Stream Semantics for more details on stream handling.

options – Specify options for the tensor contraction as a ContractionOptions object. Alternatively, a dict containing the parameters for the ContractionOptions constructor can also be provided. If not specified, the value will be set to the default-constructed ContractionOptions object.

execution – Specify execution space options for the tensor contraction as an ExecutionCUDA object or the string ‘cuda’. Alternatively, a dict containing the ‘name’ key set to ‘cuda’ and the additional parameters for the ExecutionCUDA constructor can also be provided. If not provided, the execution space will be selected to match the operands’ storage if the operands are on the GPU. If the operands are on the CPU and the execution space is not provided, it will be a default-constructed ExecutionCUDA object with device_id = 0.
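For illustration, the accepted forms of the execution argument described above can be sketched as plain values (only the string and dict forms named in the description are shown; device_id uses the documented default of 0):

```python
# The string form selects CUDA execution with defaults.
execution_str = "cuda"

# The dict form carries the 'name' key plus ExecutionCUDA constructor
# parameters, e.g. device_id (shown with the documented default of 0).
execution_dict = {"name": "cuda", "device_id": 0}

# Either value could then be passed as the execution argument, e.g.
# nvmath.tensor.TernaryContraction(expr, a, b, c, execution=execution_dict)
# (requires nvmath and valid operands).
```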
See also
plan_preference, plan(), reset_operands(), release_operands(), execute()

Examples
>>> import numpy as np
>>> import nvmath
Create three float64 ndarrays on the CPU:
>>> M, N, K = 32, 32, 32
>>> a = np.random.rand(M, N, K)
>>> b = np.random.rand(N, K, M)
>>> c = np.random.rand(M, N)
We will define a ternary tensor contraction operation.
Create a TernaryContraction object encapsulating the problem specification above:
>>> expr = "ijk,jkl,ln->in"
>>> contraction = nvmath.tensor.TernaryContraction(expr, a, b, c)
Options can be provided above to control the behavior of the operation using the options argument (see ContractionOptions).

Next, plan the operation. Optionally, preferences can be specified for planning:
>>> contraction.plan()
Now execute the ternary tensor contraction, and obtain the result r1 as a NumPy ndarray:

>>> r1 = contraction.execute()
Finally, free the object’s resources. To avoid having to make this call explicitly, it is recommended to use the TernaryContraction object as a context manager as shown below, if possible.
>>> contraction.free()
Note that all TernaryContraction methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.

Let’s now look at the same problem with CuPy ndarrays on the GPU.
Create the operands as float64 CuPy ndarrays on the GPU:
>>> import cupy as cp
>>> a = cp.random.rand(M, N, K)
>>> b = cp.random.rand(N, K, M)
>>> c = cp.random.rand(M, N)
Create a TernaryContraction object encapsulating the problem specification described earlier and use it as a context manager.
>>> expr = "ijk,jkl,ln->in"
>>> with nvmath.tensor.TernaryContraction(expr, a, b, c) as contraction:
...     contraction.plan()
...
...     # Execute the operation to get the first result.
...     r1 = contraction.execute()
...
...     # Update operands A, B and C in-place (see reset_operands() for an
...     # alternative).
...     a[:] = cp.random.rand(M, N, K)
...     b[:] = cp.random.rand(N, K, M)
...     c[:] = cp.random.rand(M, N)
...
...     # Execute the operation to get the new result.
...     r2 = contraction.execute()
All the resources used by the object are released at the end of the block.
Further examples can be found in the nvmath/examples/tensor/contraction directory.
Attributes
- plan_preference#
An accessor to configure or query the contraction planning phase attributes.
- Returns:
A ContractionPlanPreference object, whose attributes can be set (or queried) to configure the planning phase.
Methods
- execute(*, alpha=1.0, beta=None, release_workspace=False, stream=None)[source]#
Execute a prepared tensor contraction.
- Parameters:
alpha – The scale factor for the tensor contraction term as a real or complex number. The default is \(1.0\).
beta – The scale factor for the tensor addition term as a real or complex number. A value for beta must be provided if the operand to be added is specified.

release_workspace – A value of True specifies that the stateful object should release workspace memory back to the package memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but it incurs a small overhead due to obtaining and releasing workspace memory from and to the package memory pool on every call. The default is False.

stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used. See Stream Semantics for more details on stream handling.
- Returns:
The result of the specified contraction, which remains on the same device and belongs to the same package as the input operands.
- free()[source]#
Free tensor contraction resources.
It is recommended that the contraction object be used within a context manager, but if that is not possible, this method must be called explicitly to ensure that the tensor contraction resources (especially internal library objects) are properly cleaned up.
- plan(*, stream=None)[source]#
Plan the tensor contraction. The planning phase can be optionally configured through the property plan_preference (an object of type ContractionPlanPreference).

- Parameters:

stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used. See Stream Semantics for more details on stream handling.
Note
If the plan_preference has been updated, a plan() call is required to apply the changes.
- release_operands()[source]#
This method is experimental and potentially subject to future changes.
Added in version 0.9.0.
This method does two things:
Releases internal references to the user-provided operands, so that this instance no longer contributes to their reference counts.
Frees any internal copies (mirrors) that were created when the user-provided operands reside in a different memory space than the execution (i.e., copies made during construction or reset_operands()/reset_operands_unchecked(), if present).
This functionality can be useful in memory-constrained scenarios, e.g. where multiple stateful objects need to coexist. Leveraging this functionality, the caller can reduce memory usage while retaining the planned state.
- Parameters:
None
- Returns:
None
- Semantics:
Preserves the planned state of the stateful object.
After calling this method, reset_operands() (or reset_operands_unchecked(), if present) must be called to supply new operands before the next execute() call. Failure to do so will result in a runtime error. Device-side copies will be re-allocated as needed.

For cross-space scenarios (e.g. CPU operands with GPU execution, or GPU operands with CPU execution): execution is guaranteed to be always blocking, so execute() does not return until all computation is complete. It is therefore always safe to call this method after calling execute() without additional synchronization.

When the operands are in the same memory space as the execution (e.g. GPU operands with GPU execution): this method drops this instance’s internal reference to the user-provided operands. If the reference count of the operands reaches zero, their memory may be freed, so particular care is needed. The caller is responsible for ensuring that any such deallocation is ordered after pending computation (e.g. by retaining a reference until the computation is complete, or by synchronizing the stream). Failure to do so is analogous to a use-after-free.
See Overview, Stateful APIs: Design and Usage Patterns for operand lifecycle and usage patterns, and Stream Semantics for stream ordering rules.
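The same-memory-space caveat above comes down to Python reference counting, which can be illustrated with the stdlib and NumPy alone (the Holder class is a hypothetical stand-in for the stateful object; no nvmath calls are made):

```python
import weakref

import numpy as np

class Holder:
    """Hypothetical stand-in for a stateful object's internal state."""
    def __init__(self, operand):
        self.operand = operand  # reference taken at construction

    def release_operands(self):
        self.operand = None  # analogous to releasing internal references

a = np.random.rand(4, 4)
holder = Holder(a)
ref = weakref.ref(a)  # weak reference: does not keep the array alive

del a                     # the caller drops their own reference
assert ref() is not None  # the internal reference keeps the operand alive

holder.release_operands()
assert ref() is None      # last reference gone: memory may now be freed
```

This is why, after releasing operands, the caller must make sure any deallocation of a still-in-use GPU buffer is ordered after pending computation.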
- reset_operands(*, a=None, b=None, c=None, d=None, out=None, stream=None)[source]#
Reset one or more operands held by this TernaryContraction instance. Only the operands explicitly passed are updated; omitted operands retain their current values.

Changed in version 0.9: All parameters are now keyword-only.
- Parameters:
a – A tensor representing the first operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

b – A tensor representing the second operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

c – A tensor representing the third operand to the tensor contraction. The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

d – (Optional) A tensor representing the operand to add to the tensor contraction result (fused operation in cuTENSOR). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

out – (Optional) The output tensor to store the result of the contraction. Must be a numpy.ndarray, cupy.ndarray, or torch.Tensor object and must be on the same device as the input operands. If not specified, the result will be returned on the same device as the input operands.
Note
Support for the output tensor in this API is experimental and subject to change in future versions without prior notice.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used. See Stream Semantics for more details on stream handling.

- Semantics:
This method validates each new operand against its corresponding one set during the object’s initialization. An operand is compatible if all of the following requirements are met:
The shapes, strides, and datatypes match those of the old one.
The package (e.g., cupy, torch, numpy) matches that of the old one.
The memory space (CPU or GPU) matches that of the old one.
The device matches that of the old one, if the operand is on GPU.
The pointer alignment must be the same as, or a multiple of, that of the old one.
If the execution space matches the memory space of the operand: the operand’s reference is updated with no data copying.
If the execution space does not match the memory space of the operand: data is copied between the different memory spaces.
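The compatibility rules above can be approximated with a small NumPy-only check (an illustrative sketch; the package, memory-space, and device checks are omitted, and this is not the library's actual validation code):

```python
import numpy as np

def is_compatible(old: np.ndarray, new: np.ndarray) -> bool:
    # Shapes, strides, and datatypes must match those of the old operand.
    return (
        old.shape == new.shape
        and old.strides == new.strides
        and old.dtype == new.dtype
    )

a_old = np.zeros((4, 8), dtype=np.float64)
a_ok = np.ones((4, 8), dtype=np.float64)   # compatible replacement
a_bad = np.ones((4, 8), dtype=np.float32)  # incompatible: dtype differs

assert is_compatible(a_old, a_ok)
assert not is_compatible(a_old, a_bad)
```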
Examples
>>> import cupy as cp
>>> import nvmath
Create three float64 CuPy ndarrays on the GPU:
>>> M, N, K = 12, 16, 32
>>> a = cp.random.rand(M, M, N)
>>> b = cp.random.rand(N, K)
>>> c = cp.random.rand(K, K)
Create a TernaryContraction object as a context manager:
>>> expr = "ijk,kl,lm->ijm"
>>> with nvmath.tensor.TernaryContraction(expr, a, b, c) as contraction:
...     # Plan the operation.
...     contraction.plan()
...
...     # Execute the contraction to get the first result.
...     r1 = contraction.execute()
...
...     # Reset the operands to new CuPy ndarrays.
...     a1 = cp.random.rand(M, M, N)
...     b1 = cp.random.rand(N, K)
...     c1 = cp.random.rand(K, K)
...     contraction.reset_operands(a=a1, b=b1, c=c1)
...
...     # Execute to get the new result corresponding to the updated operands.
...     r2 = contraction.execute()
With reset_operands(), minimal overhead is achieved as problem specification and planning are only performed once.

For the particular example above, explicitly calling reset_operands() is equivalent to updating the operands in-place, i.e., replacing contraction.reset_operands(a=a1, b=b1, c=c1) with a[:] = a1, b[:] = b1, and c[:] = c1. Note that updating an operand in-place should be adopted with caution, as it can only yield the expected result under the additional constraint below:

The operand is on the GPU (more precisely, the operand memory space should be accessible from the execution space).

For more details, please refer to the in-place update example.
- reset_operands_unchecked()[source]#
This method is experimental and potentially subject to future changes.
Added in version 0.9.0.
This method is a performance-optimized alternative to reset_operands() that eliminates validation and logging overhead, making it ideal for performance-critical loops where operand compatibility is guaranteed by the caller.

This method accepts the same parameters as reset_operands().

- Semantics:

The semantics are the same as in reset_operands(), except that this method does not perform any validation (e.g. package match, data type match, pointer alignment match, etc.) or logging.

- When to Use:
Performance-critical loops with repeated executions on different operands
After verifying correctness with reset_operands() during development
When operand compatibility is guaranteed by construction or invariants