Computing gradients via backward propagation#
The following code example illustrates how to compute the gradients of a tensor network w.r.t. user-selected input tensors via backward propagation. The full code can be found in the NVIDIA/cuQuantum repository (here).
This example uses the network-centric API introduced in cuTensorNet v2.9.0. Key functions include cutensornetCreateNetwork, cutensornetNetworkAppendTensor, cutensornetNetworkSetOutputTensor, cutensornetNetworkPrepareContraction, cutensornetNetworkContract, cutensornetNetworkSetAdjointTensorMemory, cutensornetNetworkSetGradientTensorMemory, cutensornetNetworkPrepareGradientsBackward, and cutensornetNetworkComputeGradientsBackward.
Headers and data types#
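As an illustration, a minimal sketch of the headers, error-checking helpers, and data types such a program relies on is shown below; the repository sample defines its own (possibly different) helper macros.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

#include <cuda_runtime.h>
#include <cutensornet.h>

// Simple error-checking helpers (illustrative; the repository sample defines its own).
#define HANDLE_CUDA_ERROR(x)                                                          \
  do { const cudaError_t err = (x);                                                   \
       if (err != cudaSuccess) {                                                      \
         printf("CUDA error %s in line %d\n", cudaGetErrorString(err), __LINE__);     \
         exit(EXIT_FAILURE); } } while (0)

#define HANDLE_ERROR(x)                                                               \
  do { const cutensornetStatus_t err = (x);                                           \
       if (err != CUTENSORNET_STATUS_SUCCESS) {                                       \
         printf("cuTensorNet error %s in line %d\n",                                  \
                cutensornetGetErrorString(err), __LINE__);                             \
         exit(EXIT_FAILURE); } } while (0)

// Real single-precision tensors with single-precision compute.
typedef float floatType;
const cudaDataType_t           typeData    = CUDA_R_32F;
const cutensornetComputeType_t typeCompute = CUTENSORNET_COMPUTE_32F;
```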
Define tensor network and tensor sizes#
Next, we define the structure of the tensor network (i.e., the modes of the tensors, their extents, and their connectivity), and specify the input tensor IDs whose gradients will be computed.
See also the network definition APIs cutensornetCreateNetwork, cutensornetNetworkAppendTensor, and cutensornetNetworkSetOutputTensor.
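A sketch of such a definition is shown below; the network used here (R[a,d] = A[a,b] B[b,c] C[c,d], with gradients requested for A and B) is a hypothetical example for illustration, not necessarily the one used in the repository sample.

```cpp
// Hypothetical network: R[a,d] = A[a,b] * B[b,c] * C[c,d].
// Gradients are requested for tensors A and B (input IDs 0 and 1, in append order).
const int32_t numInputs = 3;

// Mode labels (characters used as integer IDs for readability).
std::vector<int32_t> modesA{'a','b'}, modesB{'b','c'}, modesC{'c','d'};
std::vector<int32_t> modesR{'a','d'};                        // output tensor modes

// Extents of the individual tensors.
std::vector<int64_t> extentA{16,32}, extentB{32,32}, extentC{32,16}, extentR{16,16};

// IDs of the input tensors whose gradients will be computed.
std::vector<int32_t> gradInputIDs{0, 1};
```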
Allocate memory, initialize data, initialize cuTensorNet handle#
Next, we allocate memory for the tensor network operands and initialize them to random values.
We also allocate memory for the gradient tensors corresponding to the input tensors selected for gradient computation,
as well as for the activation (adjoint) tensor, which we initialize to ones.
Then, we initialize the cuTensorNet library via cutensornetCreate().
Note that the created library context will be associated with the currently active GPU.
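Continuing the hypothetical setup above, a sketch of this step might look as follows (host-side random initialization and host-to-device copies are omitted for brevity):

```cpp
// Device buffers for the input tensors A, B, C and the output tensor R.
size_t sizeA = sizeof(floatType) * 16 * 32;
size_t sizeB = sizeof(floatType) * 32 * 32;
size_t sizeC = sizeof(floatType) * 32 * 16;
size_t sizeR = sizeof(floatType) * 16 * 16;

void *devA, *devB, *devC, *devR;
HANDLE_CUDA_ERROR(cudaMalloc(&devA, sizeA));
HANDLE_CUDA_ERROR(cudaMalloc(&devB, sizeB));
HANDLE_CUDA_ERROR(cudaMalloc(&devC, sizeC));
HANDLE_CUDA_ERROR(cudaMalloc(&devR, sizeR));

// Gradient buffers for the selected inputs (same shapes as A and B),
// and the activation/adjoint buffer (same shape as the output, seeded with ones).
void *devGradA, *devGradB, *devAdjoint;
HANDLE_CUDA_ERROR(cudaMalloc(&devGradA, sizeA));
HANDLE_CUDA_ERROR(cudaMalloc(&devGradB, sizeB));
HANDLE_CUDA_ERROR(cudaMalloc(&devAdjoint, sizeR));

// Initialize the cuTensorNet library; the handle is bound to the currently active GPU.
cutensornetHandle_t handle;
HANDLE_ERROR(cutensornetCreate(&handle));
```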
Construct the network#
We create the network and append the input tensors with the desired tensor modes and extents, as well as the data type. We can, optionally, set the output tensor modes (if skipped, the output modes will be inferred). To compute gradients with respect to specific input tensors, those tensors must be tagged (i.e., marked) using tensor qualifiers, as sketched below.
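The sketch below shows only the tensor-qualifier part of this step explicitly; the network-centric calls themselves are listed schematically, with their argument lists omitted (consult the API reference for the exact signatures).

```cpp
// Schematic call sequence for constructing the network (argument lists omitted):
//   cutensornetCreateNetwork(...)           -> create the network object
//   cutensornetNetworkAppendTensor(...)     -> append A, B, C with their modes, extents,
//                                              data type, and tensor qualifiers
//   cutensornetNetworkSetOutputTensor(...)  -> optionally fix the output modes
//
// Tensor qualifiers mark which inputs require gradients.
cutensornetTensorQualifiers_t gradQualifiers{};
gradQualifiers.requiresGradient = 1;   // pass this qualifier when appending A and B
```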
Contraction order#
In this example, we illustrate the use of a predetermined contraction path,
setting it on the optimizer info object via cutensornetContractionOptimizerInfoSetAttribute().
We then attach the constructed optimizer info object to the network via cutensornetNetworkSetOptimizerInfo().
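Continuing the three-tensor example above, setting a predetermined path could look like the sketch below; how the optimizer info object is created in the network-centric flow is not shown here (it is assumed to exist for this network), and the cutensornetNetworkSetOptimizerInfo call is listed schematically.

```cpp
// Predetermined path for the 3-tensor example: contract (A,B) first, then the result with C.
// Pair indices follow the usual convention: after each pairwise contraction the two
// operands are removed from the list and the intermediate is appended at the end.
cutensornetNodePair_t nodePairs[] = { {0, 1}, {0, 1} };
cutensornetContractionPath_t path{ /*numContractions=*/2, nodePairs };

cutensornetContractionOptimizerInfo_t optimizerInfo;   // assumed created for this network
HANDLE_ERROR(cutensornetContractionOptimizerInfoSetAttribute(
                 handle, optimizerInfo,
                 CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_PATH,
                 &path, sizeof(path)));

// cutensornetNetworkSetOptimizerInfo(...)  -> attach optimizerInfo to the network (schematic)
```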
Create workspace descriptor and allocate workspace memory#
Next, we create a workspace descriptor, compute the workspace sizes, and query the minimum workspace size needed
to contract the network.
To enable gradient computation, we need to provide a CACHE workspace that will be used to store the intermediate tensors' data
needed by the backward propagation call.
Thus, we query the sizes and allocate device memory for both kinds of workspaces
(CUTENSORNET_WORKSPACE_SCRATCH and CUTENSORNET_WORKSPACE_CACHE), and set them in the workspace descriptor.
The workspace descriptor will be provided to the contraction preparation, contraction computation, and gradient computation APIs.
See also cutensornetWorkspaceSetMemory, cutensornetWorkspaceGetMemorySize, and cutensornetWorkspacePurgeCache.
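Continuing the sketch, the workspace handling could look as follows; here we assume the required sizes have already been computed for the network and its contraction path (the repository sample shows exactly where this happens in the network-centric flow).

```cpp
// Create the workspace descriptor.
cutensornetWorkspaceDescriptor_t workDesc;
HANDLE_ERROR(cutensornetCreateWorkspaceDescriptor(handle, &workDesc));

// Query the minimum required sizes for both workspace kinds.
int64_t scratchSize = 0, cacheSize = 0;
HANDLE_ERROR(cutensornetWorkspaceGetMemorySize(handle, workDesc,
                 CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
                 CUTENSORNET_WORKSPACE_SCRATCH, &scratchSize));
HANDLE_ERROR(cutensornetWorkspaceGetMemorySize(handle, workDesc,
                 CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
                 CUTENSORNET_WORKSPACE_CACHE, &cacheSize));

// Allocate device memory and attach it to the workspace descriptor.
void *scratchMem = nullptr, *cacheMem = nullptr;
HANDLE_CUDA_ERROR(cudaMalloc(&scratchMem, scratchSize));
HANDLE_CUDA_ERROR(cudaMalloc(&cacheMem, cacheSize));
HANDLE_ERROR(cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
                 CUTENSORNET_WORKSPACE_SCRATCH, scratchMem, scratchSize));
HANDLE_ERROR(cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
                 CUTENSORNET_WORKSPACE_CACHE, cacheMem, cacheSize));
```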
Contraction preparation and auto-tuning#
We prepare the tensor network contraction, via cutensornetNetworkPrepareContraction().
Optionally, we can auto-tune the contraction, via cutensornetNetworkAutotuneContraction(),
such that cuTENSOR selects the best kernel for each pairwise contraction.
This prepared network contraction can be reused for many (possibly different) data inputs, avoiding
the cost of re-initializing it redundantly.
Input and output data buffers are attached using cutensornetNetworkSetInputTensorMemory and cutensornetNetworkSetOutputTensorMemory.
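Schematically, this step reduces to the call sequence sketched below; argument lists are omitted because the exact signatures are documented in the API reference.

```cpp
// Stream on which preparation, autotuning, and execution are issued.
cudaStream_t stream;
HANDLE_CUDA_ERROR(cudaStreamCreate(&stream));

// Schematic call sequence (argument lists omitted):
//   cutensornetNetworkPrepareContraction(...)    -> prepare the contraction (reusable
//                                                   across many data inputs)
//   cutensornetNetworkAutotuneContraction(...)   -> optional: let cuTENSOR pick the best
//                                                   kernel for each pairwise contraction
//   cutensornetNetworkSetInputTensorMemory(...)  -> bind devA, devB, devC to inputs 0, 1, 2
//   cutensornetNetworkSetOutputTensorMemory(...) -> bind devR to the output tensor
```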
Tensor network contraction execution and gradient computation#
Finally, we contract the tensor network via cutensornetNetworkContract().
After contracting the network (which will store intermediate tensors’ data in the CACHE memory),
we prepare the gradient computation via cutensornetNetworkPrepareGradientsBackward()
and compute the required gradients through backward propagation via cutensornetNetworkComputeGradientsBackward().
Note that when running the computation again with a new set of input data, we must first purge the CACHE memory so that the subsequent network contraction stores the correct intermediate data in it.
Gradient and adjoint/activation buffers are attached using cutensornetNetworkSetGradientTensorMemory and cutensornetNetworkSetAdjointTensorMemory.
See also cutensornetNetworkPrepareGradientsBackward, cutensornetNetworkComputeGradientsBackward, and cutensornetWorkspacePurgeCache.
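The execution phase, again sketched schematically, with the CACHE purge that precedes a contraction over a new data set (cutensornetWorkspacePurgeCache shown explicitly):

```cpp
// Schematic call sequence (argument lists omitted):
//   cutensornetNetworkContract(...)                  -> forward contraction; intermediate
//                                                       tensors are stored in the CACHE
//   cutensornetNetworkSetAdjointTensorMemory(...)    -> bind devAdjoint (seeded with ones)
//   cutensornetNetworkSetGradientTensorMemory(...)   -> bind devGradA and devGradB
//   cutensornetNetworkPrepareGradientsBackward(...)  -> prepare the backward pass
//   cutensornetNetworkComputeGradientsBackward(...)  -> compute the requested gradients
//
// Before contracting again with a new set of input data, purge the cached intermediates
// so that the next contraction stores fresh data in the CACHE workspace.
HANDLE_ERROR(cutensornetWorkspacePurgeCache(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE));
```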
Free resources#
After the computation, we need to free up all resources.
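A sketch of the cleanup, continuing the example above (the call that destroys the network object is shown schematically):

```cpp
// Destroy library objects.
HANDLE_ERROR(cutensornetDestroyContractionOptimizerInfo(optimizerInfo));
HANDLE_ERROR(cutensornetDestroyWorkspaceDescriptor(workDesc));
// cutensornetDestroyNetwork(...)  -> destroy the network object (schematic)
HANDLE_ERROR(cutensornetDestroy(handle));

// Free device memory.
HANDLE_CUDA_ERROR(cudaFree(scratchMem));
HANDLE_CUDA_ERROR(cudaFree(cacheMem));
HANDLE_CUDA_ERROR(cudaFree(devA));
HANDLE_CUDA_ERROR(cudaFree(devB));
HANDLE_CUDA_ERROR(cudaFree(devC));
HANDLE_CUDA_ERROR(cudaFree(devR));
HANDLE_CUDA_ERROR(cudaFree(devGradA));
HANDLE_CUDA_ERROR(cudaFree(devGradB));
HANDLE_CUDA_ERROR(cudaFree(devAdjoint));
```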