--------------------------------------------
Computing gradients via backward propagation
--------------------------------------------

The following code example illustrates how to compute the gradients of a tensor network
w.r.t. user-selected input tensors via backward propagation. The full code can be found in the
`NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository
(`tensornet_example_gradients.cu`). This example uses the network-centric API introduced in
cuTensorNet v2.9.0. Key functions include :ref:`cutensornetCreateNetwork`,
:ref:`cutensornetNetworkAppendTensor`, :ref:`cutensornetNetworkSetOutputTensor`,
:ref:`cutensornetNetworkPrepareContraction`, :ref:`cutensornetNetworkContract`,
:ref:`cutensornetNetworkSetAdjointTensorMemory`, :ref:`cutensornetNetworkSetGradientTensorMemory`,
:ref:`cutensornetNetworkPrepareGradientsBackward`, and
:ref:`cutensornetNetworkComputeGradientsBackward`.

----------------------
Headers and data types
----------------------

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #1
   :end-before: Sphinx: #2

--------------------------------------
Define tensor network and tensor sizes
--------------------------------------

Next, we define the structure of the tensor network (i.e., the modes of the tensors, their
extents, and their connectivity), and specify the IDs of the input tensors whose gradients
will be computed. See also the network definition APIs :ref:`cutensornetCreateNetwork`,
:ref:`cutensornetNetworkAppendTensor`, and :ref:`cutensornetNetworkSetOutputTensor`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #2
   :end-before: Sphinx: #3

---------------------------------------------------------------
Allocate memory, initialize data, initialize cuTensorNet handle
---------------------------------------------------------------

Next, we allocate memory for the tensor network operands and initialize them to random values.
We also allocate memory for the gradient tensors corresponding to the input tensors selected for
gradient computation, as well as for the activation tensor, which we initialize to ones. Then, we
initialize the *cuTensorNet* library via `cutensornetCreate()`. Note that the created library
context will be associated with the currently active GPU.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #3
   :end-before: Sphinx: #4

---------------------
Construct the network
---------------------

We create the network descriptor and append the input tensors with the desired modes, extents,
and data type. We can, optionally, set the output tensor modes (if skipped, the output modes
will be inferred). To compute gradients with respect to specific input tensors, those tensors
must be tagged (i.e., marked) via their tensor qualifiers.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #4
   :end-before: Sphinx: #5
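To make the tagging step concrete, below is a minimal sketch (not taken from the sample) of how
tensor qualifiers can request gradients for selected inputs. The three-tensor network, mode
labels, and extents are invented for illustration, and the qualifier field names follow the
classic cuTensorNet API; verify them against the installed headers.

.. code-block:: c++

    // Minimal sketch: tagging input tensors for gradient computation via
    // tensor qualifiers. Field names/order are assumptions from the classic
    // cuTensorNet API; check cutensornet.h for the authoritative definition.
    #include <cutensornet.h>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    int main()
    {
        // Hypothetical network: R{a,d} = A{a,b} * B{b,c} * C{c,d}.
        std::vector<int32_t> modesA{'a', 'b'}, modesB{'b', 'c'}, modesC{'c', 'd'};
        std::unordered_map<int32_t, int64_t> extent{
            {'a', 16}, {'b', 16}, {'c', 16}, {'d', 16}};

        // Request gradients for A and B; mark C constant, since its gradient
        // will never be requested.
        cutensornetTensorQualifiers_t qualA{/*isConjugate=*/0, /*isConstant=*/0,
                                            /*requiresGradient=*/1};
        cutensornetTensorQualifiers_t qualB{0, 0, 1};
        cutensornetTensorQualifiers_t qualC{0, /*isConstant=*/1,
                                            /*requiresGradient=*/0};

        // Each qualifier struct would accompany the corresponding
        // cutensornetNetworkAppendTensor() call (argument list not
        // reproduced in this sketch).
        (void)qualA; (void)qualB; (void)qualC;
        return 0;
    }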
-----------------
Contraction order
-----------------

In this example, we illustrate the use of a predetermined contraction path, which we set into the
optimizer info object via `cutensornetContractionOptimizerInfoSetAttribute()`. We then attach the
constructed optimizer info object to the network via `cutensornetNetworkSetOptimizerInfo()`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #5
   :end-before: Sphinx: #6

----------------------------------------------------------
Create workspace descriptor and allocate workspace memory
----------------------------------------------------------

Next, we create a workspace descriptor and query the minimum workspace sizes needed to contract
the network. To enable gradient computation, we must also provide a CACHE workspace, which stores
the intermediate tensors' data consumed by the subsequent backward-propagation call. Thus, we
query the sizes of, and allocate device memory for, both kinds of workspaces
(`CUTENSORNET_WORKSPACE_SCRATCH` and `CUTENSORNET_WORKSPACE_CACHE`) and set them in the workspace
descriptor. The workspace descriptor will be provided to the contraction preparation, contraction
execution, and gradient computation APIs. See also :ref:`cutensornetWorkspaceSetMemory`,
:ref:`cutensornetWorkspaceGetMemorySize`, and :ref:`cutensornetWorkspacePurgeCache`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #6
   :end-before: Sphinx: #7
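To make the path format concrete, here is a hedged sketch of installing a predetermined path, as
described in the *Contraction order* section above. `cutensornetContractionOptimizerInfoSetAttribute()`
and the path structures come from the established cuTensorNet API; the three-tensor path itself is
invented, and the creation/attachment of the optimizer info object in the network-centric flow is
only indicated in comments.

.. code-block:: c++

    // Hedged sketch: set a predetermined contraction path (for a hypothetical
    // three-tensor network) on an existing optimizer info object.
    #include <cutensornet.h>
    #include <cstdio>
    #include <cstdlib>

    #define HANDLE_ERROR(x)                                               \
        do { if ((x) != CUTENSORNET_STATUS_SUCCESS) {                     \
                 std::printf("cuTensorNet error at line %d\n", __LINE__); \
                 std::exit(EXIT_FAILURE); } } while (0)

    void setPredeterminedPath(cutensornetHandle_t handle,
                              cutensornetContractionOptimizerInfo_t optimizerInfo)
    {
        // opt_einsum-style pairs: contract tensors (0,1); the intermediate is
        // appended to the shrinking tensor list, then contracted with the rest.
        cutensornetNodePair_t pairs[] = {{0, 1}, {0, 1}};
        cutensornetContractionPath_t path{/*numContractions=*/2, pairs};

        HANDLE_ERROR(cutensornetContractionOptimizerInfoSetAttribute(
            handle, optimizerInfo,
            CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_PATH,
            &path, sizeof(path)));

        // The populated optimizer info object would then be attached to the
        // network via cutensornetNetworkSetOptimizerInfo() (signature not
        // reproduced here).
    }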
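Similarly, a companion sketch of the workspace setup just described, using the workspace APIs
named above. The query/set signatures are from the established workspace API; error handling and
the earlier size computation are elided.

.. code-block:: c++

    // Hedged sketch: query the minimal SCRATCH and CACHE workspace sizes and
    // bind device buffers to the workspace descriptor. Assumes the sizes have
    // already been computed (e.g., during contraction preparation).
    #include <cutensornet.h>
    #include <cuda_runtime.h>
    #include <cstdint>

    void bindWorkspaces(cutensornetHandle_t handle,
                        cutensornetWorkspaceDescriptor_t workDesc,
                        void** scratch, void** cache)
    {
        int64_t scratchSize{0}, cacheSize{0};
        cutensornetWorkspaceGetMemorySize(handle, workDesc,
            CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
            CUTENSORNET_WORKSPACE_SCRATCH, &scratchSize);
        cutensornetWorkspaceGetMemorySize(handle, workDesc,
            CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
            CUTENSORNET_WORKSPACE_CACHE, &cacheSize);

        cudaMalloc(scratch, scratchSize);
        cudaMalloc(cache, cacheSize);

        cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
            CUTENSORNET_WORKSPACE_SCRATCH, *scratch, scratchSize);
        cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
            CUTENSORNET_WORKSPACE_CACHE, *cache, cacheSize);
    }

Binding the CACHE workspace is what allows the forward contraction to retain the intermediate
tensors that the backward pass later consumes.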
---------------------------------------
Contraction preparation and auto-tuning
---------------------------------------

We prepare the tensor network contraction via `cutensornetNetworkPrepareContraction()`.
Optionally, we can auto-tune the contraction via :ref:`cutensornetNetworkAutotuneContraction`,
so that cuTENSOR selects the best kernel for each pairwise contraction. Once prepared, the
network contraction can be reused with many (possibly different) input data sets, avoiding
redundant re-initialization. Input and output buffers are attached via
:ref:`cutensornetNetworkSetInputTensorMemory` and :ref:`cutensornetNetworkSetOutputTensorMemory`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #7
   :end-before: Sphinx: #8

-------------------------------------------------------------
Tensor network contraction execution and gradient computation
-------------------------------------------------------------

Finally, we contract the tensor network via `cutensornetNetworkContract()`, which also stores the
intermediate tensors' data in the CACHE memory. We then prepare the gradient computation via
`cutensornetNetworkPrepareGradientsBackward()` and compute the requested gradients through
backward propagation via `cutensornetNetworkComputeGradientsBackward()`. When processing multiple
data sets, we must purge the CACHE memory before each new contraction so that the contraction
call stores the correct intermediate data. Gradient and adjoint/activation buffers are attached
via :ref:`cutensornetNetworkSetGradientTensorMemory` and
:ref:`cutensornetNetworkSetAdjointTensorMemory`. See also
:ref:`cutensornetNetworkPrepareGradientsBackward`,
:ref:`cutensornetNetworkComputeGradientsBackward`, and :ref:`cutensornetWorkspacePurgeCache`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #8
   :end-before: Sphinx: #9

--------------
Free resources
--------------

After the computation, we need to free up all resources.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #9
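As a closing recap of the execution phase described above, here is a condensed, hedged sketch of
the per-data-set loop. Only `cutensornetWorkspacePurgeCache()` is written out with its actual
signature; the network-centric calls are indicated in comments because their argument lists are
not reproduced here.

.. code-block:: c++

    // Condensed sketch of the per-data-set execution loop. Assumes `handle`
    // and `workDesc` were set up as in the preceding sections.
    #include <cutensornet.h>

    void runDataSets(cutensornetHandle_t handle,
                     cutensornetWorkspaceDescriptor_t workDesc,
                     int numDataSets)
    {
        for (int i = 0; i < numDataSets; ++i)
        {
            // 1. (Re)bind the input/output/adjoint/gradient buffers for this
            //    data set via the cutensornetNetworkSet*TensorMemory() calls.

            // 2. Contract the network; intermediates are written to the CACHE:
            //    cutensornetNetworkContract(...).

            // 3. Backward propagation over the cached intermediates:
            //    cutensornetNetworkPrepareGradientsBackward(...) followed by
            //    cutensornetNetworkComputeGradientsBackward(...).

            // 4. Purge the CACHE so the next contraction stores fresh data.
            cutensornetWorkspacePurgeCache(handle, workDesc,
                                           CUTENSORNET_MEMSPACE_DEVICE);
        }
    }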