--------------------------------------------
Computing gradients via backward propagation
--------------------------------------------

The following code example illustrates how to compute the gradients of a tensor network w.r.t. user-selected input tensors via backward propagation. The full code can be found in the `NVIDIA/cuQuantum `_ repository (`here `_).

----------------------
Headers and data types
----------------------

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #1
   :end-before: Sphinx: #2

--------------------------------------
Define tensor network and tensor sizes
--------------------------------------

Next, we define the structure of the tensor network (i.e., the modes of the tensors, their extents, and their connectivity) and specify the IDs of the input tensors whose gradients will be computed.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #2
   :end-before: Sphinx: #3

---------------------------------------------------------------
Allocate memory, initialize data, initialize cuTensorNet handle
---------------------------------------------------------------

Next, we allocate memory for the tensor network operands and initialize them to random values. We also allocate memory for the gradient tensors corresponding to the input tensors selected for gradient computation, as well as for the activation tensor, which we initialize to ones. Then, we initialize the *cuTensorNet* library via `cutensornetCreate()`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #3
   :end-before: Sphinx: #4

---------------------------------------------------------
Create the network descriptor and set gradient tensor IDs
---------------------------------------------------------

Next, we create the network descriptor with the desired tensor modes, extents, and strides, as well as the data and compute types. We also set, on the network descriptor, the IDs of the input tensors whose gradients will be computed. Note that the created library context will be associated with the currently active GPU.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #4
   :end-before: Sphinx: #5

-----------------
Contraction order
-----------------

In this example, we illustrate the use of a predetermined contraction path, which we set into the optimizer info object via `cutensornetContractionOptimizerInfoSetAttribute`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #5
   :end-before: Sphinx: #6

---------------------------------------------------------
Create workspace descriptor and allocate workspace memory
---------------------------------------------------------

Next, we create a workspace descriptor, compute the workspace sizes, and query the minimum workspace size needed to contract the network. To enable gradient computation, we must also provide CACHE workspace, which stores the intermediate tensors' data needed by the subsequent backward propagation call. Thus, we query the sizes of, and allocate device memory for, both workspace kinds (`CUTENSORNET_WORKSPACE_SCRATCH` and `CUTENSORNET_WORKSPACE_CACHE`) and set them in the workspace descriptor.
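Schematically, querying and binding the two workspace kinds follows the pattern below. This is a simplified sketch of cuTensorNet C API usage, not a complete program: error handling is omitted, the variable names (`handle`, `workDesc`) are placeholders, and the exact enumerators and signatures should be checked against the cuTensorNet version in use:

.. code-block:: c++

   // Query the minimal required sizes for both workspace kinds.
   int64_t scratchSize{0}, cacheSize{0};
   cutensornetWorkspaceGetMemorySize(handle, workDesc,
       CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
       CUTENSORNET_WORKSPACE_SCRATCH, &scratchSize);
   cutensornetWorkspaceGetMemorySize(handle, workDesc,
       CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
       CUTENSORNET_WORKSPACE_CACHE, &cacheSize);

   // Allocate device buffers and bind them to the workspace descriptor.
   void *scratchPtr{nullptr}, *cachePtr{nullptr};
   cudaMalloc(&scratchPtr, scratchSize);
   cudaMalloc(&cachePtr, cacheSize);
   cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
       CUTENSORNET_WORKSPACE_SCRATCH, scratchPtr, scratchSize);
   cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
       CUTENSORNET_WORKSPACE_CACHE, cachePtr, cacheSize);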
The workspace descriptor will be provided to the contraction plan creation, plan contraction, and gradient computation APIs.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #6
   :end-before: Sphinx: #7

------------------------------
Contraction plan and auto-tune
------------------------------

We create a tensor network contraction plan holding all pairwise tensor contraction plans for cuTENSOR. Optionally, we can auto-tune the plan so that cuTENSOR selects the best kernel for each pairwise contraction. This contraction plan can be reused for many (possibly different) data inputs, avoiding the cost of redundantly reinitializing the plan.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #7
   :end-before: Sphinx: #8

-------------------------------------------------------------
Tensor network contraction execution and gradient computation
-------------------------------------------------------------

Finally, we contract the tensor network as many times as needed, possibly with different input data each time. After contracting the network (which stores the intermediate tensors' data in the CACHE memory), we compute the required gradients via backward propagation. We also purge the CACHE memory at the end of each iteration so that subsequent calls can utilize the available CACHE memory.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #8
   :end-before: Sphinx: #9

--------------
Free resources
--------------

After the computation, we need to free up all resources.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_gradients.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #9
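Putting the main steps together, the per-iteration execution loop described above can be sketched as follows. This is a schematic fragment rather than a complete program: the variable names (`rawDataIn`, `output_d`, `outputActivation_d`, `gradients_d`, `numIterations`) are placeholders, error handling is omitted, and the exact signatures should be checked against the cuTensorNet version in use:

.. code-block:: c++

   for (int iter = 0; iter < numIterations; ++iter)
   {
       // Contract the network; intermediate tensors are stored in the
       // CACHE workspace for consumption by the backward pass.
       cutensornetContractSlices(handle, plan, rawDataIn, output_d,
           /*accumulateOutput=*/0, workDesc, /*sliceGroup=*/nullptr, stream);

       // Back-propagate the output adjoint (the activation tensor) to
       // obtain the gradients of the selected input tensors.
       cutensornetComputeGradientsBackward(handle, plan, rawDataIn,
           outputActivation_d, gradients_d, /*accumulateOutput=*/0,
           workDesc, stream);

       // Purge the cached intermediates so the next iteration can reuse
       // the CACHE memory.
       cutensornetWorkspacePurgeCache(handle, workDesc,
           CUTENSORNET_MEMSPACE_DEVICE);
   }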