---------------------------------------------
Caching/Reusing constant intermediate tensors
---------------------------------------------

The following code example illustrates how to activate caching of constant intermediate tensors in order to greatly accelerate the repeated execution of a tensor network contraction in which only some of the input tensors change their values in each iteration. The full code can be found in the `NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository (sample ``tensornet_example_reuse.cu``).

----------------------
Headers and data types
----------------------

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #1
   :end-before: Sphinx: #2

--------------------------------------
Define tensor network and tensor sizes
--------------------------------------

Next, we define the structure of the tensor network (i.e., the modes of the tensors, their extents, and their connectivity).

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #2
   :end-before: Sphinx: #3

---------------------------------------------------------------
Allocate memory, initialize data, initialize cuTensorNet handle
---------------------------------------------------------------

Next, we allocate memory for the tensor network operands and initialize them to random values. Then, we initialize the *cuTensorNet* library via `cutensornetCreate()`. Note that the created library context will be associated with the currently active GPU.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #3
   :end-before: Sphinx: #4

-------------------------------------------------------
Mark constant tensors and create the network descriptor
-------------------------------------------------------

Next, we specify which input tensors are constant and create the network descriptor with the desired tensor modes, extents, strides, and qualifiers (e.g., constant), as well as the data and compute types.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #4
   :end-before: Sphinx: #5

-----------------------------
Contraction order and slicing
-----------------------------

In this example, we illustrate using a predetermined contraction path and setting it into the optimizer info object via `cutensornetContractionOptimizerInfoSetAttribute()`.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #5
   :end-before: Sphinx: #6

---------------------------------------------------------
Create workspace descriptor and allocate workspace memory
---------------------------------------------------------

Next, we create a workspace descriptor, compute the workspace sizes, and query the minimum workspace size needed to contract the network. To activate intermediate tensor reuse, we need to provide a CACHE workspace that persists across multiple network contractions. Thus, we query the required sizes and allocate device memory for both workspace kinds (`CUTENSORNET_WORKSPACE_SCRATCH` and `CUTENSORNET_WORKSPACE_CACHE`) and set these in the workspace descriptor. The workspace descriptor will be provided to the contraction plan creation and contraction APIs.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #6
   :end-before: Sphinx: #7
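The essential workspace setup can be sketched as follows. This is a minimal, hedged illustration rather than verbatim sample code: the helper name ``setupWorkspaces`` is ours, error checking is omitted, and the handle, network descriptor, and optimizer info are assumed to have been created in the previous steps.

.. code-block:: c++

   #include <cuda_runtime.h>
   #include <cutensornet.h>

   // Hypothetical helper (illustration only): computes the contraction workspace
   // sizes for the chosen path, queries the minimal SCRATCH and CACHE sizes,
   // allocates device buffers, and registers them in the workspace descriptor.
   void setupWorkspaces(cutensornetHandle_t handle,
                        cutensornetNetworkDescriptor_t descNet,
                        cutensornetContractionOptimizerInfo_t optimizerInfo,
                        cutensornetWorkspaceDescriptor_t* workDesc,
                        void** scratchPtr, void** cachePtr)
   {
       cutensornetCreateWorkspaceDescriptor(handle, workDesc);

       // Compute the workspace sizes required to contract the network.
       cutensornetWorkspaceComputeContractionSizes(handle, descNet, optimizerInfo, *workDesc);

       // Query the minimal device-memory sizes for both workspace kinds.
       int64_t scratchSize{0}, cacheSize{0};
       cutensornetWorkspaceGetMemorySize(handle, *workDesc, CUTENSORNET_WORKSIZE_PREF_MIN,
                                         CUTENSORNET_MEMSPACE_DEVICE,
                                         CUTENSORNET_WORKSPACE_SCRATCH, &scratchSize);
       cutensornetWorkspaceGetMemorySize(handle, *workDesc, CUTENSORNET_WORKSIZE_PREF_MIN,
                                         CUTENSORNET_MEMSPACE_DEVICE,
                                         CUTENSORNET_WORKSPACE_CACHE, &cacheSize);

       // Allocate device memory and attach it to the workspace descriptor. The CACHE
       // buffer must stay alive (and untouched) across the repeated contractions.
       cudaMalloc(scratchPtr, scratchSize);
       cudaMalloc(cachePtr, cacheSize);
       cutensornetWorkspaceSetMemory(handle, *workDesc, CUTENSORNET_MEMSPACE_DEVICE,
                                     CUTENSORNET_WORKSPACE_SCRATCH, *scratchPtr, scratchSize);
       cutensornetWorkspaceSetMemory(handle, *workDesc, CUTENSORNET_MEMSPACE_DEVICE,
                                     CUTENSORNET_WORKSPACE_CACHE, *cachePtr, cacheSize);
   }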
Note that it is possible to skip the steps of creating a workspace descriptor and explicitly handling workspace memory by setting a device memory handler, in which case *cuTensorNet* will implicitly handle workspace memory by allocating/deallocating memory from the provided memory pool. See :ref:`cuTensorNet memory management API` for details.

------------------------------
Contraction plan and auto-tune
------------------------------

We create a tensor network contraction plan holding all pairwise tensor contraction plans for cuTENSOR. Optionally, we can auto-tune the plan so that cuTENSOR selects the best kernel for each pairwise contraction. This contraction plan can be reused for many (possibly different) data inputs, avoiding the cost of initializing the plan redundantly.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #7
   :end-before: Sphinx: #8

------------------------------------
Tensor network contraction execution
------------------------------------

Finally, we contract the tensor network as many times as needed, possibly with different input data each time. The first network contraction call will utilize the provided CACHE workspace to store the constant intermediate tensors. Subsequent network contractions will reuse the cached data, greatly reducing the computation time.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #8
   :end-before: Sphinx: #9

--------------
Free resources
--------------

After the computation, we need to free up all resources.

.. literalinclude:: ../../../../tensor_network/samples/tensornet_example_reuse.cu
   :language: c++
   :linenos:
   :lineno-match:
   :start-after: Sphinx: #9
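To recap the reuse pattern end to end, the execution loop at the heart of this example can be condensed into the following sketch. It is an illustration under stated assumptions, not verbatim sample code: ``numIterations``, ``kVariableTensor``, ``hostData``, and ``tensorBytes`` are hypothetical placeholders, and the plan, workspace descriptor, and device data pointers are assumed to have been set up as described above.

.. code-block:: c++

   // Illustration only: repeated contraction with constant-intermediate reuse.
   // All names below except the cuTensorNet/CUDA API calls are hypothetical.
   for (int iter = 0; iter < numIterations; ++iter)
   {
       // Update only the non-constant input tensor on the device; the constant
       // tensors keep their values, so their cached intermediates remain valid.
       cudaMemcpyAsync(rawDataIn[kVariableTensor], hostData[iter], tensorBytes,
                       cudaMemcpyHostToDevice, stream);

       // The first call computes the constant intermediate tensors and stores them
       // in the CACHE workspace; subsequent calls reuse them and run much faster.
       cutensornetContractSlices(handle, plan, rawDataIn, rawDataOut,
                                 /*accumulateOutput=*/0, workDesc,
                                 /*sliceGroup=*/NULL, stream);
   }

   // If the values of a constant tensor do change, the cached intermediates must be
   // invalidated before the next contraction, e.g. via cutensornetWorkspacePurgeCache()
   // (available in recent cuTensorNet releases).
   cutensornetWorkspacePurgeCache(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE);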