.. _cuTensorNet C example:

***************
Getting Started
***************

In this section, we show how to contract a tensor network using *cuTensorNet*.
First, we describe how to install the library and compile a sample code.
Then, we present the example code used to perform common steps in *cuTensorNet*.
In this example, we perform the following tensor contraction:

.. math::

   D_{m,x,n,y} = A_{m,h,k,n} B_{u,k,h} C_{x,u,y}

We build the code up step by step, with each step adding code at the end.
The steps are separated by succinct multi-line comment blocks.

It is recommended that readers refer to the :doc:`overview` and the `cuTENSOR documentation`_
to become familiar with the nomenclature and cuTENSOR operations.

.. _cuTENSOR documentation: https://docs.nvidia.com/cuda/cutensor/index.html

============================
Installation and Compilation
============================

Download the cuQuantum package (which *cuTensorNet* is part of) from https://developer.nvidia.com/cuQuantum-downloads,
and the cuTENSOR package from https://developer.nvidia.com/cutensor.

-----
Linux
-----

Assuming cuQuantum has been extracted in ``CUQUANTUM_ROOT`` and cuTENSOR in ``CUTENSOR_ROOT``, we update the library path as follows:

.. code-block:: bash

    export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/11:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ``${CUTENSOR_ROOT}/lib/11.0``).

The sample code discussed below (``tensornet_example.cu``) can be compiled via the following command:

.. code-block:: bash

    nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 -lcutensornet -lcutensor -o tensornet_example

To link statically against the *cuTensorNet* library, use the following command instead (note that ``libmetis_static.a`` needs to be linked against explicitly; it is assumed to be installed through the NVIDIA CUDA Toolkit and accessible through ``$LIBRARY_PATH``):

.. code-block:: bash

    nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcutensornet_static.a -L${CUTENSOR_ROOT}/lib/11 -lcutensor libmetis_static.a -o tensornet_example

.. note::

    Depending on the source of the cuQuantum package, you may need to replace ``lib`` above by ``lib64``.
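For reference, the following is a minimal end-to-end Linux session combining the steps above. The extraction locations under ``$HOME`` are hypothetical placeholders; adjust the paths and the cuTENSOR library subdirectory to match your setup and CUDA Toolkit version.

.. code-block:: bash

    # Hypothetical extraction locations; adjust to wherever the archives were unpacked.
    export CUQUANTUM_ROOT=${HOME}/cuquantum
    export CUTENSOR_ROOT=${HOME}/cutensor

    # Make the shared libraries visible at run time (lib vs. lib64 and the
    # cuTENSOR subdirectory depend on your package and CUDA Toolkit version).
    export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/11:${LD_LIBRARY_PATH}

    # Compile the dynamically linked sample and run it.
    nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include \
         -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 \
         -lcutensornet -lcutensor -o tensornet_example
    ./tensornet_example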
..
   -------
   Windows
   -------

   Assuming *cuTensorNet* has been extracted in ``CUTENSORNET_ROOT``, we update the library path accordingly:

   .. code-block:: bash

       setx LD_LIBRARY_PATH "%CUTENSORNET_ROOT%\lib:%CUTENSOR_ROOT%\lib:%LD_LIBRARY_PATH%"

   We can compile the sample code discussed below (``tensornet_example.cu``) via the following command:

   .. code-block:: bash

       nvcc.exe tensornet_example.cu /I "%CUTENSORNET_ROOT%\include" "%CUTENSOR_ROOT%\include" cuTensorNet.lib cuTensor.lib /out:tensornet_example.exe

============
Code Example
============

The following code example illustrates the common steps necessary to use *cuTensorNet* and performs some typical operations in tensor network contraction.
The full sample code can be found in the `NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository.

----------------------
Headers and Data Types
----------------------

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #1
    :end-before: Sphinx: #2

---------------------------------------
Define Tensor Network and Tensor Sizes
---------------------------------------

Next, we define the topology of the tensor network (i.e., the modes of the tensors, their extents, and their connectivity).

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #2
    :end-before: Sphinx: #3

------------------------------------
Allocate memory and initialize data
------------------------------------

Next, we allocate memory for the data and the workspace, and randomly initialize the input data.

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #3
    :end-before: Sphinx: #4

-------------------------------------------
cuTensorNet handle and Network Descriptor
-------------------------------------------

Next, we initialize the *cuTensorNet* library via `cutensornetCreate()` and create the network descriptor with the desired tensor modes, extents, and strides, as well as the data and compute types.

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #4
    :end-before: Sphinx: #5

---------------------------------------
Optimal contraction order and slicing
---------------------------------------

At this stage, we can deploy the *cuTensorNet* optimizer to find an optimized contraction path and slicing combination.
It is also possible to feed a pre-determined path into the configuration.
We use the `cutensornetContractionOptimize()` function after creating the optimizer config and info structures, which allow us to specify various options for the optimizer.

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #5
    :end-before: Sphinx: #6

--------------------------------
Contraction plan and auto-tune
--------------------------------

We create a contraction plan that holds the pair-wise contraction plans for cuTENSOR.
Optionally, we can auto-tune the plan so that cuTENSOR selects the best kernel for each pair-wise contraction.
This contraction plan can be reused for many (possibly different) data inputs, avoiding the cost of initializing the plan redundantly.
We also create a workspace descriptor, compute and query the minimum workspace size needed to contract the network, and provide it to the contraction plan.

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #6
    :end-before: Sphinx: #7

-------------------------------
Network contraction execution
-------------------------------

Finally, we contract the network as many times as needed, possibly with different input data.
Network slices can be executed using the same contraction plan, potentially on different devices.
We conclude by cleaning up and freeing the allocated resources.

.. literalinclude:: ../../../tensor_network/samples/tensornet_example.cu
    :language: c++
    :linenos:
    :lineno-match:
    :start-after: Sphinx: #7

Recall that the full sample code can be found in the `NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository.

===========
Useful tips
===========

* For debugging, the environment variable ``CUTENSORNET_LOG_LEVEL=n`` can be set.
  The level ``n`` = 0, 1, ..., 5 corresponds to the logger level as described in and used by `cutensornetLoggerSetLevel`.
  The environment variable ``CUTENSORNET_LOG_FILE=<filepath>`` can be used to redirect the log output to a custom file at ``<filepath>`` instead of ``stdout``; see the example below.
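For instance, a hypothetical debugging run on Linux might enable the most verbose logging and redirect it to a file as follows (the log file name is only an illustration):

.. code-block:: bash

    # Most verbose logging (level 5); the log file name is a placeholder.
    export CUTENSORNET_LOG_LEVEL=5
    export CUTENSORNET_LOG_FILE=cutensornet_run.log
    ./tensornet_example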