.. _cuDensityMat C++ example:

********
Examples
********

In this section, we show an example of how to define a quantum operator and quantum states, and then compute the action of the quantum operator on a quantum state.
For clarity, the quantum operator is defined inside a separate header ``transverse_ising_full_fused_noisy.h``, where it is wrapped in a helper C++ class ``UserDefinedLiouvillian``.
We also provide a utility header ``helpers.h`` containing convenient GPU array instantiation functions.

==============
Compiling code
==============

Assuming cuQuantum has been extracted in ``CUQUANTUM_ROOT`` and cuTENSOR in ``CUTENSOR_ROOT``, we update the library path as follows:

.. code-block:: bash

    export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ``${CUTENSOR_ROOT}/lib/11``).

The serial sample code discussed below (``operator_action_example.cpp``) can be compiled via the following command:

.. code-block:: bash

    nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example

For static linking against the **cuDensityMat** library, use the following command:

.. code-block:: bash

    nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_ROOT}/lib/12 -lcutensor -o operator_action_example

In order to build the parallel (MPI) version of the example, ``operator_action_mpi_example.cpp``, one will need to have a *CUDA-aware* MPI library installed (e.g., recent OpenMPI, MPICH, or MVAPICH) and then set the environment variable ``$CUDENSITYMAT_COMM_LIB`` to the path of the MPI interface wrapper shared library ``libcudensitymat_distributed_interface_mpi.so``, which can be built inside the ``${CUQUANTUM_ROOT}/distributed_interfaces`` folder by calling the provided build script there. In order to link the executable to a CUDA-aware MPI library, one will need to add ``-I${MPI_PATH}/include`` and ``-L${MPI_PATH}/lib -lmpi`` to the build command:

.. code-block:: bash

    nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example

.. warning::

    When running ``operator_action_mpi_example.cpp`` without CUDA-aware MPI, the program will crash.

.. note::

    Depending on the source of your cuQuantum package, you may need to replace ``lib`` above with ``lib64``, matching the folder name used inside the package.

===============================================
Code example (serial execution on a single GPU)
===============================================

The following code example illustrates the common steps necessary to use the **cuDensityMat** library to compute the action of a quantum many-body operator on a quantum state.
The full sample code can be found in the `NVIDIA/cuQuantum `_ repository (`main serial code `_ and `operator definition `_ as well as the `utility code `_).

First let's introduce a helper class to construct a quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and an additional noise term.
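Before looking at the code, it may help to recall the generic structure of such an operator. The concrete coefficients, mode extents, and the specific noise operator are all defined in the header below, so the following is only an illustrative sketch of a Lindblad-form Liouvillian built from a transverse-field Ising Hamiltonian and a single dissipative term:

.. math::

    \mathcal{L}[\rho] = -i \left[ H, \rho \right]
    + \gamma \left( L \rho L^{\dagger} - \frac{1}{2} \left\{ L^{\dagger} L, \rho \right\} \right),
    \qquad
    H = \sum_{i} h_{i} \, X_{i} + \sum_{\langle i,j \rangle} J_{ij} \, Z_{i} Z_{j},

where :math:`X_i` and :math:`Z_i` are Pauli operators, the :math:`Z_i Z_j` products correspond to the fused two-body terms, and :math:`L` is a noise (collapse) operator acting with rate :math:`\gamma`.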
.. literalinclude:: ../../../density_matrix/examples/transverse_ising_full_fused_noisy.h
    :language: c++
    :linenos:
    :lineno-match:

Now we can use this quantum many-body operator in our main code.

.. literalinclude:: ../../../density_matrix/examples/operator_action_example.cpp
    :language: c++
    :linenos:
    :lineno-match:

==================================================
Code example (parallel execution on multiple GPUs)
==================================================

It is straightforward to adapt the `main serial code`_ and enable parallel execution across multiple/many GPU devices (across multiple/many nodes).
We will illustrate this with an example using the Message Passing Interface (MPI) as the communication layer.
Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.
The full sample code can be found in the `NVIDIA/cuQuantum `_ repository (`main MPI code `_ and `operator definition `_ as well as the `utility code `_).

Here is the updated main code for multi-GPU runs.

.. literalinclude:: ../../../density_matrix/examples/operator_action_mpi_example.cpp
    :language: c++
    :linenos:
    :lineno-match:

===========
Useful tips
===========

* For debugging, one can set the environment variable ``CUDENSITYMAT_LOG_LEVEL=n``. The level ``n`` = 0, 1, ..., 5 corresponds to the logger level as described in the table below. The environment variable ``CUDENSITYMAT_LOG_FILE=<filepath>`` can be used to redirect the log output to a custom file at ``<filepath>`` instead of ``stdout``. A usage example is shown after the table.

.. list-table::
    :widths: 10 25 65

    * - **Level**
      - **Summary**
      - **Long Description**
    * - 0
      - Off
      - Logging is disabled (default)
    * - 1
      - Errors
      - Only errors will be logged
    * - 2
      - Performance Trace
      - API calls that launch CUDA kernels will log their parameters and important information
    * - 3
      - Performance Hints
      - Hints that can potentially improve the application's performance
    * - 4
      - Heuristics Trace
      - Provides general information about the library execution, may contain details about heuristic status
    * - 5
      - API Trace
      - API calls will log their parameters and important information
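For example, to run the serial sample with the most verbose logging and capture the trace in a file (the log file name below is arbitrary):

.. code-block:: bash

    # Enable the most verbose logger level (API trace)
    export CUDENSITYMAT_LOG_LEVEL=5

    # Optionally redirect the log output to a file instead of stdout
    export CUDENSITYMAT_LOG_FILE=operator_action_example.log

    ./operator_action_example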