.. _cuDensityMat C++ examples:

********
Examples
********

In this section, we show basic examples on how to define a quantum operator, quantum state,
and then compute the action of the quantum operator on a quantum state, and, optionally,
backward-differentiate the operator action (compute gradients) with respect to user-provided
real parameters parameterizing the operator. For clarity, the quantum operator for each example
is defined inside a separate C++ header, specifically ``transverse_ising_full_fused_noisy.h``
and ``transverse_ising_full_fused_noisy_grad.h``, where it is wrapped in a helper C++ class
``UserDefinedLiouvillian``. We also provide a utility header ``helpers.h`` containing
convenient GPU array instantiation, initialization, and printing functions.

==============
Compiling code
==============

Assuming **cuQuantum** has been extracted in ``CUQUANTUM_ROOT``
and **cuTENSOR** is in ``CUTENSOR_ROOT``, we update the library path as follows:

.. code-block:: bash

   export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ``${CUTENSOR_ROOT}/lib/11``).

A serial sample code discussed below (``operator_action_example.cpp``) can be compiled via the following command:

.. code-block:: bash

   nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensornet -lcutensor -lcublas -o operator_action_example

For static linking against the **cuDensityMat** library, use the following command:

.. code-block:: bash

   nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a ${CUQUANTUM_ROOT}/lib/libcutensornet_static.a -L${CUTENSOR_ROOT}/lib/12 -lcutensor -lcublas -o operator_action_example

In order to build a parallel (MPI) version of the example ``operator_action_mpi_example.cpp``,
one will need to have a *CUDA-aware* MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and
then set the environment variable ``$CUDENSITYMAT_COMM_LIB`` to the path to the MPI interface
wrapper shared library ``libcudensitymat_distributed_interface_mpi.so``.
The MPI interface wrapper shared library ``libcudensitymat_distributed_interface_mpi.so`` can be built
inside the ``${CUQUANTUM_ROOT}/distributed_interfaces`` folder by calling the provided build script there.
In order to link the executable to a CUDA-aware MPI library, one will need to add
``-I${MPI_PATH}/include`` and ``-L${MPI_PATH}/lib -lmpi`` to the build command:

.. code-block:: bash

   nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -L${MPI_PATH}/lib -lcudensitymat -lcutensornet -lcutensor -lcublas -lmpi -o operator_action_mpi_example

.. warning::

   When running ``operator_action_mpi_example.cpp`` with a non-CUDA-aware MPI library, the program will crash.

.. note::

   Depending on the installation of the **cuQuantum SDK** package, you may need to replace ``lib`` above by ``lib64``,
   depending which folder name is used inside your **cuQuantum SDK** package.

===============================================
Code example (serial execution on a single GPU)
===============================================

The following code example illustrates the common steps necessary to use the **cuDensityMat** library
to compute the action of a quantum many-body operator on a quantum state.
The full sample code can be found in the `NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository
(`main serial code <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/operator_action_example.cpp>`_ and
`operator definition <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/transverse_ising_full_fused_noisy.h>`_
as well as the `utility code <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/helpers.h>`_).

First let's introduce a helper class to construct a specific quantum many-body operator,
for example, the transverse field Ising Hamiltonian with fused ``ZZ`` terms and
an additional noise term. Here we choose to make the ``f(t)`` coefficient depend on time
and a single user-provided real parameter `Omega`. We use a CPU-side user-defined
scalar callback function to define the dependence of the ``f(t)`` coefficient on time
and the user-provided real parameter ``Omega``. Note that inside the callback function
definition, we explicitly expect the data type to be ``CUDA_C_64F`` (double-precision complex numbers),
which applies to the scalar coefficient ``f(t)`` set by the callback function in-place.

.. literalinclude:: ../../../density_matrix/examples/transverse_ising_full_fused_noisy.h
   :language: c++
   :linenos:
   :lineno-match:

Now we can use this parameterized quantum many-body operator in our main code
to compute the action of the operator on a mixed quantum state.

.. literalinclude:: ../../../density_matrix/examples/operator_action_example.cpp
   :language: c++
   :linenos:
   :lineno-match:

==================================================
Code example (parallel execution on multiple GPUs)
==================================================

It is straightforward to adapt the `main serial code`_ and enable parallel execution
across multiple/many GPU devices (across multiple/many nodes). We will illustrate this with
an example using the Message Passing Interface (MPI) as the communication layer. Below we show
the minor additions that need to be made in order to enable distributed parallel execution
without making any changes to the original serial source code.

The full sample code can be found in the `NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository
(`main MPI code <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/operator_action_mpi_example.cpp>`_ and
`operator definition`_
`as well as the utility code <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/helpers.h>`_).

Here is the updated main code for multi-GPU runs.

.. literalinclude:: ../../../density_matrix/examples/operator_action_mpi_example.cpp
   :language: c++
   :linenos:
   :lineno-match:

=============================================================
Code example (serial execution with backward differentiation)
=============================================================

The following code example illustrates how to use the **cuDensityMat** library
to not only compute the action of a quantum many-body operator on a quantum state,
but also backward-differentiate it (compute gradients) with respect to user-provided
real parameters parameterizing the operator (one real parameter Omega in this example).
The full sample code can be found in the `NVIDIA/cuQuantum <https://github.com/NVIDIA/cuQuantum>`_ repository
(`main serial gradient code <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/operator_action_gradient_example.cpp>`_ and
`operator definition for gradient <https://github.com/NVIDIA/cuQuantum/blob/main/samples/cudensitymat/transverse_ising_full_fused_noisy_grad.h>`_
as well as the `utility code`_).

First let's construct a specific quantum many-body operator which, in this case,
is a slightly modified version of the quantum many-body operator used in `main serial code`_.
Here we make both the ``h(t)`` and ``f(t)`` scalar coefficients depend on time and a single
user-provided real parameter ``Omega`` via different (made-up) functional forms.
In order to backward-differentiate the operator action with respect to this single
user-provided real parameter ``Omega``, we need to manually define a gradient callback
function for each regular callback function we have (for ``h(t)`` and ``f(t)`` in this example).
In our example, we define two CPU-side scalar gradient callback functions which compute
the vector-jacobian product (VJP) of the scalar adjoint of ``h(t)`` and ``f(t)`` with respect
to the user-provided real parameter ``Omega``, respectively. A gradient callback function
is expected to accumulate the VJP result into the ``paramsGrad`` output array, the final
value of which will be the gradient(s) of the user-defined cost function with respect to
the user-provided real parameters parameterizing the operator. As before, all regular and
gradient callback functions used in our example explicitly expect the data type to be
``CUDA_C_64F`` (double-precision complex numbers). Updating the data type of the operator
and quantum states would require updating the callback functions accordingly (``CUDA_C_32F``).

.. literalinclude:: ../../../density_matrix/examples/transverse_ising_full_fused_noisy_grad.h
   :language: c++
   :linenos:
   :lineno-match:

Now we can use the defined quantum many-body operator in our main code to compute its
action on a mixed quantum state and then backward-differentiate it (compute gradients)
with respect to the user-provided real parameter ``Omega``. For the sake of simplicity,
we pass a made-up adjoint of the output quantum state to the `cudensitymatOperatorComputeActionBackwardDiff`
call, which is just the output quantum state itself (in real scenarios, the adjoint of the output quantum
state will depend on the user-chosen cost function and will be provided by the user).
Upon completion of the `cudensitymatOperatorComputeActionBackwardDiff` call, the ``paramsGrad``
output argument will contain the gradient of the user-defined cost function with respect to
the user-provided real parameter ``Omega``. Additionally, the backward-differentiation API call
will also return the adjoint of the input quantum state for cases where the input quantum
state implicitly depends on the user-provided real parameters (for example, cases where
the input quantum state comes from a previous operator action step, which is typical for
time-integration of quantum dynamics master equations). Note that both output arguments,
namely ``paramsGrad`` and ``stateInAdj``, are accumulative, i.e., they will be accumulated into
(it is user's responsibility to zero them out before the first call).

.. literalinclude:: ../../../density_matrix/examples/operator_action_gradient_example.cpp
   :language: c++
   :linenos:
   :lineno-match:

===========
Useful tips
===========

* For debugging, one can set the environment variable ``CUDENSITYMAT_LOG_LEVEL=n``.
  The level ``n`` = 0, 1, ..., 5 corresponds to the logger level as described in the table below.
  The environment variable ``CUDENSITYMAT_LOG_FILE=<filepath>`` can be used to redirect the log output to a custom file
  at ``<filepath>`` instead of ``stdout``.

.. list-table::
   :widths: 10 25 65

   * - **Level**
     - **Summary**
     - **Long Description**
   * - 0
     - Off
     - Logging is disabled (default)
   * - 1
     - Errors
     - Only errors will be logged
   * - 2
     - Performance Trace
     - API calls that launch CUDA kernels will log their parameters and important information
   * - 3
     - Performance Hints
     - Hints that can potentially improve the application's performance
   * - 4
     - Heuristics Trace
     - Provides general information about the library execution, may contain details about heuristic status
   * - 5
     - API Trace
     - API calls will log their parameter and important information