.. _high-level densitymat APIs:

.. currentmodule:: cuquantum.densitymat

******************************
High-level Density Matrix APIs
******************************

The pythonic API closely follows the structure of the Density Matrix C-API, providing separate objects for different concepts such as quantum states and operators.

WorkStream
==========
At the base of the pythonic API is the :class:`WorkStream` object which aggregates some configuration settings, 
cuDensityMat library handle, CUDA stream, CUDA device and workspace buffer into a single object.
The workspace buffer is allocated automatically as required by the other API objects interacting with the :class:`WorkStream` instance.
An upper limit to the allocated workspace buffer size can be specified at the creation of the :class:`WorkStream` instance.
The workspace buffer is not released until an explicit call to the :meth:`WorkStream.release_workspace` method is made.
In general usage, only a single :class:`WorkStream` per device should be created. Other API objects can only interact with one :class:`WorkStream` for their lifetime.
Currently only blocking operation is supported, i.e. API calls return only after completion of the task on the device.

.. code-block:: python

    from cuquantum.densitymat import WorkStream
    ctx = WorkStream(device_id=0)

Multi-GPU multi-node execution is supported by passing a communicator to the :class:`WorkStream` via its :meth:`WorkStream.set_communicator` method.
Note that this step has to be performed before the :class:`WorkStream` is passed to any other API object.
Please follow the instructions in :ref:`cuDensityMat MPI setup <cudensitymat MPI setup>` to set environment variables for MGMN execution. 
Apart from setting the communicator, the only place where multi-GPU execution has any consequence is for specifying quantum states, the coefficients of which will be distributed over all GPUs.

.. code-block:: python
    
    from mpi4py import MPI
    import cupy as cp
    from cuquantum.densitymat import WorkStream

    comm = MPI.COMM_WORLD.Dup()
    rank = comm.Get_rank()
    num_devices = cp.cuda.runtime.getDeviceCount()
    dev_id = rank % num_devices
    cp.cuda.Device(dev_id).use()
    ctx = WorkStream(device_id=dev_id)
    ctx.set_communicator(comm, provider="MPI")

Tensor Spaces and Quantum States
================================
A *tensor space* is a composite linear space constructed as a tensor product of one or more
vector spaces. Each constituting vector space represents a *quantum degree of freedom*
of dimension equal to the dimension of the vector space. The *shape* of the composite
tensor space is a tuple of the dimensions of the constituting vector spaces in their
original order. Note that the shape of the tensor space stays the same irrespective
of whether we deal with pure states (vectors living in the tensor space) or mixed
states (matrices associated with the same tensor space).

A *pure quantum state* is represented by a tensor from the specified tensor space,
thus the shape of a pure quantum state coincides with the tensor space shape.
A *mixed quantum state* is represented by a tensor belonging to the tensor space
constructed as a tensor product of the specified tensor space (primary space or ket space)
and its dual tensor space (dual space or bra space). Thus, the shape of a mixed quantum
state is a concatenation of two tensor space shapes, where the first half of the tensor
modes corresponds to the ket space while the second half of the tensor modes corresponds
to the bra space. For example, if tensor :math:`V[i0, i1, i2]` represents a pure quantum
state from a tensor space constructed from three vector spaces (quantum degrees of freedom),
then tensor :math:`D[i0, i1, i2; j0, j1, j2]` represents a mixed quantum state from the same
tensor space, where modes :math:`i0, i1, i2` are the ket modes while modes :math:`j0, j1, j2`
are the bra modes. The corresponding tensor space shape in both cases is
:math:`[range(i0), range(i1), range(i2)]`.

In the pythonic API, a pure (mixed) quantum state is represented by an instance of :class:`DensePureState` (:class:`DenseMixedState`).
Below, we use the dense pure state :class:`DensePureState` class as an example to provide detailed instructions.
The workflow for dense mixed states is identical, and the two classes have identical methods and attributes.

In order to create a quantum state, we pass a :class:`WorkStream`, the Hilbert space dimensions, batch size and datatype:

.. code-block:: python

    import cupy as cp
    from cuquantum.densitymat import DensePureState

    hilbert_space_dims = (2, 6, 4, 3)
    batch_size = 1
    dtype = "complex64"
    psi_1 = DensePureState(ctx, hilbert_space_dims, batch_size, dtype)


The created quantum state has no associated physical storage yet.

In order to attach physical storage to a quantum state, there are two options:

.. code-block:: python

    psi_1.allocate_storage()

:meth:`DensePureState.allocate_storage` will allocate a :class:`cupy.ndarray` of the correct datatype with ``psi1.storage_size`` elements, initialized to zero.

.. code-block:: python

    size = psi_1.storage_size
    psi_1.attach_storage(cp.zeros(size, dtype = psi_1.dtype))

:meth:`DensePureState.attach_storage` allows the user to pass a :class:`cupy.ndarray` with :attr:`DensePureState.storage_size` elements located on the same device as the one passed to :class:`WorkStream`.
For single-GPU calculations, the storage size corresponds to the number of coefficients of the state, and the array passed can be multidimensional.
In that case, the array needs to be Fortran-ordered, contiguous, and matching the Hilbert space dimensions and batch size.

For a multi-GPU setup, each process only stores a subset of state coefficients, which we will refer to as a *slice*.
The size of the slice may be equal or smaller than the storage buffer size.
Only one-dimensional storage buffers are allowed in multi-GPU execution.
The slice is always located in the first :attr:`DensePureState.storage_size` elements of the storage buffer.

:attr:`DensePureState.storage` returns the attached storage buffer. For states with a total of fewer than 32 modes, a multidimensional, writeable view on the slice is returned by :meth:`DensePureState.view`. For larger systems, this method will fail due to :class:`cupy.ndarray` not supporting more than 32 dimensions.
:attr:`DensePureState.local_info` returns the shape of the local slice as well as the offsets along each dimension.
This information is required to interpret the state's coefficients or modify them to, for example, specify an initial state.
Note that the batch size is always reported as the last mode, even if the batch size is 1.

For scripts purely intended for single-GPU execution, both :attr:`DensePureState.storage_size` and :attr:`DensePureState.local_info` directly follow from the Hilbert space dimensions and batch size, and the user may omit these steps.
Note that by passing a one-dimensional array of size :attr:`DensePureState.storage_size`, user scripts will be directly portable between single-GPU and multi-GPU execution.

For states with the same Hilbert space dimensions and batch size, a convenient way of creating a new instance and attaching storage directly is provided via :meth:`DensePureState.clone`.

.. code-block:: python

    psi_2 = psi_1.clone(cp.zeros_like(psi_1.storage))

Besides direct manipulation of the storage buffer by the user, there are several methods which allow element-wise operations, :meth:`DensePureState.inplace_scale` and :meth:`DensePureState.inplace_accumulate`.
Unary and binary reductions on :class:`DensePureState` and :class:`DenseMixedState` instances are provided via :meth:`DensePureState.norm`, :meth:`DensePureState.inner_product` and :meth:`DensePureState.trace`.

Quantum Many-Body Operators (Super-Operators)
=============================================
A broad class of quantum Liouvillian superoperators, or quantum
many-body operators in general, can be expressed as a sum
of tensor products of elementary tensor operators with some,
generally time-dependent coefficients, where an *elementary tensor operator*
is a tensor operator acting on a specific number of quantum degrees of freedom,
either from the right or from the left, or both. Each elementary tensor operator
has the same number of the ket and bra modes, with the ket modes preceding
the bra modes in its tensor. The bra modes of the elementary tensor operator
act on the ket modes of the quantum state (action from the left) while the ket
modes of the elementary tensor operator act on the bra modes of the quantum
state (action from the right). In practice, many quantum Liouvillian
operators are expressed in terms of tensor products of the one-body
elementary tensor operators, like creation/annihilation operators,
particle number operator, coordinate/momentum operators,
individual spin operators, etc. For the sake of concreteness,
let's consider the following quantum Liouvillian operator
as an example:

.. math::

    L = \omega_{c} b^{\dagger} b 
        + \frac{1}{2} \omega_{a} \sigma_{z}
        + [ \frac{1}{2} g(t, \alpha, \beta) b^{\dagger} \sigma_{-}
        +   \frac{1}{2} g(t, \alpha, \beta) b \sigma_{+} ]
        + \gamma \sigma_{-}

In the above Liouvillian operator, :math:`b`, :math:`b^{\dagger}`, :math:`\sigma_{z}`,
:math:`\sigma_{-}`, :math:`\sigma_{+}` are elementary tensor operators,
and each of them acts only on a single quantum degree of freedom,
represented by an order-2 tensor :math:`U[i0; j0]`, where :math:`i0` is the ket
mode and :math:`j0` is the bra mode (an order-`k` elementary tensor operator
is repsented by a tensor with :math:`k` ket modes and :math:`k` bra modes).
:math:`\omega_{c}`, :math:`\omega_{a}`, and :math:`\gamma` are constant scalars.
:math:`g(t, \alpha, \beta)` is a time-dependent scalar coefficient that
depends on time and two real parameters, :math:`\alpha` and :math:`\beta`,
which parameterize the Liouvillian operator.

Elementary operators
--------------------
Tensors that define elementary operators can be provided directly as :class:`cupy.ndarray` (on the :class:`WorkStream`\ 's device) or :class:`numpy.ndarray`. 
The pythonic API also provides :class:`DenseOperator` and :class:`MultiDiagonalOperator` objects.
:class:`MultiDiagonalOperator`\ s are operators acting on a single mode, specified in sparse DIA format. Multidiagonal tensors (i.e. covering more than a single mode) and multidiagonal matrices with diagonals above the first superdiagonal or below the first subdiagonal are not supported currently.
:class:`DenseOperator`\ s have underlying dense storage. Both objects allow selecting unary and binary operations and also support passing in a CPU callback function with signature ``(t: float, args: Sequence) -> numpy.ndarray`` that sets the tensor storage's elements dynamically.
Regardless of whether the tensors are passed as :class:`cupy.ndarray` / :class:`numpy.ndarray` or as :class:`DenseOperator` / :class:`MultidiagonalOperator`, a copy of the originally provided array will be made.
If the tensor data is originally provided as a :class:`numpy.ndarray`, it will be moved to GPU (the :class:`WorkStream`\ 's device) the first time the elementary operator is being used as part of an :class:`Operator` or :class:`OperatorAction`.
If an elementary operator appears multiple times in the expression for the quantum many-body operator, using the :class:`DenseOperator` and :class:`MultiDiagonalOperator` allows us to reuse the same instance and is thus preferred.

Composing symbolic expressions with elementary operators
--------------------------------------------------------
Individual elementary operators can be composed into products and sums thereof. The API provides several layers for such products and sums which are composed via aggregation.

Products of elementary operators are most conveniently composed by the :func:`tensor_product` free function.
The arguments to tensor products are tuples containing the elementary operator, a sequence of modes it's supported on, and optionally a sequence of booleans ("dualities") which denotes whether the operator contracts with the ket or bra modes of the state (for mixed states, defaulting to ``False``, i.e. ket modes). Furthermore, a coefficient can be supplied via a named argument.

A tensor product of operators is created via :func:`tensor_product`. This function
returns an :class:`OperatorTerm` object, which represents a sum of products of elementary tensor operators.
:class:`OperatorTerm`\ supports in-place (``+=`` and :meth:`OperatorTerm.append` method) and out-place addition (``+``) to compose the sum, as
well as multiplication by a :class:`numbers.Number` or `Callable`. Note that all `Callable`\ s used (for coefficients or elementary operator callback functions) have a signature of ``(t, args)``
and must accept the same arguments. Inplace addition is only supported before the :class:`OperatorTerm` starts using a :class:`WorkStream` (see below).
Additionally, multiplication of :class:`OperatorTerm`\ s is supported, creating a new instance which contains the product of sums of products of the factors, with the number of terms in the sum being equal to the product of the number of terms in the original :class:`OperatorTerm`\ s.
We would like to note that the computation time scales roughly linearly in the number of products in the :class:`OperatorTerm`, and thus advise against using multiplication of :class:`OperatorTerm`\ s if a simplification can be performed manually instead.

An :class:`Operator` contains a sum of :class:`OperatorTerm`\ s, and each :class:`OperatorTerm` is associated with a coefficient (defaults to 1) and a boolean *duality* (defaults to `False`).
If *duality* is `True`, the action on bra and ket modes as specified for each product in the :class:`OperatorTerm` is flipped.
:meth:`Operator.dual` returns a new instance with these boolean arguments flipped, which is useful to express e.g. commutators.
:class:`Operator`\ s support out-of-place addition and substraction with other :class:`Operator` instances, as well as in-place (``+=``, :meth:`Operator.append` method) and out-of-place (``+``) addition of :class:`OperatorTerm`.
In-place addition is only supported before the :class:`Operator`\ starts using a :class:`WorkStream` (see below). 
Furthermore, scalar multiplication by a :class:`numbers.Number` or `Callable` is allowed.

The action of an :class:`Operator` on a quantum state ``state_in`` accumulated into a quantum state ``state_out`` can be computed via :meth:`Operator.compute_action`. 
Expectation value on a quantum state can be evaluated via :meth:`Operator.compute_expectation`.
Before performing these calls, :meth:`Operator.prepare_action` (resp. :meth:`Operator.prepare_expectation`) should be invoked once, which remains valid for the lifetime of the object.

While the concept of a Liouvillian superoperator and its action
on a quantum state is sufficient to specify the right-hand side
of the broadly used master equations, like the Lindblad equation,
in a more general setting one may need to be able to specify
a *system of coupled ordinary differential equations* (ODE)
to simultaneously evolve multiple (potentially batched) quantum
states under their own Liouvillian operators, as
in the hierarchical equations of motion (HEOM) method.

Such an ODE can be expressed by :class:`OperatorAction`. An :class:`OperatorAction` contains a sequence of :class:`Operator`, which can be applied to a sequence of quantum states via :meth:`OperatorAction.compute`, where the result is accumulated into a single output state.

Note that issuing a :meth:`Operator.prepare_action` / :meth:`Operator.prepare_expectation` call on :class:`Operator` or creating an :class:`OperatorAction` can have side effects
on all objects that they depend on (directly and indirectly). In particular the object and its dependencies
will be using the :class:`WorkStream` instance passed in, which is not reversible as explained above.
If the dependent objects are already using a different :class:`WorkStream` instance, an error will be raised. 
Generally we advise against using more than a single :class:`WorkStream` instance per GPU.


Usage examples
==============
The following example illustrates various ways to define elementary operators and compose them into :class:`OperatorTerm`, :class:`Operator`, :class:`OperatorAction`.
It is written for single-GPU execution, although enabling multi-GPU execution is straightforward as shown in the next example. 

.. literalinclude:: ../../../python/samples/densitymat/operator_advanced.py
   :language: python

The following is a minimal working example for multi-GPU multi-node (MGMN) execution.
It includes a utility function to set the quantum state to a specific product state in the computational basis, 
which illustrates how to interpret :attr:`DensePureState.local_info` and work with the one-dimensional quantum state storage buffer required for MGMN execution.

.. literalinclude:: ../../../python/samples/densitymat/operator_mpi.py
   :language: python