.. _high-level densitymat APIs:

.. currentmodule:: cuquantum.densitymat

*********************
Quantum Dynamics APIs
*********************

The pythonic API in :mod:`cuquantum.densitymat` module closely follows the structure of the cuDensityMat C APIs, providing separate objects for different concepts such as quantum states and operators.

WorkStream
==========
At the base of the pythonic API is the :class:`WorkStream` object which aggregates some configuration settings, 
cuDensityMat library handle, CUDA stream, CUDA device and workspace buffer into a single object.
The workspace buffer is allocated automatically as required by the other API objects interacting with the :class:`WorkStream` instance.
An upper limit to the allocated workspace buffer size can be specified at the creation of the :class:`WorkStream` instance.
The workspace buffer is not released until an explicit call to the :meth:`WorkStream.release_workspace` method is made.
In general usage, only a single :class:`WorkStream` per device should be created. Other API objects can only interact with one :class:`WorkStream` for their lifetime.
Currently only blocking operation is supported, i.e. API calls return only after completion of the task on the device.

.. code-block:: python

    from cuquantum.densitymat import WorkStream
    ctx = WorkStream(device_id=0)

Multi-GPU multi-node execution is supported by passing a communicator to the :class:`WorkStream` via its :meth:`WorkStream.set_communicator` method.
Note that this step has to be performed before the :class:`WorkStream` is passed to any other API object.
Please follow the instructions in :ref:`cuDensityMat MPI setup <cudensitymat MPI setup>` to set environment variables for MGMN execution. 
Apart from setting the communicator, the only place where multi-GPU execution has any consequence is for specifying quantum states, the coefficients of which will be distributed over all GPUs.

.. code-block:: python
    
    from mpi4py import MPI
    import cupy as cp
    from cuquantum.densitymat import WorkStream

    comm = MPI.COMM_WORLD.Dup()
    rank = comm.Get_rank()
    num_devices = cp.cuda.runtime.getDeviceCount()
    dev_id = rank % num_devices
    cp.cuda.Device(dev_id).use()
    ctx = WorkStream(device_id=dev_id)
    ctx.set_communicator(comm, provider="MPI")

Tensor spaces and quantum states
================================
A *tensor space* is a composite linear space constructed as a tensor product of one or more
vector spaces. Each constituting vector space represents a *quantum degree of freedom*
of dimension equal to the dimension of the vector space. The *shape* of the composite
tensor space is a tuple of the dimensions of the constituting vector spaces in their
original order. Note that the shape of the tensor space stays the same irrespective
of whether we deal with pure states (vectors living in the tensor space) or mixed
states (matrices associated with the same tensor space).

A *pure quantum state* is represented by a tensor from the specified tensor space,
thus the shape of a pure quantum state coincides with the tensor space shape.
A *mixed quantum state* is represented by a tensor belonging to the tensor space
constructed as a tensor product of the specified tensor space (primary space or ket space)
and its dual tensor space (dual space or bra space). Thus, the shape of a mixed quantum
state is a concatenation of two tensor space shapes, where the first half of the tensor
modes corresponds to the ket space while the second half of the tensor modes corresponds
to the bra space. For example, if tensor :math:`V[i0, i1, i2]` represents a pure quantum
state from a tensor space constructed from three vector spaces (quantum degrees of freedom),
then tensor :math:`D[i0, i1, i2; j0, j1, j2]` represents a mixed quantum state from the same
tensor space, where modes :math:`i0, i1, i2` are the ket modes while modes :math:`j0, j1, j2`
are the bra modes. The corresponding tensor space shape in both cases is
:math:`[range(i0), range(i1), range(i2)]`.

In the pythonic API, a pure (mixed) quantum state is represented by an instance of :class:`DensePureState` (:class:`DenseMixedState`).
Below, we use the dense pure state :class:`DensePureState` class as an example to provide detailed instructions.
The workflow for dense mixed states is identical, and the two classes have identical methods and attributes.

In order to create a quantum state, we pass a :class:`WorkStream`, the Hilbert space dimensions, batch size and datatype:

.. code-block:: python

    import cupy as cp
    from cuquantum.densitymat import DensePureState

    hilbert_space_dims = (2, 6, 4, 3)
    batch_size = 1
    dtype = "complex64"
    psi_1 = DensePureState(ctx, hilbert_space_dims, batch_size, dtype)


The created quantum state has no associated physical storage yet.

In order to attach physical storage to a quantum state, there are two options:

.. code-block:: python

    psi_1.allocate_storage()

:meth:`DensePureState.allocate_storage` will allocate a :class:`cupy.ndarray` of the correct datatype with ``psi_1.storage_size`` elements, initialized to zero.

.. code-block:: python

    size = psi_1.storage_size
    psi_1.attach_storage(cp.zeros(size, dtype = psi_1.dtype))

:meth:`DensePureState.attach_storage` allows the user to pass a :class:`cupy.ndarray` with :attr:`DensePureState.storage_size` elements located on the same device as the one passed to :class:`WorkStream`.
For single-GPU calculations, the storage size corresponds to the number of coefficients of the state, and the array passed can be multidimensional.
In that case, the array needs to be Fortran-ordered, contiguous, and matching the Hilbert space dimensions and batch size.

For a multi-GPU setup, each process only stores a subset of state coefficients, which we will refer to as a *slice*.
The size of the slice may be equal or smaller than the storage buffer size.
Only one-dimensional storage buffers are allowed in multi-GPU execution.
The slice is always located in the first :attr:`DensePureState.storage_size` elements of the storage buffer.

:attr:`DensePureState.storage` returns the attached storage buffer. For states with a total of fewer than 32 modes, a multidimensional, writable view on the slice is returned by :meth:`DensePureState.view`. For larger systems, this method will fail due to :class:`cupy.ndarray` not supporting more than 32 dimensions.
:attr:`DensePureState.local_info` returns the shape of the local slice as well as the offsets along each dimension.
This information is required to interpret the state's coefficients or modify them to, for example, specify an initial state.
Note that the batch size is always reported as the last mode, even if the batch size is 1.

For scripts purely intended for single-GPU execution, both :attr:`DensePureState.storage_size` and :attr:`DensePureState.local_info` directly follow from the Hilbert space dimensions and batch size, and the user may omit these steps.
Note that by passing a one-dimensional array of size :attr:`DensePureState.storage_size`, user scripts will be directly portable between single-GPU and multi-GPU execution.

For states with the same Hilbert space dimensions and batch size, a convenient way of creating a new instance and attaching storage directly is provided via :meth:`DensePureState.clone`.

.. code-block:: python

    psi_2 = psi_1.clone(cp.zeros_like(psi_1.storage))

Besides direct manipulation of the storage buffer by the user, there are several methods which allow element-wise operations, :meth:`DensePureState.inplace_scale` and :meth:`DensePureState.inplace_accumulate`.
Unary and binary reductions on :class:`DensePureState` and :class:`DenseMixedState` instances are provided via :meth:`DensePureState.norm`, :meth:`DensePureState.inner_product` and :meth:`DensePureState.trace`.

Quantum many-body operators (super-operators)
=============================================
A broad class of quantum Liouvillian superoperators, or quantum
many-body operators in general, can be expressed as a sum
of tensor products of elementary tensor operators with some,
generally time-dependent coefficients, where an *elementary tensor operator*
is a tensor operator acting on a specific number of quantum degrees of freedom,
either from the right or from the left, or both. Each elementary tensor operator
has the same number of the ket and bra modes, with the ket modes preceding
the bra modes in its tensor. The bra modes of the elementary tensor operator
act on the ket modes of the quantum state (action from the left) while the ket
modes of the elementary tensor operator act on the bra modes of the quantum
state (action from the right). In practice, many quantum Liouvillian
operators are expressed in terms of tensor products of the one-body
elementary tensor operators, like creation/annihilation operators,
particle number operator, coordinate/momentum operators,
individual spin operators, etc. For the sake of concreteness,
let's consider the following quantum Liouvillian operator
as an example:

.. math::

    L = \omega_{c} b^{\dagger} b 
        + \frac{1}{2} \omega_{a} \sigma_{z}
        + [ \frac{1}{2} g(t, \alpha, \beta) b^{\dagger} \sigma_{-}
        +   \frac{1}{2} g(t, \alpha, \beta) b \sigma_{+} ]
        + \gamma \sigma_{-}

In the above Liouvillian operator, :math:`b`, :math:`b^{\dagger}`, :math:`\sigma_{z}`,
:math:`\sigma_{-}`, :math:`\sigma_{+}` are elementary tensor operators,
and each of them acts only on a single quantum degree of freedom,
represented by an order-2 tensor :math:`U[i0; j0]`, where :math:`i0` is the ket
mode and :math:`j0` is the bra mode (an order-`k` elementary tensor operator
is repsented by a tensor with :math:`k` ket modes and :math:`k` bra modes).
:math:`\omega_{c}`, :math:`\omega_{a}`, and :math:`\gamma` are constant scalars.
:math:`g(t, \alpha, \beta)` is a time-dependent scalar coefficient that
depends on time and two real parameters, :math:`\alpha` and :math:`\beta`,
which parameterize the Liouvillian operator.

Elementary operators
--------------------
Tensors that define elementary operators can be provided directly as ``cupy.ndarray`` (on the :class:`WorkStream`\ 's device) or ``numpy.ndarray``. 

The pythonic API also provides :class:`DenseOperator` and :class:`MultidiagonalOperator` objects, which wrap around the underlying data and are reusable across multiple expressions involving the elementary operator.
:class:`MultiDiagonalOperator`\ s are operators acting on a single mode, specified in sparse DIA format.
:class:`DenseOperator`\ s have underlying dense storage.

Batching is supported for elementary operators. The batch dimension is inferred from the last dimensions of the arrays passed if the array has an odd number of dimensions.

Both :class:`DenseOperator` and :class:`MultidiagonalOperator` can be defined either as static operators, with the underlying data required to be unchanged during execution, or as dynamic operators, with callback functions updating the underlying storage.
In both cases, direct modification of the underlying data by the user is not allowed and will lead to undefined behaviour.
Callback functions are passed as callables wrapped as either :class:`CPUCallback` or :class:`GPUCallback` instances, where the difference between the two is whether callback arguments are expected as GPU or CPU arrays and whether the functions returns or modifies a GPU or CPU array.
Callbacks can either return an array with a function signature ``(t: float, args: cupy.ndarray | np.ndarray) -> cupy.ndarray | numpy.ndarray`` or  modify the underlying storage in-place with a function signature ``(t: float, args: cupy.ndarray | np.ndarray, arr: cupy.ndarray | numpy.ndarray) -> None``.
This behaviour is specified by optional keyword argument ``is_inplace`` (default is ``False``) when wrapping a callable in a :class:`GPUCallback` or :class:`CPUCallback`.

:class:`DenseOperator` / :class:`MultidiagonalOperator` make a copy of the array provided when creating the instance. This behaviour can be overridden with the keyword argument ``copy=True``, in which case the array is required to be a fortran-contiguous ``cupy.ndarray``.
If the tensor data is originally provided as a ``numpy.ndarray``, it will be moved to GPU (the :class:`WorkStream`\ 's device) the first time the elementary operator is being used as part of an :class:`Operator` or :class:`OperatorAction`.

If an elementary operator appears multiple times in the expression for the quantum many-body operator, using the :class:`DenseOperator` and :class:`MultidiagonalOperator` allows us to reuse the same instance and is thus preferred.

Certain unary and binary operations are supported for :class:`DenseOperator` / :class:`MultidiagonalOperator` for the users' convenience allowing to define new instances of the latter as arithmetic operations of existing instances.
Note that this feature is only available for elementary operators specified without callback or with an out-of-place callback returning on the CPU.

Matrix operators
----------------
In some cases, many-body operators cannot be expressed as a tensor product of elementary operators over a subset of modes.
In such situations, the user can specify them as a :class:`MatrixOperator` which is interpreted as an operator acting on all degrees of freedom in the Hilbert space.
Currently, only dense underlying storage is supported through :class:`LocalDenseMatrixOperator`, which can be specified similary to :class:`DenseElementaryOperator` with two restrictions.
First, the provided array is required to be a Fortran-contiguous ``cupy.ndarray`` with tensor shape corresponding to a concatenation of the hilbert space dimensions (once for ket modes, once for bra modes), and optionally a batch dimension.
Second, if the user provides a callback function for dynamically updating the tensor entries, only :class:`GPUCallback` s which are acting *in-place* are supported in order to avoid unnecessary copies.

Composing symbolic expressions with operators
---------------------------------------------
Individual elementary operators or matrix operators can be composed into products and sums thereof.
The API provides several layers for such products and sums which are composed via aggregation.

Products of elementary operators are most conveniently composed by the :func:`tensor_product` free function.
The arguments to tensor products are tuples containing the elementary operator, a sequence of modes it's supported on, and optionally a sequence of booleans ("dualities") which denotes whether the operator contracts with the ket or bra modes of the state (for mixed states, defaulting to ``False``, i.e. ket modes). Furthermore, a coefficient can be supplied via a named argument.
If the coefficients do not change as a function of time or callback arguments, the coefficient could either be a scalar or a ``numpy.ndarray`` or ``cupy.ndarray`` in the case of batched coefficients. Dynamic coefficients are specified as instances of :class:`GPUCallback` or :class:`CPUCallback` and are expected to modify or return a ``numpy.ndarray`` or ``cupy.ndarray``.

Products of matrix operators are expressed through the :func:`full_matrix_product` free function. 
The arguments to matrix products are tuples containing the matrix operator, and two optional boolean arguments.
The first optional boolean argument is the matrix operators' "duality", which indicate whether the matrix is left-multiplied onto the quantum state's ket modes (default, ``False``) or right multiplied onto the quantum state's bra modes (``True``).
The second boolean argument triggers application of the matrices' complex conjugate if set to ``True``.

Both  :func:`tensor_product` and  :func:`full_matrix_product` return an :class:`OperatorTerm` object, which represents a sum of products of elementary tensor operators and products of matrix operators.
Note that while a single product cannot contain a mix between elementary operators and matrix operators, an :class:`OperatorTerm` can contain both types of products.
:class:`OperatorTerm`\ supports in-place (``+=``, :meth:`OperatorTerm.append_elementary_product)` and :meth:`OperatorTerm.append_elementary_product)` methods) and out-place addition (``+``) to compose the sum, as
well as multiplication by a :class:`numbers.Number` or :class:`CPUCallback`/:class:`GPUCallback`. Inplace addition is only supported before the :class:`OperatorTerm` starts using a :class:`WorkStream` (see below).
Additionally, multiplication of :class:`OperatorTerm`\ s is supported, creating a new instance which contains the product of sums of products of the factors, with the number of terms in the sum being equal to the product of the number of terms in the original :class:`OperatorTerm`\ s.
We would like to note that the computation time scales roughly linearly in the number of products in the :class:`OperatorTerm`, and thus advise against using multiplication of :class:`OperatorTerm`\ s if a simplification can be performed manually instead.

An :class:`Operator` contains a sum of :class:`OperatorTerm`\ s, and each :class:`OperatorTerm` is associated with a coefficient (defaults to ``1``) and a boolean *duality* (defaults to ``False``).
Coefficients are specified in the same format as described above for coefficients associated with a single product, and can be batched or non-batched.
If *duality* is ``True``, the action on bra and ket modes as specified for each product in the :class:`OperatorTerm` is flipped.
:meth:`Operator.dual` returns a new instance with these boolean arguments flipped, which is useful to express e.g. commutators.
:class:`Operator`\ s support out-of-place addition and substraction with other :class:`Operator` instances, as well as in-place (``+=``, :meth:`Operator.append` method) and out-of-place (``+``) addition of :class:`OperatorTerm`.
In-place addition is only supported before the :class:`Operator`\ starts using a :class:`WorkStream` (see below).
Furthermore, multiplication by a scalar, ``numpy.ndarray``, ``cupy.ndarray`` or :class:`GPUCallback` / :class:`CPUCallback` is allowed, which will be applied to the coefficient of each :class:`OperatorTerm` in the :class:`Operator`.

The action of an :class:`Operator` on a quantum state ``state_in`` accumulated into a quantum state ``state_out`` can be computed via :meth:`Operator.compute_action`. 
An expectation value of an operator evalyated on a quantum state can be computed via :meth:`Operator.compute_expectation`.
Before performing these calls, :meth:`Operator.prepare_action` (resp. :meth:`Operator.prepare_expectation`) should be invoked once, which remains valid for the lifetime of the object.

While the concept of a Liouvillian superoperator and its action
on a quantum state is sufficient to specify the right-hand side
of the broadly used master equations, like the Lindblad equation,
in a more general setting one may need to be able to specify
a *system of coupled ordinary differential equations* (ODE)
to simultaneously evolve multiple (potentially batched) quantum
states under their own Liouvillian operators, as
in the hierarchical equations of motion (HEOM) method.

Such an ODE can be expressed by :class:`OperatorAction`. An :class:`OperatorAction` contains a sequence of :class:`Operator`, which can be applied to a sequence of quantum states via :meth:`OperatorAction.compute`, where the result is accumulated into a single output state.

Note that issuing a :meth:`Operator.prepare_action` / :meth:`Operator.prepare_expectation` call on :class:`Operator` or creating an :class:`OperatorAction` can have side effects
on all objects that they depend on (directly and indirectly). In particular the object and its dependencies
will be using the :class:`WorkStream` instance passed in, which is not reversible as explained above.
If the dependent objects are already using a different :class:`WorkStream` instance, an error will be raised. 
Generally we advise against using more than a single :class:`WorkStream` instance per GPU.

Batching
========
In a batched simulation, there are only two allowed batch sizes for operators and coefficients: the batch size of the quantum state and ``1``. If a coefficient or operator has batch size ``1``, it will be broadcasted along the batch dimension.
Furthermore, input and output quantum states are required to have the same batch dimension.

Usage examples
==============

General usage example
---------------------
The following example illustrates various ways to define elementary operators and compose them into :class:`OperatorTerm`, :class:`Operator`, :class:`OperatorAction`.
It is written for single-GPU execution, although enabling multi-GPU execution is straightforward as shown in the next example. 

.. literalinclude:: ../../../python/samples/densitymat/operator_advanced_example.py
   :language: python

MGMN usage example
------------------
The following is a minimal working example for multi-GPU multi-node (MGMN) execution.
It includes a utility function to set the quantum state to a specific product state in the computational basis, 
which illustrates how to interpret :attr:`DensePureState.local_info` and work with the one-dimensional quantum state storage buffer required for MGMN execution.

.. literalinclude:: ../../../python/samples/densitymat/operator_mpi_example.py
   :language: python

Dense elementary operator example
---------------------------------
The following example goes into more detail of specifying and using :class:`DenseOperator`.

.. literalinclude:: ../../../python/samples/densitymat/dense_operator_example.py
   :language: python

Multidiagonal elementary operator example
-----------------------------------------
The following example goes into more detail of specifying and using :class:`MultidiagonalOperator`.

.. literalinclude:: ../../../python/samples/densitymat/multidiagonal_operator_example.py
   :language: python

Full matrix operator example
----------------------------
The following example goes into more detail of specifying and using :class:`LocalDenseMatrixOperator`.

.. literalinclude:: ../../../python/samples/densitymat/matrix_operator_example.py
   :language: python

Lindbladian example
-------------------
The following example shows how to use methods of :class:`OperatorTerm` and :class:`Operator` to define a function that assembles a dissipative Lindbladian superoperator from a Hamiltonian and a collection of jump operators.

.. literalinclude:: ../../../python/samples/densitymat/lindbladian_example.py
   :language: python


API reference
=============

Context API
-----------

.. autosummary::
   :toctree: generated/

   :template: dataclass.rst

   WorkStream


Callback API
------------

.. autosummary::
   :toctree: generated/
   
   CPUCallback
   GPUCallback


Operator API
------------

.. autosummary::
   :toctree: generated/

   tensor_product
   full_matrix_product
   DenseOperator
   MultidiagonalOperator
   LocalDenseMatrixOperator
   OperatorTerm
   Operator
   OperatorAction


State API
---------

.. autosummary::
   :toctree: generated/

   DensePureState
   DenseMixedState