Quantum Dynamics APIs

The pythonic API in the cuquantum.densitymat module closely follows the structure of the cuDensityMat C API, providing separate objects for different concepts such as quantum states and operators.

WorkStream

At the base of the pythonic API is the WorkStream object, which aggregates configuration settings, the cuDensityMat library handle, the CUDA stream, the CUDA device, and a workspace buffer into a single object. The workspace buffer is allocated automatically as required by the other API objects interacting with the WorkStream instance. An upper limit on the allocated workspace buffer size can be specified when the WorkStream instance is created. The workspace buffer is not released until an explicit call to the WorkStream.release_workspace() method is made. In general, only a single WorkStream should be created per device. Other API objects can interact with only one WorkStream during their lifetime. Currently, only blocking operation is supported, i.e. API calls return only after the task has completed on the device.

from cuquantum.densitymat import WorkStream
ctx = WorkStream(device_id=0)

Multi-GPU multi-node (MGMN) execution is supported by passing a communicator to the WorkStream via its WorkStream.set_communicator() method. Note that this step has to be performed before the WorkStream is passed to any other API object. Please follow the instructions in the cuDensityMat MPI setup to set the environment variables required for MGMN execution. Apart from setting the communicator, the only place where multi-GPU execution has any consequence is the specification of quantum states, whose coefficients will be distributed over all GPUs.

from mpi4py import MPI
import cupy as cp
from cuquantum.densitymat import WorkStream

comm = MPI.COMM_WORLD.Dup()
rank = comm.Get_rank()
num_devices = cp.cuda.runtime.getDeviceCount()
dev_id = rank % num_devices
cp.cuda.Device(dev_id).use()
ctx = WorkStream(device_id=dev_id)
ctx.set_communicator(comm, provider="MPI")

Tensor spaces and quantum states

A tensor space is a composite linear space constructed as a tensor product of one or more vector spaces. Each constituting vector space represents a quantum degree of freedom of dimension equal to the dimension of the vector space. The shape of the composite tensor space is a tuple of the dimensions of the constituting vector spaces in their original order. Note that the shape of the tensor space stays the same irrespective of whether we deal with pure states (vectors living in the tensor space) or mixed states (matrices associated with the same tensor space).

A pure quantum state is represented by a tensor from the specified tensor space, thus the shape of a pure quantum state coincides with the tensor space shape. A mixed quantum state is represented by a tensor belonging to the tensor space constructed as a tensor product of the specified tensor space (primary space or ket space) and its dual tensor space (dual space or bra space). Thus, the shape of a mixed quantum state is a concatenation of two tensor space shapes, where the first half of the tensor modes corresponds to the ket space while the second half of the tensor modes corresponds to the bra space. For example, if tensor \(V[i0, i1, i2]\) represents a pure quantum state from a tensor space constructed from three vector spaces (quantum degrees of freedom), then tensor \(D[i0, i1, i2; j0, j1, j2]\) represents a mixed quantum state from the same tensor space, where modes \(i0, i1, i2\) are the ket modes while modes \(j0, j1, j2\) are the bra modes. The corresponding tensor space shape in both cases is \([range(i0), range(i1), range(i2)]\).
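
For instance, for a tensor space with three quantum degrees of freedom, the array shapes work out as in the following short sketch:

# shapes implied by a tensor space with three degrees of freedom (illustrative sketch)
hilbert_space_dims = (2, 3, 4)        # tensor space shape
pure_shape = hilbert_space_dims       # V[i0, i1, i2]
mixed_shape = hilbert_space_dims * 2  # D[i0, i1, i2; j0, j1, j2]
print(pure_shape, mixed_shape)        # (2, 3, 4) (2, 3, 4, 2, 3, 4)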

In the pythonic API, a pure (mixed) quantum state is represented by an instance of DensePureState (DenseMixedState). Below, we use the DensePureState class as an example to provide detailed instructions. The workflow for dense mixed states is identical, and the two classes have identical methods and attributes.

In order to create a quantum state, we pass a WorkStream, the Hilbert space dimensions, batch size and datatype:

import cupy as cp
from cuquantum.densitymat import DensePureState

hilbert_space_dims = (2, 6, 4, 3)
batch_size = 1
dtype = "complex64"
psi_1 = DensePureState(ctx, hilbert_space_dims, batch_size, dtype)

The created quantum state has no associated physical storage yet.

In order to attach physical storage to a quantum state, there are two options:

psi_1.allocate_storage()

DensePureState.allocate_storage() will allocate a cupy.ndarray of the correct datatype with psi_1.storage_size elements, initialized to zero.

size = psi_1.storage_size
psi_1.attach_storage(cp.zeros(size, dtype=psi_1.dtype))

DensePureState.attach_storage() allows the user to pass a cupy.ndarray with DensePureState.storage_size elements located on the same device as the one passed to WorkStream. For single-GPU calculations, the storage size corresponds to the number of coefficients of the state, and the array passed can be multidimensional. In that case, the array needs to be Fortran-ordered, contiguous, and matching the Hilbert space dimensions and batch size.
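
For example, on a single GPU we may attach a tensor-shaped buffer directly; the following is a minimal sketch continuing from psi_1 above, assuming the batch size is appended as the last mode:

import cupy as cp
from cuquantum.densitymat import DensePureState

# single-GPU sketch: attach a Fortran-ordered, tensor-shaped buffer
psi_3 = DensePureState(ctx, hilbert_space_dims, batch_size, dtype)
psi_3.attach_storage(cp.zeros(hilbert_space_dims + (batch_size,), dtype=psi_3.dtype, order="F"))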

For a multi-GPU setup, each process only stores a subset of the state coefficients, which we will refer to as a slice. The size of the slice may be equal to or smaller than the storage buffer size. Only one-dimensional storage buffers are allowed in multi-GPU execution. The slice is always located in the first DensePureState.storage_size elements of the storage buffer.

DensePureState.storage returns the attached storage buffer. For states with a total of fewer than 32 modes, a multidimensional, writable view on the slice is returned by DensePureState.view(). For larger systems, this method will fail due to cupy.ndarray not supporting more than 32 dimensions. DensePureState.local_info returns the shape of the local slice as well as the offsets along each dimension. This information is required to interpret the state’s coefficients or modify them to, for example, specify an initial state. Note that the batch size is always reported as the last mode, even if the batch size is 1.

For scripts purely intended for single-GPU execution, both DensePureState.storage_size and DensePureState.local_info directly follow from the Hilbert space dimensions and batch size, and the user may omit these steps. Note that by passing a one-dimensional array of size DensePureState.storage_size, user scripts will be directly portable between single-GPU and multi-GPU execution.
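
For example, continuing from psi_1 above:

slice_shape, slice_offsets = psi_1.local_info  # batch size is reported as the last mode
# on a single GPU the slice covers the full state, so all offsets are zero
print(slice_shape, slice_offsets)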

For states with the same Hilbert space dimensions and batch size, a convenient way of creating a new instance and attaching storage directly is provided via DensePureState.clone().

psi_2 = psi_1.clone(cp.zeros_like(psi_1.storage))

Besides direct manipulation of the storage buffer by the user, several methods provide element-wise operations: DensePureState.inplace_scale() and DensePureState.inplace_accumulate(). Unary and binary reductions on DensePureState and DenseMixedState instances are provided via DensePureState.norm(), DensePureState.inner_product() and DensePureState.trace().
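
The following sketch continues with psi_1 and psi_2 from above; we assume here that DensePureState.inplace_accumulate(other, factor) accumulates factor * other into the state:

psi_1.view()[0, 0, 0, 0, 0] = 1.0            # set an amplitude through the tensor-shaped view
psi_1.inplace_scale(0.5)                     # psi_1 *= 0.5
psi_1.inplace_accumulate(psi_2, 2.0)         # psi_1 += 2.0 * psi_2 (assumed signature)
print(psi_1.norm().get()[()])                # unary reduction
print(psi_1.inner_product(psi_2).get()[()])  # binary reduction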

Quantum many-body operators (super-operators)

A broad class of quantum Liouvillian superoperators, or quantum many-body operators in general, can be expressed as a sum of tensor products of elementary tensor operators with generally time-dependent coefficients, where an elementary tensor operator is a tensor operator acting on a specific number of quantum degrees of freedom, either from the right, from the left, or both. Each elementary tensor operator has the same number of ket and bra modes, with the ket modes preceding the bra modes in its tensor. The bra modes of the elementary tensor operator act on the ket modes of the quantum state (action from the left), while the ket modes of the elementary tensor operator act on the bra modes of the quantum state (action from the right). In practice, many quantum Liouvillian operators are expressed in terms of tensor products of one-body elementary tensor operators, like creation/annihilation operators, the particle number operator, coordinate/momentum operators, individual spin operators, etc. For the sake of concreteness, let's consider the following quantum Liouvillian operator as an example:

\[L = \omega_{c} b^{\dagger} b + \frac{1}{2} \omega_{a} \sigma_{z} + [ \frac{1}{2} g(t, \alpha, \beta) b^{\dagger} \sigma_{-} + \frac{1}{2} g(t, \alpha, \beta) b \sigma_{+} ] + \gamma \sigma_{-}\]

In the above Liouvillian operator, \(b\), \(b^{\dagger}\), \(\sigma_{z}\), \(\sigma_{-}\), \(\sigma_{+}\) are elementary tensor operators, each acting on a single quantum degree of freedom and represented by an order-2 tensor \(U[i0; j0]\), where \(i0\) is the ket mode and \(j0\) is the bra mode (an order-k elementary tensor operator is represented by a tensor with \(k\) ket modes and \(k\) bra modes). \(\omega_{c}\), \(\omega_{a}\), and \(\gamma\) are constant scalars. \(g(t, \alpha, \beta)\) is a scalar coefficient that depends on time and on two real parameters, \(\alpha\) and \(\beta\), which parameterize the Liouvillian operator.
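
The following sketch shows how this Liouvillian could be written down with the composition API described in the sections below. The mode layout (mode 0 is the cavity of dimension N, mode 1 the two-level system) and the concrete form \(g(t, \alpha, \beta) = \alpha \cos(\beta t)\) are illustrative assumptions:

import numpy as np
from cuquantum.densitymat import tensor_product, Operator, CPUCallback

N = 10  # cavity dimension (assumption)
b = np.diag(np.sqrt(np.arange(1, N)), k=1)  # annihilation operator
b_dag = b.conj().T
sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])
sigma_m = np.array([[0.0, 1.0], [0.0, 0.0]])
sigma_p = sigma_m.conj().T
omega_c, omega_a, gamma = 1.0, 0.8, 0.1


def half_g(t, args):  # hypothetical g(t, alpha, beta) = alpha * cos(beta * t), divided by 2
    alpha, beta = args[0, 0], args[1, 0]
    return 0.5 * alpha * np.cos(beta * t)


liouvillian_term = (
    tensor_product((b_dag @ b, (0,)), coeff=omega_c)
    + tensor_product((sigma_z, (1,)), coeff=0.5 * omega_a)
    + tensor_product((b_dag, (0,)), (sigma_m, (1,)), coeff=CPUCallback(half_g))
    + tensor_product((b, (0,)), (sigma_p, (1,)), coeff=CPUCallback(half_g))
    + tensor_product((sigma_m, (1,)), coeff=gamma)
)
liouvillian = Operator((N, 2), (liouvillian_term,))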

Elementary operators

Tensors that define elementary operators can be provided directly as cupy.ndarray (on the WorkStream's device) or numpy.ndarray.

The pythonic API also provides the DenseOperator and MultidiagonalOperator objects, which wrap the underlying data and are reusable across multiple expressions involving the elementary operator. A MultidiagonalOperator acts on a single mode and is specified in sparse DIA (diagonal) format, while a DenseOperator has dense underlying storage.

Batching is supported for elementary operators. The batch dimension is inferred from the last dimension of the array passed if the array has an odd number of dimensions.
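
For instance, a one-mode dense operator of dimension 4 with batch size 2 can be sketched as:

import numpy as np
from cuquantum.densitymat import DenseOperator

arr = np.random.random((4, 4, 2))  # odd number of dimensions, so the last one is the batch dimension
batched_op = DenseOperator(arr)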

Both DenseOperator and MultidiagonalOperator can be defined either as static operators, whose underlying data is required to remain unchanged during execution, or as dynamic operators, with callback functions updating the underlying storage. In both cases, direct modification of the underlying data by the user is not allowed and will lead to undefined behaviour. Callback functions are passed as callables wrapped in either CPUCallback or GPUCallback instances, where the difference between the two is whether the callback arguments are expected as CPU or GPU arrays and whether the function returns or modifies a CPU or GPU array. Callbacks can either return an array, with function signature (t: float, args: numpy.ndarray | cupy.ndarray) -> numpy.ndarray | cupy.ndarray, or modify the underlying storage in place, with function signature (t: float, args: numpy.ndarray | cupy.ndarray, arr: numpy.ndarray | cupy.ndarray) -> None. This behaviour is specified by the optional keyword argument is_inplace (default False) when wrapping a callable in a CPUCallback or GPUCallback.
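
As a sketch, the two callback flavours look as follows for a one-mode dense operator; we assume here that the in-place callback receives the storage with the same shape as the representative tensor:

import numpy as np
from cuquantum.densitymat import DenseOperator, CPUCallback

rep = np.eye(4)


def f_out(t, args):  # out-of-place: returns a new array
    return np.cos(args[0, 0] * t) * rep


def f_in(t, args, arr):  # in-place: overwrites the storage passed in
    arr[:] = np.cos(args[0, 0] * t) * rep


op_out = DenseOperator(rep, CPUCallback(f_out))  # is_inplace=False is the default
op_in = DenseOperator(rep, CPUCallback(f_in, is_inplace=True))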

DenseOperator / MultidiagonalOperator make a copy of the array provided when creating the instance. This behaviour can be overridden with the keyword argument copy=False, in which case the array is required to be a Fortran-contiguous cupy.ndarray. If the tensor data is originally provided as a numpy.ndarray, it will be moved to the GPU (the WorkStream's device) the first time the elementary operator is used as part of an Operator or OperatorAction.

If an elementary operator appears multiple times in the expression for the quantum many-body operator, using the DenseOperator and MultidiagonalOperator allows us to reuse the same instance and is thus preferred.

Certain unary and binary operations are supported for DenseOperator / MultidiagonalOperator for the user's convenience, allowing new instances to be defined as arithmetic combinations of existing instances. Note that this feature is only available for elementary operators specified without a callback or with an out-of-place callback returning on the CPU.
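
For instance, using operations that also appear in the examples below (dag(), @ and scalar multiplication):

import numpy as np
from cuquantum.densitymat import DenseOperator

l = DenseOperator(np.random.random((4, 4)))
l_dag_l = l.dag() @ l    # product of two static elementary operators
scaled = -0.5 * l_dag_l  # scalar multiplication yields a new instance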

Matrix operators

In some cases, many-body operators cannot be expressed as a tensor product of elementary operators over a subset of modes. In such situations, the user can specify them as a MatrixOperator, which is interpreted as an operator acting on all degrees of freedom in the Hilbert space. Currently, only dense underlying storage is supported through LocalDenseMatrixOperator, which can be specified similarly to DenseOperator with two restrictions. First, the provided array is required to be a Fortran-contiguous cupy.ndarray with a tensor shape corresponding to a concatenation of the Hilbert space dimensions (once for the ket modes, once for the bra modes), and optionally a batch dimension. Second, if the user provides a callback function for dynamically updating the tensor entries, only GPUCallbacks acting in-place are supported, in order to avoid unnecessary copies.

Composing symbolic expressions with operators

Individual elementary operators or matrix operators can be composed into products and sums thereof. The API provides several layers for such products and sums which are composed via aggregation.

Products of elementary operators are most conveniently composed via the tensor_product() free function. The arguments to tensor products are tuples containing the elementary operator, a sequence of modes it acts on, and optionally a sequence of booleans ("dualities") denoting whether the operator contracts with the ket or bra modes of the state (relevant for mixed states, defaulting to False, i.e. ket modes). Furthermore, a coefficient can be supplied via a named argument. A static coefficient, i.e. one that does not change as a function of time or callback arguments, can be a scalar, or a numpy.ndarray / cupy.ndarray in the case of batched coefficients. Dynamic coefficients are specified as instances of CPUCallback or GPUCallback and are expected to modify or return a numpy.ndarray or cupy.ndarray.

Products of matrix operators are expressed through the full_matrix_product() free function. The arguments to matrix products are tuples containing the matrix operator and two optional boolean arguments. The first optional boolean argument is the matrix operator's "duality", which indicates whether the matrix is left-multiplied onto the quantum state's ket modes (default, False) or right-multiplied onto the quantum state's bra modes (True). The second boolean argument triggers application of the matrix's complex conjugate if set to True.

Both tensor_product() and full_matrix_product() return an OperatorTerm object, which represents a sum of products of elementary tensor operators and products of matrix operators. Note that while a single product cannot mix elementary operators and matrix operators, an OperatorTerm can contain both types of products. OperatorTerm supports in-place addition (+=, as well as the OperatorTerm.append_elementary_product() and OperatorTerm.append_matrix_product() methods) and out-of-place addition (+) to compose the sum, as well as multiplication by a numbers.Number or a CPUCallback/GPUCallback. In-place addition is only supported before the OperatorTerm starts using a WorkStream (see below). Additionally, multiplication of OperatorTerms is supported, creating a new instance which contains the product of the sums of products of the factors, with the number of terms in the resulting sum equal to the product of the numbers of terms in the original OperatorTerms. Note that the computation time scales roughly linearly with the number of products in the OperatorTerm; we therefore advise against multiplying OperatorTerms when a manual simplification can be performed instead.
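
A small sketch of these composition rules, using a hypothetical one-mode lowering operator:

import numpy as np
from cuquantum.densitymat import tensor_product

sigma_m = np.array([[0.0, 1.0], [0.0, 0.0]])
term = tensor_product((sigma_m, (1,)), coeff=0.1)
term += tensor_product((sigma_m.conj().T, (1,)), coeff=0.1)  # in-place, before a WorkStream is attached
scaled = term * 2.0      # multiplication by a number
product = term * scaled  # sum of 2 products times sum of 2 products -> 4 products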

An Operator contains a sum of OperatorTerms, each associated with a coefficient (default 1) and a boolean duality (default False). Coefficients are specified in the same format as described above for a single product and can be batched or non-batched. If duality is True, the action on bra and ket modes specified for each product in the OperatorTerm is flipped. Operator.dual() returns a new instance with these boolean arguments flipped, which is useful to express e.g. commutators. Operators support out-of-place addition and subtraction with other Operator instances, as well as in-place (+=, Operator.append() method) and out-of-place (+) addition of OperatorTerms. In-place addition is only supported before the Operator starts using a WorkStream (see below). Furthermore, multiplication by a scalar, numpy.ndarray, cupy.ndarray or GPUCallback / CPUCallback is allowed, which is applied to the coefficient of each OperatorTerm in the Operator.

The action of an Operator on a quantum state state_in, accumulated into a quantum state state_out, can be computed via Operator.compute_action(). The expectation value of an operator evaluated on a quantum state can be computed via Operator.compute_expectation(). Before performing these calls, Operator.prepare_action() (resp. Operator.prepare_expectation()) should be invoked once; the preparation remains valid for the lifetime of the object.
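
Continuing the Liouvillian sketch from above, the prepare/compute pattern reads as follows (a sketch; ctx is the WorkStream created earlier, and the callback parameters alpha and beta are passed as a 2D array of shape (number of parameters, batch size)):

import cupy as cp
import numpy as np
from cuquantum.densitymat import DensePureState

psi_in = DensePureState(ctx, (N, 2), 1, "complex128")
psi_in.allocate_storage()
psi_out = psi_in.clone(cp.zeros_like(psi_in.storage))

liouvillian.prepare_action(ctx, psi_in)  # once; stays valid for the operator's lifetime
liouvillian.compute_action(0.0, np.zeros((2, 1)), psi_in, psi_out)  # accumulates into psi_out

liouvillian.prepare_expectation(ctx, psi_in)
expval = liouvillian.compute_expectation(0.0, np.zeros((2, 1)), psi_in)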

While the concept of a Liouvillian superoperator and its action on a quantum state is sufficient to specify the right-hand side of commonly used master equations, such as the Lindblad equation, in a more general setting one may need to specify a system of coupled ordinary differential equations (ODEs) in order to simultaneously evolve multiple (potentially batched) quantum states under their own Liouvillian operators, as in the hierarchical equations of motion (HEOM) method.

Such an ODE system can be expressed by OperatorAction. An OperatorAction contains a sequence of Operators, which can be applied to a sequence of quantum states via OperatorAction.compute(), where the result is accumulated into a single output state.

Note that issuing an Operator.prepare_action() / Operator.prepare_expectation() call on an Operator, or creating an OperatorAction, can have side effects on all objects that they depend on (directly and indirectly). In particular, the object and its dependencies will start using the WorkStream instance passed in, which is not reversible, as explained above. If the dependent objects are already using a different WorkStream instance, an error will be raised. Generally, we advise against using more than a single WorkStream instance per GPU.

Batching

In a batched simulation, only two batch sizes are allowed for operators and coefficients: the batch size of the quantum state, and 1. If a coefficient or operator has batch size 1, it is broadcast along the batch dimension. Furthermore, input and output quantum states are required to have the same batch size.
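
For example, a batch-1 elementary operator can be combined with a batched coefficient (a sketch for states with batch size 3):

import numpy as np
from cuquantum.densitymat import tensor_product

sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])  # batch size 1, broadcast along the batch dimension
coeffs = np.array([1.0, 2.0, 3.0])             # one coefficient per batched state (batch size 3)
batched_term = tensor_product((sigma_z, (1,)), coeff=coeffs)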

Usage examples

General usage example

The following example illustrates various ways to define elementary operators and compose them into OperatorTerm, Operator and OperatorAction instances. It is written for single-GPU execution, although enabling multi-GPU execution is straightforward, as shown in the next example.

# Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from cuquantum.densitymat import (
    tensor_product,
    DenseMixedState,
    DenseOperator,
    MultidiagonalOperator,
    WorkStream,
    Operator,
    OperatorAction,
    CPUCallback,
)

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])
print("========================")


# define the shape of the composite tensor product space
hilbert_space_dims = (4, 5, 2, 6, 3, 7)  # six quantum degrees of freedom

# define some elementary tensor operators
# a time-independent multidiagonal operator
n_diags = 3
offsets = [-2, 0, 2]
a = np.random.random((hilbert_space_dims[3], n_diags))  # one-body multidiagonal elementary tensor operator
A = MultidiagonalOperator(a, offsets)

b = np.random.random(  # two-body elementary tensor operator, arbitrary strides also supported
    (
        hilbert_space_dims[3],
        hilbert_space_dims[5],
    )
    * 2
)

# a time-independent dense tensor operator
# We can wrap the ndarray in a DenseOperator, which is useful if we want to use the same tensor multiple times.
B = DenseOperator(b)


# one-body elementary tensor callback operator
def c_op_callback(t: float, args: np.ndarray):
    """
    This callback receives callback arguments passed by the user to a compute call as a np.ndarray of shape (number of parameters, batch_size).
    In this example we don't use batched operators, so batch_size would be 1.
    """
    args = args[:, 0]
    omega = args[0]

    return np.sin(omega * np.pi * t) * np.arange(1, hilbert_space_dims[1] ** 2 + 1).reshape((hilbert_space_dims[1],) * 2)


c_representative_tensor = c_op_callback(
    0.0,
    np.array(
        [
            0.0,
        ]
    ).reshape(1, 1),
)
# the callback needs to be wrapped so we can keep track of whether it's a CPU or GPU callback, and whether it operates in-place or out-of-place (default).
# making an instance of DenseOperator is optional; we could also pass the tuple (c_representative_tensor, CPUCallback(c_op_callback)) directly below in place of `C`
C = DenseOperator(c_representative_tensor, CPUCallback(c_op_callback, is_inplace=False))

print("Defined elementary operators A, B, C.")


# define a scalar callback function (time-dependent coefficient)
# returning a scalar is accepted for a non-batched coefficient callback, as is a 1D ndarray of size 1
def coeff_callback(t, args):  # args is a 2D array of shape (num_params, batch_size)
    omega = args[0, 0]
    return np.cos(np.pi * omega * t)  # return the scalar parameterized coefficient at time t


# construct tensor products of elementary tensor operators
ab = tensor_product(
    (
        A,  # elementary tensor operator
        (3,),  # quantum degrees of freedom it acts on
    ),
    (
        B,  # elementary tensor operator
        (3, 5),  # quantum degrees of freedom it acts on
    ),
    coeff=1.0,  # constant (static) coefficient
)

bc = tensor_product(
    (
        B,  # elementary tensor operator
        (3, 5),  # quantum degrees of freedom it acts on
    ),
    (
        C,  # elementary tensor operator
        (1,),  # quantum degrees of freedom it acts on
    ),
    coeff=CPUCallback(
        coeff_callback
    ),  # time-dependent parameterized coefficient represented by a user-defined callback function
)

# construct different operator terms
term1 = ab + bc  # an OperatorTerm composed of a sum of two tensor operator products
term1 += bc  # `OperatorTerm` also supports in-place addition

term2 = tensor_product(  # an operator term composed of a single elementary tensor operator
    (
        C,  # elementary tensor operator
        (1,),  # quantum degrees of freedom it acts on
        (False,),  # operator action duality (side: left/right) for each quantum degree of freedom
    ),
    coeff=1.0,  # constant (static) coefficient
)

print("Created OperatorTerms term1 and term2.")

# construct the Hamiltonian operator from two operator terms
hamiltonian = Operator(
    hilbert_space_dims,  # shape of the composite tensor space
    (term1,),  # first operator term with a default coefficient 1.0
    (
        term2,
        CPUCallback(coeff_callback),
    ),  # second operator term modulated by a parameterized time-dependent coefficient (callback function)
)

print("Created Hamiltonian Operator from term1 and term2.")

# construct the Liouvillian for the von Neumann equation
liouvillian = hamiltonian - hamiltonian.dual()  # Hamiltonian action on the left minus Hamiltonian action on the right: [H, *]

print("Created Liouvillian Operator from Hamiltonian.")

# open a work stream over a CUDA stream
my_stream = cp.cuda.Stream()
ctx = WorkStream(stream=my_stream)

# construct the Liouvillian action for a single quantum state
liouvillian_action = OperatorAction(ctx, (liouvillian,))

print("Created Liouvillian OperatorAction from Liouvillian.")

# create a mixed quantum state (density matrix) with zero initialized data buffer
batch_size = 1
rho0 = DenseMixedState(ctx, hilbert_space_dims, batch_size, "complex128")
slice_shape, slice_offsets = rho0.local_info
rho0.attach_storage(cp.zeros(rho0.storage_size, dtype=rho0.dtype))
# set storage to a Haar random unnormalized state
# for MGMN execution, the data buffer may be larger than the locally stored slice of the state
# the view method returns a tensor shaped view on the local slice (the full state for single-GPU execution)
rho0.view()[:] = cp.random.normal(size=slice_shape) + (1j * cp.random.normal(size=slice_shape))
# for non-random initialization and MGMN execution, we would use slice_offsets to determine how to set the elements
norm = rho0.norm().get()[()]
rho0.inplace_scale(np.sqrt(1 / norm))
assert np.isclose(rho0.norm().get()[()], 1)

print("Created a Haar random normalized mixed quantum state (not physical due to lack of hermitianity).")

# two ways of creating another mixed quantum state of the same shape and init it to zero
rho1 = rho0.clone(cp.zeros_like(rho0.storage))
rho2 = DenseMixedState(ctx, hilbert_space_dims, batch_size, "complex128")
rho2.allocate_storage()

print("Created a zero-initialized output mixed quantum state.")


# prepare operator action on a mixed quantum state
liouvillian_action.prepare(ctx, (rho0,))

print("Prepared Liouvillian action through OperatorAction.prepare.")

# set parameters for the callback function to some value
callback_args = np.array([2.4]).reshape(1, 1)

# compute the operator action on a given quantum state
liouvillian_action.compute(
    0.5,  # time value
    callback_args,  # user-defined parameters
    (rho0,),  # input quantum state
    rho1,  # output quantum state
)

print("Computed Liouvillian action through OperatorAction.compute.")

# alternatively, prepare the operator action directly via the operator
liouvillian.prepare_action(ctx, rho0)

print("Prepared Liouvillian action through Operator.prepare_action.")

# compute the operator action directly via the operator
liouvillian.compute_action(
    0.0,  # time value
    callback_args,  # user-defined parameters
    rho0,  # input quantum state
    rho2,  # output quantum state
)
# Note: OperatorAction.compute and Operator.compute_action accumulate into the output state rather than overwriting it.

print("Computed Liouvillian action through Operator.compute_action.")

# prepare the operator expectation value computation
liouvillian.prepare_expectation(ctx, rho0)

print("Prepared expectation through Operator.prepare_expectation.")

# compute the operator expectation value
expval = liouvillian.compute_expectation(
    0.5,  # time value
    callback_args,  # user-defined parameters
    rho1,  # input quantum state
)

print("Computed expectation through Operator.compute_expectation.")

# we can compute the operator action again without another prepare call
liouvillian.compute_action(
    0.5,  # time value
    callback_args,  # user-defined parameters
    rho0,  # input quantum state
    rho1,  # output quantum state
)

print("Computed Liouvillian action through Operator.compute_action.")

# if we want to perform some other task in between, it may be useful to release the workspace
ctx.release_workspace()

# releasing the workspace has no user-facing side effects
# for example, we do not need to reissue prepare calls
liouvillian.compute_action(
    0.5,  # time value
    callback_args,  # user-defined parameters
    rho0,  # input quantum state
    rho2,  # output quantum state
)

print("Computed Liouvillian action through Operator.compute_action after releasing workspace.")

# synchronize the work stream
my_stream.synchronize()

print("Finished computation and exit.")

MGMN usage example

The following is a minimal working example for multi-GPU multi-node (MGMN) execution. It includes a utility function to set the quantum state to a specific product state in the computational basis, which illustrates how to interpret DensePureState.local_info and work with the one-dimensional quantum state storage buffer required for MGMN execution.

# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from mpi4py import MPI

from cuquantum.densitymat import (
    tensor_product,
    DensePureState,
    DenseOperator,
    MultidiagonalOperator,
    WorkStream,
    OperatorTerm,
    Operator,
    OperatorAction,
)


def ordered_print(msg):
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
    for i in range(size):
        if i == rank:
            print(f"Rank {i}: {msg}")
        MPI.COMM_WORLD.Barrier()


NUM_DEVICES = cp.cuda.runtime.getDeviceCount()
rank = MPI.COMM_WORLD.Get_rank()
dev = cp.cuda.Device(rank % NUM_DEVICES)
dev.use()
props = cp.cuda.runtime.getDeviceProperties(dev.id)
ordered_print("===== device info ======")
ordered_print(f"GPU-local-id: {dev.id}")
ordered_print(f"GPU-name: {props['name'].decode()}")
ordered_print(f"GPU-clock: {props['clockRate']}")
ordered_print(f"GPU-memoryClock: {props['memoryClockRate']}")
ordered_print(f"GPU-nSM: {props['multiProcessorCount']}")
ordered_print(f"GPU-major: {props['major']}")
ordered_print(f"GPU-minor: {props['minor']}")
ordered_print("========================")


# create a WorkStream on the current device
ctx = WorkStream(device_id=dev.id)
ordered_print("Created WorkStream (execution context) on current device.")
# setup MPI communicator
ctx.set_communicator(comm=MPI.COMM_WORLD.Dup(), provider="MPI")
ordered_print("Passed MPI communicator to execution context, enabling distributed computation.")

# define the shape of the composite tensor product space
hilbert_space_dims = (4, 4, 4, 4, 4, 4)  # six quantum degrees of freedom
batch_size = 2

# define some elementary tensor operators
identity = DenseOperator(np.eye(hilbert_space_dims[0], dtype="complex128"))
identity_sparse = MultidiagonalOperator(
    np.ones((hilbert_space_dims[0], 1), dtype="complex128"),
    offsets=[
        0,
    ],
)
ordered_print("Defined dense and sparse identity elementary operator.")

op_term = OperatorTerm(dtype="complex128")
for i in range(len(hilbert_space_dims) - 1):
    op_term += tensor_product(
        (
            identity,
            (i,),
        ),
        (
            identity_sparse,
            (i + 1,),
        ),
    )
# This operator will just be proportional to the identity
op = Operator(hilbert_space_dims, (op_term,))
ordered_print(
    "Created Operator corresponding to the action of products of nearest neighbor identity operators for a one-dimensional lattice."
)

op_action = OperatorAction(ctx, (op,))
ordered_print("Created OperatorAction from previously defined Operator.")


def set_ditstring(state, batch_index, ditstring: list):
    """
    Sets the state's coefficient for the `batch_index`-th quantum state to the product state in the computational basis encoded by `ditstring`.
    """
    slice_shape, slice_offsets = state.local_info
    ditstring = np.asarray(
        ditstring
        + [
            batch_index,
        ],
        dtype="int",
    )
    ditstring_is_local = True
    state_inds = []
    for slice_dim, slice_offset, state_dit in zip(slice_shape, slice_offsets, ditstring):
        ditstring_is_local = state_dit in range(slice_offset, slice_offset + slice_dim)
        if not ditstring_is_local:
            break
        else:
            state_inds.append(range(slice_offset, slice_offset + slice_dim).index(state_dit))
    if ditstring_is_local:
        strides = (1,) + tuple(np.cumprod(np.array(slice_shape)[:-1]))
        ind = np.sum(strides * np.array(state_inds))
        state.storage[ind] = 1.0


# product states to be set for each batch state
global_ditstrings = [[0, 1, 3, 2, 0, 2], [1, 0, 3, 2, 1, 0]]

# make initial state
state = DensePureState(ctx, hilbert_space_dims, batch_size, "complex128")
ordered_print("Created dense quantum state instance with distributed storage.")
required_buffer_size = state.storage_size
state.attach_storage(cp.zeros((required_buffer_size,), dtype="complex128"))
ordered_print("Attach local storage slice to dense quantum state instance.")

# set product states for each batch input state
for batch_ind in range(batch_size):
    set_ditstring(state, batch_ind, global_ditstrings[batch_ind])
ordered_print("Set distributed quantum state to desired product state.")

# more ways to make a State instance
state_out = state.clone(cp.zeros(required_buffer_size, dtype="complex128"))
another_state = DensePureState(ctx, hilbert_space_dims, batch_size, "complex128")
another_state.allocate_storage()
ordered_print("Created two zero initialized distributed quantum states to accumulate into.")

# prepare and compute action of Operator
op.prepare_action(ctx, state)
ordered_print("Prepared Operator for action on a distributed quantum state.")

op.compute_action(0.0, None, state, state_out)
ordered_print("Accumulated action of Operator on a distributed quantum state into a zero initialized output quantum state.")
# compute OperatorAction
op_action.compute(
    0.0,
    None,
    [
        state,
    ],
    another_state,
)
ordered_print("Accumulated the result of OperatorAction on a distributed quantum state into a zero initialized quantum state.")

assert cp.allclose(another_state.view(), state_out.view())
ordered_print("Verified that the result of OperatorAction and Operator on the same distributed input quantum state match.")

assert cp.allclose(state_out.view(), state.view() * (len(hilbert_space_dims) - 1))
ordered_print(
    "Verified that an Operator corresponding to sum of products of identity matrix scales the input quantum state by the number of terms in the sum."
)

Dense elementary operator example

The following example goes into more detail of specifying and using DenseOperator.

# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

"""
Dense operator example.
"""

import cupy as cp
import numpy as np
from cuquantum.densitymat import (
    tensor_product,
    full_matrix_product,
    DenseOperator,
    DenseMixedState,
    WorkStream,
    Operator,
    GPUCallback,
    CPUCallback,
)

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])

np.random.seed(42)
cp.random.seed(42)

# Parameters
dtype = "complex128"
hilbert_space_dims = (4, 3, 5, 7)
num_modes = len(hilbert_space_dims)
dissipation_strengths = np.random.random((num_modes,))
modes_string = "abcdefghijkl"[: len(hilbert_space_dims)]
batch_size = 1


def take_complex_conjugate_transpose(arr):
    return arr.transpose(tuple(range(num_modes, 2 * num_modes)) + tuple(range(0, num_modes)) + (2 * num_modes,)).conj()


#############
# Hamiltonian
#############

# zero-initialize h01_arr: its in-place callback below only updates the diagonal entries
h01_arr = cp.zeros(
    (hilbert_space_dims[0], hilbert_space_dims[1], hilbert_space_dims[0], hilbert_space_dims[1], batch_size), dtype=dtype
)
h12_arr = cp.empty(
    (hilbert_space_dims[1], hilbert_space_dims[2], hilbert_space_dims[1], hilbert_space_dims[2], batch_size), dtype=dtype
)
h23_arr = cp.empty(
    (hilbert_space_dims[2], hilbert_space_dims[3], hilbert_space_dims[2], hilbert_space_dims[3], batch_size), dtype=dtype
)
h23_arr[:] = cp.random.normal(size=h23_arr.shape)


def callback1(t, args, arr):  # in-place callback
    # the library passes `args` with the batch size as the last dimension, i.e. shape (number of parameters, batch size)
    assert args.shape == (1, batch_size)  # a single parameter per batch entry in this example

    for i in range(hilbert_space_dims[0]):
        for j in range(hilbert_space_dims[1]):
            for b in range(batch_size):
                omega = args[0][b]
                arr[i, j, i, j, b] = (i + j) * np.sin(omega * t)


def callback2(t, args):  # out-of-place callback
    # the library passes `args` with the batch size as the last dimension, i.e. shape (number of parameters, batch size)
    assert args.shape == (1, batch_size)  # a single parameter per batch entry in this example

    # zero-initialize: the loop below only sets the diagonal entries
    arr = cp.zeros(
        (hilbert_space_dims[1], hilbert_space_dims[2], hilbert_space_dims[1], hilbert_space_dims[2], batch_size), dtype=dtype
    )
    for i in range(hilbert_space_dims[1]):
        for j in range(hilbert_space_dims[2]):
            for b in range(batch_size):
                omega = args[0][b]
                arr[i, j, i, j, b] = (i + j) * np.cos(omega * t)
    return arr


h01_callback = GPUCallback(callback1, is_inplace=True)
h12_callback = GPUCallback(callback2, is_inplace=False)

h01_op = DenseOperator(h01_arr, h01_callback)
h12_op = DenseOperator(h12_arr, h12_callback)
h23_op = DenseOperator(h23_arr)

H = tensor_product((h01_op, [0, 1])) + tensor_product((h12_op, [1, 2])) + tensor_product((h23_op, [2, 3]))

print("Created an OperatorTerm for the Hamiltonian.")

#############
# Dissipators
#############

l0_arr = cp.empty((hilbert_space_dims[0], hilbert_space_dims[0], batch_size), dtype=dtype)
l1_arr = cp.empty((hilbert_space_dims[1], hilbert_space_dims[1], batch_size), dtype=dtype)
l2_arr = cp.empty((hilbert_space_dims[2], hilbert_space_dims[2], batch_size), dtype=dtype)
l3_arr = cp.empty((hilbert_space_dims[3], hilbert_space_dims[3], batch_size), dtype=dtype)

l0_arr[:] = cp.random.random(l0_arr.shape)
l1_arr[:] = cp.random.random(l1_arr.shape)
l2_arr[:] = cp.random.random(l2_arr.shape)


def callback3(t, args):
    # the library passes `args` with the batch size as the last dimension, i.e. shape (number of parameters, batch size)
    assert args.shape == (1, batch_size)  # a single parameter per batch entry in this example

    arr = np.empty((hilbert_space_dims[3], hilbert_space_dims[3], batch_size), dtype=dtype)
    for i in range(hilbert_space_dims[3]):
        for j in range(hilbert_space_dims[3]):
            for b in range(batch_size):
                omega = args[0][b]
                arr[i, j, b] = (i + j) * np.tan(omega * t)
    return arr


l3_callback = CPUCallback(callback3, is_inplace=False)

l0_op = DenseOperator(l0_arr)
l1_op = DenseOperator(l1_arr)
l2_op = DenseOperator(l2_arr)
l3_op = DenseOperator(l3_arr, l3_callback)

Ls = []
for i, l in enumerate([l0_op, l1_op, l2_op, l3_op]):
    l_dag_l = l.dag() @ l
    L = (
        tensor_product((l, [i], [False]), (l.dag(), [i], [True]))
        + tensor_product((-0.5 * l_dag_l, [i], [False]))
        + tensor_product((-0.5 * l_dag_l, [i], [True]))
    )
    Ls.append(dissipation_strengths[i] * L)

print("Created OperatorTerms for the Liouvillian.")

#############
# Liouvillian
#############

liouvillian = Operator(hilbert_space_dims, (H, -1j, False), (H, 1j, True), *[(L,) for L in Ls])

print("Created the Liouvillian operator.")

##########################
# Initial and final states
##########################

ctx = WorkStream()

rho = DenseMixedState(ctx, hilbert_space_dims, batch_size, dtype)
rho.attach_storage(cp.empty(rho.storage_size, dtype=dtype))
rho_arr = rho.view()
rho_arr[:] = cp.random.normal(size=rho_arr.shape)
if "complex" in dtype:
    rho_arr[:] += 1j * cp.random.normal(size=rho_arr.shape)
rho_arr += take_complex_conjugate_transpose(rho_arr)
rho_arr /= rho.trace()
print("Created a Haar random normalized mixed quantum state.")

rho_out = rho.clone(cp.zeros_like(rho.storage, order="F"))
print("Created zero initialized output state.")

#################
# Operator action
#################

liouvillian.prepare_action(ctx, rho)
liouvillian.compute_action(1.0, [3.5], rho, rho_out)

print("Finished computation and exit.")

Multidiagonal elementary operator example

The following example goes into more detail of specifying and using MultidiagonalOperator.

# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

"""
Multidiagonal operator example.
"""

import cupy as cp
import numpy as np
from cuquantum.densitymat import (
    tensor_product,
    full_matrix_product,
    MultidiagonalOperator,
    DenseMixedState,
    WorkStream,
    Operator,
    GPUCallback,
    CPUCallback,
)

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])

np.random.seed(42)
cp.random.seed(42)

# Parameters
dtype = "complex128"
hilbert_space_dims = (4, 3, 5, 7)
h_offsets = [
    [-1, 0, 1],
    [0],
    [-1, 0, 1],
    [-1],
]
l_offsets = [[-1, 0, 1], [1], [-2, -1, 0, 1, 2], [1, 2]]
num_modes = len(hilbert_space_dims)
dissipation_strengths = np.random.random((num_modes,))
modes_string = "abcdefghijkl"[: len(hilbert_space_dims)]
batch_size = 1


def take_complex_conjugate_transpose(arr):
    return arr.transpose(tuple(range(num_modes, 2 * num_modes)) + tuple(range(0, num_modes)) + (2 * num_modes,)).conj()


#############
# Hamiltonian
#############

h0_arr = cp.empty((hilbert_space_dims[0], len(h_offsets[0]), batch_size), dtype=dtype)
h1_arr = cp.empty((hilbert_space_dims[1], len(h_offsets[1]), batch_size), dtype=dtype)
h2_arr = cp.empty((hilbert_space_dims[2], len(h_offsets[2]), batch_size), dtype=dtype)
h3_arr = cp.empty((hilbert_space_dims[3], len(h_offsets[3]), batch_size), dtype=dtype)
# h0_arr and h1_arr are populated by their callbacks below; h2_arr and h3_arr back static operators, so fill them explicitly
h2_arr[:] = cp.random.random(h2_arr.shape)
h3_arr[:] = cp.random.random(h3_arr.shape)


def callback1(t, args, arr):  # in-place callback
    # the library passes `args` with the batch size as the last dimension, i.e. shape (number of parameters, batch size)
    assert args.shape == (1, batch_size)  # a single parameter per batch entry in this example

    for i in range(hilbert_space_dims[0]):
        for j in range(len(h_offsets[0])):
            for b in range(batch_size):
                omega = args[0][b]
                arr[i, j, b] = (i + j) * np.sin(omega * t)


def callback2(t, args):  # out-of-place callback
    # the library passes `args` with the batch size as the last dimension, i.e. shape (number of parameters, batch size)
    assert args.shape == (1, batch_size)  # a single parameter per batch entry in this example

    arr = cp.empty((hilbert_space_dims[1], len(h_offsets[1]), batch_size), dtype=dtype)
    for i in range(hilbert_space_dims[1]):
        for j in range(len(h_offsets[1])):
            for b in range(batch_size):
                omega = args[0][b]
                arr[i, j, b] = (i + j) * np.cos(omega * t)
    return arr


h0_callback = GPUCallback(callback1, is_inplace=True)
h1_callback = GPUCallback(callback2, is_inplace=False)

h0_op = MultidiagonalOperator(h0_arr, h_offsets[0], h0_callback)
h1_op = MultidiagonalOperator(h1_arr, h_offsets[1], h1_callback)
h2_op = MultidiagonalOperator(h2_arr, h_offsets[2])
h3_op = MultidiagonalOperator(h3_arr, h_offsets[3])

H = (
    tensor_product((h0_op, [0]), (h1_op, [1]))
    + tensor_product((h1_op, [1]), (h2_op, [2]))
    + tensor_product((h2_op, [2]), (h3_op, [3]))
)

print("Created an OperatorTerm for the Hamiltonian.")

#############
# Dissipators
#############

l0_arr = cp.empty((hilbert_space_dims[0], len(l_offsets[0]), batch_size), dtype=dtype)
l1_arr = cp.empty((hilbert_space_dims[1], len(l_offsets[1]), batch_size), dtype=dtype)
l2_arr = cp.empty((hilbert_space_dims[2], len(l_offsets[2]), batch_size), dtype=dtype)
l3_arr = cp.empty((hilbert_space_dims[3], len(l_offsets[3]), batch_size), dtype=dtype)

l0_arr[:] = cp.random.random(l0_arr.shape)
l1_arr[:] = cp.random.random(l1_arr.shape)
l2_arr[:] = cp.random.random(l2_arr.shape)


def callback3(t, args):
    # the library passes `args` with the batch size as the last dimension, i.e. shape (number of parameters, batch size)
    assert args.shape == (1, batch_size)  # a single parameter per batch entry in this example

    arr = np.empty((hilbert_space_dims[3], len(l_offsets[3]), batch_size), dtype=dtype)
    for i in range(hilbert_space_dims[3]):
        for j in range(len(l_offsets[3])):
            for b in range(batch_size):
                omega = args[0][b]
                arr[i, j, b] = (i + j) * np.tan(omega * t)
    return arr


l3_callback = CPUCallback(callback3, is_inplace=False)

l0_op = MultidiagonalOperator(l0_arr, l_offsets[0])
l1_op = MultidiagonalOperator(l1_arr, l_offsets[1])
l2_op = MultidiagonalOperator(l2_arr, l_offsets[2])
l3_op = MultidiagonalOperator(l3_arr, l_offsets[3], callback=l3_callback)

Ls = []
for i, l in enumerate([l0_op, l1_op, l2_op, l3_op]):
    l_dag_l = l.dag() @ l
    L = (
        tensor_product((l, [i], [False]), (l.dag(), [i], [True]))
        + tensor_product((-0.5 * l_dag_l, [i], [False]))
        + tensor_product((-0.5 * l_dag_l, [i], [True]))
    )
    Ls.append(dissipation_strengths[i] * L)

print("Created OperatorTerms for the Liouvillian.")

#############
# Liouvillian
#############

liouvillian = Operator(hilbert_space_dims, (H, -1j, False), (H, 1j, True), *[(L,) for L in Ls])

print("Created the Liouvillian operator.")

##########################
# Initial and final states
##########################

ctx = WorkStream()

rho = DenseMixedState(ctx, hilbert_space_dims, batch_size, dtype)
rho.attach_storage(cp.empty(rho.storage_size, dtype=dtype))
rho_arr = rho.view()
rho_arr[:] = cp.random.normal(size=rho_arr.shape)
if "complex" in dtype:
    rho_arr[:] += 1j * cp.random.normal(size=rho_arr.shape)
rho_arr += take_complex_conjugate_transpose(rho_arr)
rho_arr /= rho.trace()
print("Created a Haar random normalized mixed quantum state.")

rho_out = rho.clone(cp.zeros_like(rho.storage, order="F"))
print("Created zero initialized output state.")

#################
# Operator action
#################

liouvillian.prepare_action(ctx, rho)
liouvillian.compute_action(1.0, [3.5], rho, rho_out)

print("Finished computation and exit.")

Full matrix operator example

The following example goes into more detail of specifying and using LocalDenseMatrixOperator.

# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from cuquantum.densitymat import (
    full_matrix_product,
    LocalDenseMatrixOperator,
    DenseMixedState,
    WorkStream,
    Operator,
    GPUCallback,
)

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])

np.random.seed(42)
cp.random.seed(42)

# Parameters
dtype = "complex128"
hilbert_space_dims = (4, 3, 5, 7)
num_modes = len(hilbert_space_dims)
modes_string = "abcdefghijkl"[: len(hilbert_space_dims)]
dissipation_strength = 0.1
batch_size = 1


def take_complex_conjugate_transpose(arr):
    return arr.transpose(tuple(range(num_modes, 2 * num_modes)) + tuple(range(0, num_modes)) + (2 * num_modes,)).conj()


# create a WorkStream, holding workspace and stream (cp.cuda.get_current_stream() at the time of construction by default)
ctx = WorkStream()

#############
# Hamiltonian
#############

### create a time-dependent full matrix operator which updates only the diagonal entries of the matrix operator

## create the time-independent component which will also serve as GPU array containing the MatrixOperator data
# this needs to be an F-ordered GPU cp.ndarray, the shape of which reflects the local Hilbert space dimensions and the batch size
static_matrix_op_arr = cp.empty((*hilbert_space_dims, *hilbert_space_dims, batch_size), dtype=dtype, order="F")
static_matrix_op_arr[:] = cp.random.normal(size=static_matrix_op_arr.shape)
if "complex" in dtype:
    static_matrix_op_arr[:] += 1j * (cp.random.normal(size=static_matrix_op_arr.shape))
# make it hermitian
static_matrix_op_arr += take_complex_conjugate_transpose(static_matrix_op_arr)

## create the time-dependent component as a callback function
# first prepare some data that will be used inside the callback
# create an array holding the diagonal elements of the kronecker sum of number operators for each site
number_operator_diag = cp.zeros(hilbert_space_dims, dtype=dtype, order="F")
identity_diag = cp.ones(hilbert_space_dims, dtype=dtype)
for i, dim in enumerate(hilbert_space_dims):
    number_operator_diag += cp.einsum("ijkl," + "ijkl"[i] + "->ijkl", identity_diag, cp.arange(dim))

# flatten and add dummy batch dimension
number_operator_diag = number_operator_diag.reshape((np.prod(hilbert_space_dims), 1))
# indices for indexing into the diagonal elements of the matricized operator
diag_inds = cp.diag_indices(np.prod(hilbert_space_dims))


# since this is a time-dependent matrix operator, we need to define a callback function to update its elements
# `LocalDenseMatrixOperator` only supports callbacks which expect a cp.ndarray as the second (callback parameters) and third positional argument (the buffer containing the operator's elements) and modify the third positional argument in place
def matrix_op_callback(t: float, args: cp.ndarray, arr: cp.ndarray) -> None:
    """
    This function updates the GPU array `arr` for a time-dependent matrix operator in place.
    In particular, the diagonal is updated with an elementwise trigonometric function of the diagonal elements of the sum of each site's number operator.
    `args` is a cp.ndarray with shape (number of parameters, batch size)
    `arr` is a cp.ndarray with shape (*hilbert_space_dims, *hilbert_space_dims, batch size)
    """
    # when this callback function is invoked through the library it executes within a cupy.cuda.Stream context
    # since cp.ndarray operations are implicitly stream ordered, we don't need to use the stream explicitly here though

    stream = cp.cuda.get_current_stream()
    # the device on which `arr` is located will be the current cp.cuda.Device
    assert cp.cuda.Device() == arr.device

    # extract the coefficients, portable between batched and unbatched MatrixOperator
    omegas = args[0, :]

    # hilbert_space_dims is available from the enclosing scope; it could also be inferred from the shape of `arr`
    matrix_dim = np.prod(hilbert_space_dims)
    batch_size = arr.shape[-1]
    # the matrix operator is passed a tensor, in order to update the diagonal entries, we need to reshape into a matrix (no copy)
    matricized_arr_view = arr.reshape(matrix_dim, matrix_dim, batch_size)
    # use diagonal indices from enclosing scope instead of recomputing them here
    # if the batch dimension > 1, broadcasting over the batch index is implicit in the assignment below
    matricized_arr_view[diag_inds] = cp.sin(cp.pi * t * omegas * number_operator_diag)
    print(f"User-defined callback function updated matrix operator in place.")
    return


## create a LocalDenseMatrixOperator instance with the static and dynamic components
# the callback needs to be wrapped as `GPUCallback` and we need to specify that it is an inplace callback
hamiltonian_matrix_op = LocalDenseMatrixOperator(
    static_matrix_op_arr, callback=GPUCallback(matrix_op_callback, is_inplace=True)
)
print("Created time-dependent dense matrix Hamiltonian defining the closed quantum system.")

## compose an `OperatorTerm` describing the closed system evolution under the time-dependent Hamiltonian defined by hamiltonian_matrix_op
# `full_matrix_product` accepts a tuples of `LocalDenseMatrixOperators` and booleans as optional positional arguments
# the first boolean specifies whether the first element of the tuple (a `LocalDenseMatrixOperator`) acts on the bra (`True`) or ket (`False`, default) modes of the quantum state
# the second boolean specifies whether to apply the complex conjugate transpose of the `LocalDenseMatrixOperator` (defaults to `False`)
hamiltonian_term = full_matrix_product((hamiltonian_matrix_op, False, False), coeff=1)
print("Created `OperatorTerm` consisting of the time-dependent Hamiltonian.")
# express the commutator [H, rho] where rho is the quantum state and H the Hamiltonian:

#############
# Dissipators
#############

## create a non-hermitian dense matrix operator as a dissipative term
matrix_op_arr = cp.empty((*hilbert_space_dims, *hilbert_space_dims, batch_size), dtype=dtype, order="F")
matrix_op_arr[:] = cp.random.normal(size=matrix_op_arr.shape)
if "complex" in dtype:
    matrix_op_arr[:] += 1j * (cp.random.normal(size=matrix_op_arr.shape))
dissipative_matrix_op = LocalDenseMatrixOperator(matrix_op_arr)
print("Created a dense matrix operator defining a single Lindblad dissipator.")
## compose the dissipative part of the Lindblad superoperator
# `LocalDenseMatrixOperators` are applied to the quantum state in the order they are passed (left to right below)
dissipative_ket_terms = full_matrix_product(
    (dissipative_matrix_op, False, True), (dissipative_matrix_op, False, False), coeff=-dissipation_strength / 2
)
# dual method flips the order and whether the components of the product are applied to bra or ket modes of the quantum state
dissipative_bra_terms = dissipative_ket_terms.dual()
dissipative_two_sided_terms = full_matrix_product(
    (dissipative_matrix_op, False, False), (dissipative_matrix_op, True, True), coeff=dissipation_strength
)
dissipative_terms = dissipative_two_sided_terms + dissipative_bra_terms + dissipative_ket_terms
print("Created `OperatorTerm`s defining the dissipative terms in the Lindblad equation.")

#############
# Liouvillian
#############

closed_system_op = Operator(hilbert_space_dims, (hamiltonian_term, -1j, False), (hamiltonian_term, +1j, True))
print("Created closed system Liouvillian Super-Operator.")

## compose the full Lindbladian as an `Operator`
liouvillian = closed_system_op + Operator(hilbert_space_dims, (dissipative_terms,))
print("Created Lindbladian superoperator from closed system liouvillian and dissipative terms.")


##########################
# Initial and final states
##########################

# get random normalized initial mixed state
rho = DenseMixedState(ctx, hilbert_space_dims, batch_size, dtype)
rho.attach_storage(cp.empty(rho.storage_size, dtype=dtype))
rho_arr = rho.view()
rho_arr[:] = cp.random.normal(size=rho_arr.shape)
if "complex" in dtype:
    rho_arr[:] += 1j * (cp.random.normal(size=rho_arr.shape))
rho_arr += take_complex_conjugate_transpose(rho_arr)
rho_arr /= rho.trace()
print("Created a Haar random normalized mixed quantum state.")
# get empty output state
rho_out = rho.clone(cp.zeros_like(rho.storage, order="F"))

print("Created zero initialized output state.")

#################
# Operator action
#################

# callback parameters
t = 0.3
num_params = 1
callback_args = cp.empty((num_params, batch_size), dtype=dtype, order="F")
callback_args[:] = cp.random.rand(num_params, batch_size)

# compute action of lindbladian on quantum state
liouvillian.prepare_action(ctx, rho)
print("Prepared the action of a  time-dependent, dissipative superoperator based on full matrices on a mixed quantum state.")
liouvillian.compute_action(t, callback_args, rho, rho_out)
print("Computed the action of a time-dependent, dissipative superoperator based on full matrices on a mixed quantum state.")

print("Finished computation and exit.")

Lindbladian example

The following example shows how to use methods of OperatorTerm and Operator to define a function that assembles a dissipative Lindbladian superoperator from a Hamiltonian and a collection of jump operators.

from typing import Callable, Union, Sequence, Tuple
from numbers import Number

import numpy as np
import cupy as cp


from cuquantum.densitymat import Operator, OperatorTerm, GPUCallback, CPUCallback, DenseMixedState, WorkStream

CallbackType = Union[GPUCallback, CPUCallback]
NDArrayType = Union[np.ndarray, cp.ndarray]
CoefficientType = Union[Number, NDArrayType, CallbackType, Tuple[NDArrayType, CallbackType]]

###
# This example shows how to define a Lindbladian master equation using methods of OperatorTerm and Operator in the function `get_lindbladian` below.
# Other examples, e.g. dense_operator_example.py and multidiagonal_operator_example.py, show an alternative way to define the same type of master equation. This example is based on multidiagonal_operator_example.py for simplicity.
# The methods used in `get_lindbladian` are convenience functionality.
# Applying Liouvillian operators as shown here may be slower than applying the same Liouvillian defined directly from instances of `DenseOperator` or `MultidiagonalOperator` as in the other examples.
###


def get_lindbladian(
    hamiltonian: OperatorTerm | Operator,
    jump_operators: Sequence[OperatorTerm],
    dissipation_strengths: CoefficientType | tuple[CoefficientType, ...],
) -> OperatorTerm | Operator:
    """
    Assembles the operator corresponding to the Lindblad master equation given a Hamiltonian specifying the closed system evolution and a sequence of jump operators and their respective dissipation strength.

    Parameters:
    hamiltonian: Union[OperatorTerm, Operator]
        The Hamiltonian specifying the closed system evolution.
    jump_operators: Sequence[OperatorTerm]
        Jump operators specifying the coupling to the environment.
        Note that each element, i.e. jump operator, must act only on the ket-modes of the quantum state (which corresponds to the default duality argument in OperatorTerm creation.).
    dissipation_strengths: CoefficientType | tuple[CoefficientType]
        Coefficients of system environment coupling.
        If not specified as sequence, the same value is used for all jump operators.

    Returns:
    Union[OperatorTerm, Operator]
        The Lindbladian, returned as the same type as input `hamiltonian`.
    """
    if isinstance(hamiltonian, (OperatorTerm, Operator)):
        liouvillian = -1j * (hamiltonian + (hamiltonian.dual() * (-1)))
    else:
        raise TypeError(
            f"Expected input argument `hamiltonian` of type OperatorTerm or Operator, received type {type(hamiltonian)}."
        )
    dissipative_part = None
    for i, jump_op in enumerate(jump_operators):
        if not isinstance(jump_op, OperatorTerm):
            raise TypeError(
                f"Expected elements of input argument `jump_operators` of type OperatorTerm, received type {type(jump_op)}."
            )
        if not all(all(set(_dual) == {False} for _dual in dual) for dual in jump_op.duals):
            raise ValueError("Jump operators that act on bra modes are not supported in `get_lindbladian`.")
        squared_term = jump_op * jump_op.dag()
        two_sided_term = jump_op * jump_op.dual().dag()
        dissipation_strength = dissipation_strengths[i] if isinstance(dissipation_strengths, tuple) else dissipation_strengths
        if dissipative_part is None:
            dissipative_part = -1 / 2 * (squared_term * dissipation_strength)
        else:
            dissipative_part += -1 / 2 * (squared_term * dissipation_strength)
        dissipative_part += -1 / 2 * (squared_term.dual() * dissipation_strength)
        dissipative_part += two_sided_term * dissipation_strength
    if dissipative_part is not None:
        liouvillian += dissipative_part
    return liouvillian


from cuquantum.densitymat import tensor_product, MultidiagonalOperator

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])

# Parameters
dtype = "complex128"
hilbert_space_dims = (4, 3, 5, 7)
h_offsets = [
    [-1, 0, 1],
    [0],
    [-1, 0, 1],
    [-1],
]
l_offsets = [[-1, 0, 1], [1], [-2, -1, 0, 1, 2], [1, 2]]
num_modes = len(hilbert_space_dims)
modes_string = "abcdefghijkl"[: len(hilbert_space_dims)]
batch_size = 1


def take_complex_conjugate_transpose(arr):
    return arr.transpose(tuple(range(num_modes, 2 * num_modes)) + tuple(range(0, num_modes)) + (2 * num_modes,)).conj()


#############
# Hamiltonian
#############

h0_arr = cp.empty((hilbert_space_dims[0], len(h_offsets[0]), batch_size), dtype=dtype)
h0_arr[:] = cp.random.rand(*h0_arr.shape)
h1_arr = cp.empty((hilbert_space_dims[1], len(h_offsets[1]), batch_size), dtype=dtype)
h1_arr[:] = cp.random.rand(*h1_arr.shape)
h2_arr = cp.empty((hilbert_space_dims[2], len(h_offsets[2]), batch_size), dtype=dtype)
h2_arr[:] = cp.random.rand(*h2_arr.shape)
h3_arr = cp.empty((hilbert_space_dims[3], len(h_offsets[3]), batch_size), dtype=dtype)
h3_arr[:] = cp.random.rand(*h3_arr.shape)


def callback1(t, args, arr):  # inplace callback
    assert len(args) == 1, "Length of args should be 1."
    omega = args[0]
    for i in range(hilbert_space_dims[0]):
        for j in range(len(h_offsets[0])):
            for b in range(batch_size):
                arr[i, j, b] = (i + j) * np.sin(omega * t)


def callback2(t, args):  # out-of-place callback
    assert len(args) == 1, "Length of args should be 1."
    omega = args[0]
    arr = cp.empty((hilbert_space_dims[1], len(h_offsets[1]), batch_size), dtype=dtype)
    for i in range(hilbert_space_dims[1]):
        for j in range(len(h_offsets[1])):
            for b in range(batch_size):
                arr[i, j, b] = (i + j) * np.cos(omega * t)
    return arr


h0_callback = GPUCallback(callback1, is_inplace=True)
h1_callback = GPUCallback(callback2, is_inplace=False)

h0_op = MultidiagonalOperator(h0_arr, h_offsets[0], h0_callback)
h1_op = MultidiagonalOperator(h1_arr, h_offsets[1], h1_callback)
h2_op = MultidiagonalOperator(h2_arr, h_offsets[2])
h3_op = MultidiagonalOperator(h3_arr, h_offsets[3])

H = (
    tensor_product((h0_op, [0]), (h1_op, [1]))
    + tensor_product((h1_op, [1]), (h2_op, [2]))
    + tensor_product((h2_op, [2]), (h3_op, [3]))
)

print("Created an OperatorTerm for the Hamiltonian.")

#############
# Dissipators
#############

l0_arr = cp.empty((hilbert_space_dims[0], len(l_offsets[0]), batch_size), dtype=dtype)
l0_arr[:] = cp.random.rand(*l0_arr.shape)
l1_arr = cp.empty((hilbert_space_dims[1], len(l_offsets[1]), batch_size), dtype=dtype)
l1_arr[:] = cp.random.rand(*l1_arr.shape)
l2_arr = cp.empty((hilbert_space_dims[2], len(l_offsets[2]), batch_size), dtype=dtype)
l2_arr[:] = cp.random.rand(*l2_arr.shape)
l3_arr = cp.empty((hilbert_space_dims[3], len(l_offsets[3]), batch_size), dtype=dtype)
l3_arr[:] = cp.random.rand(*l3_arr.shape)

l0_op = MultidiagonalOperator(l0_arr, l_offsets[0])
l1_op = MultidiagonalOperator(l1_arr, l_offsets[1])
l2_op = MultidiagonalOperator(l2_arr, l_offsets[2])
l3_op = MultidiagonalOperator(l3_arr, l_offsets[3])

Ls = []
for i, l in enumerate([l0_op, l1_op, l2_op, l3_op]):
    Ls.append(
        tensor_product(
            (l, [i], [False]),
        )
    )

print("Created OperatorTerms for the Liouvillian.")

#############
# Liouvillian
#############
dissipation_strengths = tuple(np.random.rand(4))
liouvillian = Operator(hilbert_space_dims, (get_lindbladian(H, Ls, dissipation_strengths),))
print("Created the Liouvillian operator.")

################
# Density matrix
################
ctx = WorkStream()

rho = DenseMixedState(ctx, hilbert_space_dims, batch_size, dtype)
rho.attach_storage(cp.empty(rho.storage_size, dtype=dtype))
rho_arr = rho.view()
rho_arr[:] = cp.random.normal(size=rho_arr.shape)
if "complex" in dtype:
    rho_arr[:] += 1j * cp.random.normal(size=rho_arr.shape)
rho_arr[:] = rho_arr + take_complex_conjugate_transpose(rho_arr)  # Hermitize; the RHS evaluates into a new array, avoiding aliasing with the transposed view
rho_arr /= rho.trace()
print("Created a random trace-normalized mixed quantum state.")

rho_out = rho.clone(cp.zeros_like(rho.storage, order="F"))
print("Created zero initialized output state.")

#################
# Operator action
#################

liouvillian.prepare_action(ctx, rho)
liouvillian.compute_action(1.0, [3.5], rho, rho_out)

print("Finished computation and exit.")

API reference

Context API

WorkStream(device_id, stream, memory_limit, ...)

A data class containing the library handle, stream, workspace and configuration parameters.

Callback API

CPUCallback(callback[, is_inplace])

Wrapper for CPU callback functions.

GPUCallback(callback[, is_inplace])

Wrapper for GPU callback functions.
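
For illustration, a minimal sketch of wrapping a scalar coefficient callback (the function body and the `omega` parameter are assumptions of this sketch, not part of the API reference):

import numpy as np
from cuquantum.densitymat import CPUCallback

def sinusoidal_coeff(t, args):  # scalar coefficient callback: returns the coefficient value at time t
    omega = args[0]  # assumed single callback parameter
    return np.sin(omega * t)

coeff_callback = CPUCallback(sinusoidal_coeff, is_inplace=False)

The wrapped callback can then be passed wherever a coefficient is accepted, for example as the coeff argument of tensor_product.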

Operator API

tensor_product(*operands[, coeff, ...])

Return an OperatorTerm from a tensor product of elementary operators.

full_matrix_product(*operands[, coeff, ...])

Return an OperatorTerm from a product of matrix operators defined on the full Hilbert space.

DenseOperator(data[, callback, copy])

Dense elementary operator from data buffer and optional callback.
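
As a minimal sketch, a static single-mode DenseOperator can be created from a square GPU array (the annihilation-like matrix below is an arbitrary choice for illustration):

import cupy as cp
from cuquantum.densitymat import DenseOperator

dim = 4  # dimension of the mode this elementary operator acts on
a_arr = cp.diag(cp.sqrt(cp.arange(1, dim, dtype=cp.float64)), k=1).astype("complex128")
a_op = DenseOperator(a_arr)  # static (time-independent) dense elementary operator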

MultidiagonalOperator(data, offsets[, ...])

Multidiagonal single-mode operator from data buffer, offsets and optional callback.

LocalDenseMatrixOperator(data[, callback])

Dense matrix operator stored fully on a single device.

OperatorTerm([dtype])

Operator term consisting of tensor products of elementary operators.
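
OperatorTerm instances are usually obtained from tensor_product or full_matrix_product rather than constructed directly; a brief sketch of the arithmetic they support, mirroring its use in get_lindbladian above (the operator and mode index are placeholders):

import cupy as cp
from cuquantum.densitymat import tensor_product, DenseOperator

a_op = DenseOperator(cp.random.rand(4, 4).astype("complex128"))
term = tensor_product((a_op, [0]), coeff=0.5)  # OperatorTerm acting on mode 0
summed = term + term.dag()    # sum of OperatorTerms
product = term * term.dual()  # product; dual() moves a factor to the bra modes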

Operator(hilbert_space_dims, *terms)

Operator representing a collection of OperatorTerm objects.

OperatorAction(ctx, operators)

Operator action representing the action of a set of Operator objects on a set of input states, accumulated into a single output state.
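
A sketch of using OperatorAction in place of calling prepare_action/compute_action on a single Operator; the names ctx, liouvillian, rho, rho_out, t and callback_args refer to the examples above, and the prepare/compute method names are an assumption of this sketch:

from cuquantum.densitymat import OperatorAction

op_action = OperatorAction(ctx, (liouvillian,))       # aggregate one or more Operators
op_action.prepare(ctx, (rho,))                        # prepare the action on the input state(s)
op_action.compute(t, callback_args, (rho,), rho_out)  # accumulate the action into rho_out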

State API

DensePureState(ctx, hilbert_space_dims, ...)

Pure state in dense (state-vector) representation.
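
Construction mirrors the mixed-state pattern used in the examples above; a minimal single-GPU sketch (the trailing batch mode of the state view is an assumption consistent with those examples):

import cupy as cp
from cuquantum.densitymat import WorkStream, DensePureState

ctx = WorkStream()
psi = DensePureState(ctx, (4, 3), 1, "complex128")  # hilbert_space_dims=(4, 3), batch_size=1
psi.attach_storage(cp.zeros(psi.storage_size, dtype="complex128"))
psi.view()[0, 0, 0] = 1.0  # amplitude of the first basis state; the last index is the batch mode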

DenseMixedState(ctx, hilbert_space_dims, ...)

Mixed state in dense (density-matrix) representation.