High-level Density Matrix APIs

The pythonic API closely follows the structure of the Density Matrix C-API, providing separate objects for different concepts such as quantum states and operators.

WorkStream

At the base of the pythonic API is the WorkStream object which aggregates some configuration settings, cuDensityMat library handle, CUDA stream, CUDA device and workspace buffer into a single object. The workspace buffer is allocated automatically as required by the other API objects interacting with the WorkStream instance. An upper limit to the allocated workspace buffer size can be specified at the creation of the WorkStream instance. The workspace buffer is not released until an explicit call to the WorkStream.release_workspace() method is made. In general usage, only a single WorkStream per device should be created. Other API objects can only interact with one WorkStream for their lifetime. Currently only blocking operation is supported, i.e. API calls return only after completion of the task on the device.

from cuquantum.densitymat import WorkStream
ctx = WorkStream(device_id=0)

Multi-GPU multi-node execution is supported by passing a communicator to the WorkStream via its WorkStream.set_communicator() method. Note that this step has to be performed before the WorkStream is passed to any other API object. Please follow the instructions in cuDensityMat MPI setup to set environment variables for MGMN execution. Apart from setting the communicator, the only place where multi-GPU execution has any consequence is for specifying quantum states, the coefficients of which will be distributed over all GPUs.

from mpi4py import MPI
import cupy as cp
from cuquantum.densitymat import WorkStream

comm = MPI.COMM_WORLD.Dup()
rank = comm.Get_rank()
num_devices = cp.cuda.runtime.getDeviceCount()
dev_id = rank % num_devices
cp.cuda.Device(dev_id).use()
ctx = WorkStream(device_id=dev_id)
ctx.set_communicator(comm, provider="MPI")

Tensor Spaces and Quantum States

A tensor space is a composite linear space constructed as a tensor product of one or more vector spaces. Each constituting vector space represents a quantum degree of freedom of dimension equal to the dimension of the vector space. The shape of the composite tensor space is a tuple of the dimensions of the constituting vector spaces in their original order. Note that the shape of the tensor space stays the same irrespective of whether we deal with pure states (vectors living in the tensor space) or mixed states (matrices associated with the same tensor space).

A pure quantum state is represented by a tensor from the specified tensor space, thus the shape of a pure quantum state coincides with the tensor space shape. A mixed quantum state is represented by a tensor belonging to the tensor space constructed as a tensor product of the specified tensor space (primary space or ket space) and its dual tensor space (dual space or bra space). Thus, the shape of a mixed quantum state is a concatenation of two tensor space shapes, where the first half of the tensor modes corresponds to the ket space while the second half of the tensor modes corresponds to the bra space. For example, if tensor \(V[i0, i1, i2]\) represents a pure quantum state from a tensor space constructed from three vector spaces (quantum degrees of freedom), then tensor \(D[i0, i1, i2; j0, j1, j2]\) represents a mixed quantum state from the same tensor space, where modes \(i0, i1, i2\) are the ket modes while modes \(j0, j1, j2\) are the bra modes. The corresponding tensor space shape in both cases is \([range(i0), range(i1), range(i2)]\).

In the pythonic API, a pure (mixed) quantum state is represented by an instance of DensePureState (DenseMixedState). Below, we use the dense pure state DensePureState class as an example to provide detailed instructions. The workflow for dense mixed states is identical, and the two classes have identical methods and attributes.

In order to create a quantum state, we pass a WorkStream, the Hilbert space dimensions, batch size and datatype:

import cupy as cp
from cuquantum.densitymat import DensePureState

hilbert_space_dims = (2, 6, 4, 3)
batch_size = 1
dtype = "complex64"
psi_1 = DensePureState(ctx, hilbert_space_dims, batch_size, dtype)

The created quantum state has no associated physical storage yet.

In order to attach physical storage to a quantum state, there are two options:

psi_1.allocate_storage()

DensePureState.allocate_storage() will allocate a cupy.ndarray of the correct datatype with psi1.storage_size elements, initialized to zero.

size = psi_1.storage_size
psi_1.attach_storage(cp.zeros(size, dtype = psi_1.dtype))

DensePureState.attach_storage() allows the user to pass a cupy.ndarray with DensePureState.storage_size elements located on the same device as the one passed to WorkStream. For single-GPU calculations, the storage size corresponds to the number of coefficients of the state, and the array passed can be multidimensional. In that case, the array needs to be Fortran-ordered, contiguous, and matching the Hilbert space dimensions and batch size.

For a multi-GPU setup, each process only stores a subset of state coefficients, which we will refer to as a slice. The size of the slice may be equal or smaller than the storage buffer size. Only one-dimensional storage buffers are allowed in multi-GPU execution. The slice is always located in the first DensePureState.storage_size elements of the storage buffer.

DensePureState.storage returns the attached storage buffer. For states with a total of fewer than 32 modes, a multidimensional, writeable view on the slice is returned by DensePureState.view(). For larger systems, this method will fail due to cupy.ndarray not supporting more than 32 dimensions. DensePureState.local_info returns the shape of the local slice as well as the offsets along each dimension. This information is required to interpret the state’s coefficients or modify them to, for example, specify an initial state. Note that the batch size is always reported as the last mode, even if the batch size is 1.

For scripts purely intended for single-GPU execution, both DensePureState.storage_size and DensePureState.local_info directly follow from the Hilbert space dimensions and batch size, and the user may omit these steps. Note that by passing a one-dimensional array of size DensePureState.storage_size, user scripts will be directly portable between single-GPU and multi-GPU execution.

For states with the same Hilbert space dimensions and batch size, a convenient way of creating a new instance and attaching storage directly is provided via DensePureState.clone().

psi_2 = psi_1.clone(cp.zeros_like(psi_1.storage))

Besides direct manipulation of the storage buffer by the user, there are several methods which allow element-wise operations, DensePureState.inplace_scale() and DensePureState.inplace_accumulate(). Unary and binary reductions on DensePureState and DenseMixedState instances are provided via DensePureState.norm(), DensePureState.inner_product() and DensePureState.trace().

Quantum Many-Body Operators (Super-Operators)

A broad class of quantum Liouvillian superoperators, or quantum many-body operators in general, can be expressed as a sum of tensor products of elementary tensor operators with some, generally time-dependent coefficients, where an elementary tensor operator is a tensor operator acting on a specific number of quantum degrees of freedom, either from the right or from the left, or both. Each elementary tensor operator has the same number of the ket and bra modes, with the ket modes preceding the bra modes in its tensor. The bra modes of the elementary tensor operator act on the ket modes of the quantum state (action from the left) while the ket modes of the elementary tensor operator act on the bra modes of the quantum state (action from the right). In practice, many quantum Liouvillian operators are expressed in terms of tensor products of the one-body elementary tensor operators, like creation/annihilation operators, particle number operator, coordinate/momentum operators, individual spin operators, etc. For the sake of concreteness, let’s consider the following quantum Liouvillian operator as an example:

\[L = \omega_{c} b^{\dagger} b + \frac{1}{2} \omega_{a} \sigma_{z} + [ \frac{1}{2} g(t, \alpha, \beta) b^{\dagger} \sigma_{-} + \frac{1}{2} g(t, \alpha, \beta) b \sigma_{+} ] + \gamma \sigma_{-}\]

In the above Liouvillian operator, \(b\), \(b^{\dagger}\), \(\sigma_{z}\), \(\sigma_{-}\), \(\sigma_{+}\) are elementary tensor operators, and each of them acts only on a single quantum degree of freedom, represented by an order-2 tensor \(U[i0; j0]\), where \(i0\) is the ket mode and \(j0\) is the bra mode (an order-k elementary tensor operator is repsented by a tensor with \(k\) ket modes and \(k\) bra modes). \(\omega_{c}\), \(\omega_{a}\), and \(\gamma\) are constant scalars. \(g(t, \alpha, \beta)\) is a time-dependent scalar coefficient that depends on time and two real parameters, \(\alpha\) and \(\beta\), which parameterize the Liouvillian operator.

Elementary operators

Tensors that define elementary operators can be provided directly as cupy.ndarray (on the WorkStream‘s device) or numpy.ndarray. The pythonic API also provides DenseOperator and MultiDiagonalOperator objects. MultiDiagonalOperators are operators acting on a single mode, specified in sparse DIA format. Multidiagonal tensors (i.e. covering more than a single mode) and multidiagonal matrices with diagonals above the first superdiagonal or below the first subdiagonal are not supported currently. DenseOperators have underlying dense storage. Both objects allow selecting unary and binary operations and also support passing in a CPU callback function with signature (t: float, args: Sequence) -> numpy.ndarray that sets the tensor storage’s elements dynamically. Regardless of whether the tensors are passed as cupy.ndarray / numpy.ndarray or as DenseOperator / MultidiagonalOperator, a copy of the originally provided array will be made. If the tensor data is originally provided as a numpy.ndarray, it will be moved to GPU (the WorkStream‘s device) the first time the elementary operator is being used as part of an Operator or OperatorAction. If an elementary operator appears multiple times in the expression for the quantum many-body operator, using the DenseOperator and MultiDiagonalOperator allows us to reuse the same instance and is thus preferred.

Composing symbolic expressions with elementary operators

Individual elementary operators can be composed into products and sums thereof. The API provides several layers for such products and sums which are composed via aggregation.

Products of elementary operators are most conveniently composed by the tensor_product() free function. The arguments to tensor products are tuples containing the elementary operator, a sequence of modes it’s supported on, and optionally a sequence of booleans (“dualities”) which denotes whether the operator contracts with the ket or bra modes of the state (for mixed states, defaulting to False, i.e. ket modes). Furthermore, a coefficient can be supplied via a named argument.

A tensor product of operators is created via tensor_product(). This function returns an OperatorTerm object, which represents a sum of products of elementary tensor operators. OperatorTermsupports in-place (+= and OperatorTerm.append() method) and out-place addition (+) to compose the sum, as well as multiplication by a numbers.Number or Callable. Note that all Callables used (for coefficients or elementary operator callback functions) have a signature of (t, args) and must accept the same arguments. Inplace addition is only supported before the OperatorTerm starts using a WorkStream (see below). Additionally, multiplication of OperatorTerms is supported, creating a new instance which contains the product of sums of products of the factors, with the number of terms in the sum being equal to the product of the number of terms in the original OperatorTerms. We would like to note that the computation time scales roughly linearly in the number of products in the OperatorTerm, and thus advise against using multiplication of OperatorTerms if a simplification can be performed manually instead.

An Operator contains a sum of OperatorTerms, and each OperatorTerm is associated with a coefficient (defaults to 1) and a boolean duality (defaults to False). If duality is True, the action on bra and ket modes as specified for each product in the OperatorTerm is flipped. Operator.dual() returns a new instance with these boolean arguments flipped, which is useful to express e.g. commutators. Operators support out-of-place addition and substraction with other Operator instances, as well as in-place (+=, Operator.append() method) and out-of-place (+) addition of OperatorTerm. In-place addition is only supported before the Operatorstarts using a WorkStream (see below). Furthermore, scalar multiplication by a numbers.Number or Callable is allowed.

The action of an Operator on a quantum state state_in accumulated into a quantum state state_out can be computed via Operator.compute_action(). Expectation value on a quantum state can be evaluated via Operator.compute_expectation(). Before performing these calls, Operator.prepare_action() (resp. Operator.prepare_expectation()) should be invoked once, which remains valid for the lifetime of the object.

While the concept of a Liouvillian superoperator and its action on a quantum state is sufficient to specify the right-hand side of the broadly used master equations, like the Lindblad equation, in a more general setting one may need to be able to specify a system of coupled ordinary differential equations (ODE) to simultaneously evolve multiple (potentially batched) quantum states under their own Liouvillian operators, as in the hierarchical equations of motion (HEOM) method.

Such an ODE can be expressed by OperatorAction. An OperatorAction contains a sequence of Operator, which can be applied to a sequence of quantum states via OperatorAction.compute(), where the result is accumulated into a single output state.

Note that issuing a Operator.prepare_action() / Operator.prepare_expectation() call on Operator or creating an OperatorAction can have side effects on all objects that they depend on (directly and indirectly). In particular the object and its dependencies will be using the WorkStream instance passed in, which is not reversible as explained above. If the dependent objects are already using a different WorkStream instance, an error will be raised. Generally we advise against using more than a single WorkStream instance per GPU.

Usage examples

The following example illustrates various ways to define elementary operators and compose them into OperatorTerm, Operator, OperatorAction. It is written for single-GPU execution, although enabling multi-GPU execution is straightforward as shown in the next example.

# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from cuquantum.densitymat import (
    tensor_product,
    DenseMixedState,
    DenseOperator,
    WorkStream,
    Operator,
    OperatorAction,
)

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])
print("========================")


# define the shape of the composite tensor product space
hilbert_space_dims = (4, 5, 2, 6, 3, 7)  # six quantum degrees of freedom

# define some elementary tensor operators
A = np.random.random((hilbert_space_dims[2],) * 2)  # one-body elementary tensor operator


b = np.random.random(  # two-body elementary tensor operator, arbitrary strides also supported
    (
        hilbert_space_dims[3],
        hilbert_space_dims[5],
    )
    * 2
)


# We can wrap the NDArray in a TensorOperator, which is useful if we want to use the same tensor multiple times.
B = DenseOperator(b)


# one-body elementary tensor callback operator
def c_callback(t, args):
    return np.random.random((hilbert_space_dims[1],) * 2)


c_representative_tensor = c_callback(0.0, ())
# making an instance of TensorOperator is optional, we can also pass the tuple of (c_representative_tensor, c_callback) directly below in palce of `C`
C = DenseOperator(c_representative_tensor, c_callback)

print("Defined elementary operators A, B, C.")


# define a scalar callback function (time-dependent coefficient)
def my_callback(t, args):  # args is an arbitrary list of real user-defined parameters
    _omega = args[0]
    return np.sin(np.pi * _omega * t)  # return the scalar parameterized coefficient at time t


# construct tensor products of elementary tensor operators
ab = tensor_product(
    (
        A,  # elementary tensor operator
        (2,),  # quantum degrees of freedom it acts on
    ),
    (
        B,  # elementary tensor operator
        (3, 5),  # quantum degrees of freedom it acts on
    ),
    coeff=1.0,  # constant (static) coefficient
)

bc = tensor_product(
    (
        B,  # elementary tensor operator
        (3, 5),  # quantum degrees of freedom it acts on
    ),
    (
        C,  # elementary tensor operator
        (1,),  # quantum degrees of freedom it acts on
    ),
    coeff=my_callback,  # time-dependent parameterized coefficient represented by a user-defined callback function
)

# construct different operator terms
term1 = ab + bc  # an OperatorTerm composed of a sum of two tensor operator products
term1 += bc  # `OperatorTerm` also supports in-place addition

term2 = tensor_product(  # an operator term composed of a single elementary tensor operator
    (
        C,  # elementary tensor operator
        (1,),  # quantum degrees of freedom it acts on
        (False,),  # operator action duality (side: left/right) for each quantum degree of freedom
    ),
    coeff=1.0,  # constant (static) coefficient
)

print("Created OperatorTerms term1 and term2.")

# construct the Hamiltonian operator from two operator terms
hamiltonian = Operator(
    hilbert_space_dims,  # shape of the composite tensor space
    (term1,),  # first operator term with a default coefficient 1.0
    (
        term2,
        my_callback,
    ),  # second operator term modulated by a parameterized time-dependent coefficient (callback function)
)

print("Created Hamiltonian Operator from term1 and term2.")

# construct the Liouvillian for the von Neumann equation
liouvillian = (
    hamiltonian - hamiltonian.dual()
)  # Hamiltonian action on the left minus Hamiltonian action on the right: [H, *]

print("Created Liouvillian Operator from Hamiltonian.")

# open a work stream over a CUDA stream
my_stream = cp.cuda.Stream()
ctx = WorkStream(stream=my_stream)

# construct the Liouvillian action for a single quantum state
liouvillian_action = OperatorAction(ctx, (liouvillian,))

print("Created Liouvillian OperatorAction from Liouvillian.")

# create a mixed quantum state (density matrix) with zero initialized data buffer
batch_size = 1
rho0 = DenseMixedState(ctx, hilbert_space_dims, batch_size, "complex128")
slice_shape, slice_offsets = rho0.local_info
rho0.attach_storage(cp.zeros(rho0.storage_size, dtype=rho0.dtype))
# set storage to a Haar random unnormalized state
# for MGMN execution, the data buffer may be larger than the locally stored slice of the state
# the view method returns a tensor shaped view on the local slice (the full state for single-GPU execution)
rho0.view()[:] = cp.random.normal(size=slice_shape) + (
    1j * cp.random.normal(size=slice_shape)
)
# for non-random initialization and MGMN execution, we would use slice_offsets to determine how to set the elements
norm = rho0.norm().get()[()]
rho0.inplace_scale(np.sqrt(1 / norm))
assert np.isclose(rho0.norm().get()[()], 1)

print(
    "Created a Haar random normalized mixed quantum state (not physical due to lack of hermitianity)."
)

# two ways of creating another mixed quantum state of the same shape and init it to zero
rho1 = rho0.clone(cp.zeros_like(rho0.storage))
rho2 = DenseMixedState(ctx, hilbert_space_dims, batch_size, "complex128")
rho2.allocate_storage()

print("Created a zero-initialized output mixed quantum state.")


# prepare operator action on a mixed quantum state
liouvillian_action.prepare(ctx, (rho0,))

print("Prepared Liouvillian action through OperatorAction.prepare.")

# set a parameter for the callback function to some value
omega = 2.4

# compute the operator action on a given quantum state
liouvillian_action.compute(
    0.0,  # time value
    (omega,),  # user-defined parameters
    (rho0,),  # input quantum state
    rho1,  # output quantum state
)

print("Computed Liouvillian action through OperatorAction.compute.")

# alternatively, prepare the operator action directly via the operator
liouvillian.prepare_action(ctx, rho0)

print("Prepared Liouvillian action through Operator.prepare_action.")

# compute the operator action directly via the operator
liouvillian.compute_action(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho0,  # input quantum state
    rho2,  # output quantum state
)
# OperatorAction.compute and Operator.compute_action are accumulative, so rho0p should now be equivalent to twice the action of the Liouvillian.

print("Computed Liouvillian action through Operator.compute_action.")

# prepare the operator expectation value computation
liouvillian.prepare_expectation(ctx, rho0)

print("Prepared expectation through Operator.prepare_expectation.")

# compute the operator expectation value
expval = liouvillian.compute_expectation(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho1,  # input quantum state
)

print("Computed expectation through Operator.compute_expectation.")

# we can compute the operator action again without another prepare call
liouvillian.compute_action(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho0,  # input quantum state
    rho1,  # output quantum state
)

print("Computed Liouvillian action through Operator.compute_action.")

# assuming we want to some other task in between it may be useful to release the workspace
ctx.release_workspace()

# releasing the workspace has no user-facing side effects
# for example, we do not need to reissue prepare calls
liouvillian.compute_action(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho0,  # input quantum state
    rho2,  # output quantum state
)

print("Computed Liouvillian action through Operator.compute_action after releasing workspace.")

# synchronize the work stream
my_stream.synchronize()

print("Finished computation and exit.")

The following is a minimal working example for multi-GPU multi-node (MGMN) execution. It includes a utility function to set the quantum state to a specific product state in the computational basis, which illustrates how to interpret DensePureState.local_info and work with the one-dimensional quantum state storage buffer required for MGMN execution.

# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from mpi4py import MPI

from cuquantum.densitymat import (
    tensor_product,
    DensePureState,
    DenseOperator,
    WorkStream,
    OperatorTerm,
    Operator,
    OperatorAction,
)

NUM_DEVICES = cp.cuda.runtime.getDeviceCount()
rank = MPI.COMM_WORLD.Get_rank()
dev = cp.cuda.Device(rank % NUM_DEVICES)
dev.use()
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])
print("========================")


# create Workstream on the current device
ctx = WorkStream(device_id=dev.id)

# setup MPI communicator
ctx.set_communicator(comm=MPI.COMM_WORLD.Dup(), provider="MPI")

# define the shape of the composite tensor product space
hilbert_space_dims = (4, 4, 4, 4, 4)  # six quantum degrees of freedom
batch_size = 2

# define some elementary tensor operators
identity = DenseOperator(np.eye(hilbert_space_dims[0], dtype="complex128"))
op_term = OperatorTerm(dtype="complex128")
for i in range(len(hilbert_space_dims)):
    op_term += tensor_product(
        (
            identity,
            (1,),
        )
    )
# This operator will just be proportional to the identity
op = Operator(hilbert_space_dims, (op_term,))
op_action = OperatorAction(ctx, (op,))


def set_ditstring(state, batch_index, ditstring: list):
    """
    Set's the state's coefficient at for the `batch_index`'th quantum state to the product state in the computational basis encoded by `ditstring`.
    """
    slice_shape, slice_offsets = state.local_info
    ditstring = np.asarray(
        ditstring
        + [
            batch_index,
        ],
        dtype="int",
    )
    ditstring_is_local = True
    state_inds = []
    for slice_dim, slice_offset, state_dit in zip(
        slice_shape, slice_offsets, ditstring
    ):
        ditstring_is_local = state_dit in range(slice_offset, slice_offset + slice_dim)
        if not ditstring_is_local:
            break
        else:
            state_inds.append(
                range(slice_offset, slice_offset + slice_dim).index(state_dit)
            )
    if ditstring_is_local:
        strides = (1,) + tuple(np.cumprod(np.array(slice_shape)[:-1]))
        ind = np.sum(strides * np.array(state_inds))
        state.storage[ind] = 1.0


# product states to be set for each batch state
global_ditstrings = [[0, 1, 3, 2, 0], [1, 0, 3, 2, 1]]

# make initial state
state = DensePureState(ctx, hilbert_space_dims, batch_size, "complex128")
required_buffer_size = state.storage_size
state.attach_storage(cp.zeros((required_buffer_size,), dtype="complex128"))
# set product states for each batch input state
for batch_ind in range(batch_size):
    set_ditstring(state, batch_ind, global_ditstrings[batch_ind])
# more ways to make a State instance
state_out = state.clone(cp.zeros(required_buffer_size, dtype="complex128"))
another_state = DensePureState(ctx, hilbert_space_dims, batch_size, "complex128")
another_state.allocate_storage()
# prepare and compute Operator action
op.prepare_action(ctx, state)
op.compute_action(0.0, [], state, state_out)
state_out_slice = state_out.view()
# compute Operator action
op_action.compute(
    0.0,
    [],
    [
        state,
    ],
    another_state,
)
# OperatorAction and Operator for this specific example have the same effect
assert cp.allclose(another_state.view(), state_out_slice)