Pauli Propagation APIs#

The experimental pythonic API in cuquantum.pauliprop.experimental provides a high-level interface for accelerating Pauli propagation simulations using the cuPauliProp library.

Introduction#

Pauli propagation is a technique for simulating quantum circuits by tracking how Pauli strings (tensor products of Pauli operators) evolve under the action of quantum gates and channels. This approach can be particularly efficient for certain classes of circuits and observables, offering an alternative to full state vector simulation.

For a detailed introduction to Pauli propagation concepts, the Pauli basis representation, supported gates and truncation strategies, please refer to the cuPauliProp C API Overview.

The cuquantum.pauliprop.experimental module provides:

LibraryHandle: Manages library resources and GPU device context.
PauliExpansion: A container for representing quantum operators as a sum of Pauli strings with coefficients.
PauliExpansionView: A non-owning view on a contiguous range of Pauli terms within a PauliExpansion.
Quantum operator classes for applying operations to Pauli expansions:
- PauliRotationGate: Pauli rotation gates exp(-i * angle/2 * P).
- CliffordGate: Clifford gates (H, S, CX, CZ, SWAP, etc.).
- PauliNoiseChannel: Pauli noise channels for 1 or 2 qubits.
- AmplitudeDampingChannel: Generalised amplitude damping channels.
Truncation: Truncation strategies to manage the growth of terms during propagation.

API Design#

This section explains the key design concepts of the Pauli propagation Python API.

PauliExpansion, views, and out-of-place operations#

A PauliExpansion is the central data structure which represents a quantum operator, such as an observable or density matrix, as a weighted sum of Pauli strings. It wraps user-provided GPU memory buffers for storing the Pauli strings (as packed integers, described in a subsequent section) and their coefficients.

A PauliExpansionView is a lightweight, read-only handle to a contiguous subset of terms (each, a Pauli string and corresponding coefficient) within an expansion. Views have two key properties:

Contiguous subset: A view represents a range [start_index, end_index) of terms from its parent expansion. This enables advanced memory management patterns such as hybridising depth-first and breadth-first evaluation of Pauli paths.
Read-only and out-of-place: Views do not modify the underlying expansion. All computational methods on views read from the view and write results to a separate output expansion. The only exception is PauliExpansion.populate_from(), which copies data from another expansion (or view) into the target expansion’s buffers in-place.

To avoid repeated memory allocations, output expansions can be provided explicitly via the expansion_out parameter. This enables a “ping-pong” pattern where two expansions alternate as input and output:

for gate in reversed(circuit):
    in_expansion.apply_gate(gate, expansion_out=out_expansion, adjoint=True)
    in_expansion, out_expansion = out_expansion, in_expansion  # swap

If expansion_out is not provided, the API automatically allocates and returns a new expansion with sufficient capacity:

out_expansion = in_expansion.apply_gate(gate, adjoint=True)

Creating a view is straightforward:

# Create a view covering all terms
view = expansion.view()

# Create a view covering a subset of terms
partial_view = expansion.view(start_index=0, end_index=100)

For convenience, PauliExpansion provides wrapper methods that automatically create a full view and delegate to the corresponding PauliExpansionView method:

# These are equivalent:
result = expansion.apply_gate(gate)
result = expansion.view().apply_gate(gate)

Memory and buffer management#

Quantum operations (such as Pauli rotations) can increase the number of terms in an expansion (at most doubling it). Users must ensure output expansions have sufficient capacity. The API provides two approaches:

Automatic allocation: If expansion_out is not provided, a new expansion is allocated with sufficient capacity.
Pre-allocation: Users can pre-allocate buffers sized for worst-case growth.

The necessary sizes of the pre-allocated buffers can be obtained from the number of simulated qubits and the anticipated maximum number of terms:

from cuquantum.pauliprop.experimental import get_num_packed_integers

# Calculate buffer dimensions
ints_per_string = get_num_packed_integers(num_qubits)
max_terms = 100000

# Allocate buffers
xz_bits = cp.zeros((max_terms, 2 * ints_per_string), dtype=cp.uint64)
coeffs = cp.zeros((max_terms,), dtype=cp.complex128) # (using complex coefs)

Pauli string encoding#

Pauli strings must be encoded into packed integer arrays before being stored in a PauliExpansion. This section explains the encoding format and provides helper functions for converting between human-readable Pauli strings and their packed integer representation.

Encoding format

Each Pauli string is represented using a sequence of bits: “X bits” followed by corresponding “Z bits”. A single X bit and a single Z bit together encode a single Pauli operator (I, X, Y or Z) via the below mapping.

Pauli	X bit	Z bit
I (Identity)	0	0
X	1	0
Y	1	1
Z	0	1

The index of an X bit corresponds to the target qubit of the represented Pauli operator. The X bits for all qubits are packed together into 64-bit unsigned integers (uint64), and are followed by the similarly packed Z bits. For a system with num_qubits qubits:

The function get_num_packed_integers() returns the number of uint64 instances needed to store either the X bits (or equivalently, the Z bits) for one Pauli string.
The total storage for one Pauli string is 2 * get_num_packed_integers(num_qubits) integers: the first half stores X bits, the second half stores Z bits.

The encoding is little-endian: qubit 0 corresponds to bit 0 of the first uint64, qubit 63 to bit 63 of the first uint64, qubit 64 to bit 0 of the second uint64, and so on.

Example: 3-qubit system

For 3 qubits, get_num_packed_integers(3) returns 1, so each Pauli string is encoded via 2 uint64 values.

# Pauli string "XYZ" (X on qubit 0, Y on qubit 1, Z on qubit 2)
# Qubit 0: X -> X bit=1, Z bit=0
# Qubit 1: Y -> X bit=1, Z bit=1
# Qubit 2: Z -> X bit=0, Z bit=1

x_bits = [0b011]  # bits: qubit2=0, qubit1=1, qubit0=1
z_bits = [0b110]  # bits: qubit2=1, qubit1=1, qubit0=0

Example: 67-qubit system

For 67 qubits, get_num_packed_integers(67) returns 2, so each Pauli string is encoded via 4 uint64 values.

# Pauli string "Z0 X1 Y66" (Z on qubit 0, X on qubit 1, Y on qubit 66)
# Qubit 0:  Z -> X bit=0, Z bit=1
# Qubit 1:  X -> X bit=1, Z bit=0
# Qubit 66: Y -> X bit=1, Z bit=1

x_bits = [0b10, 0b100]
z_bits = [0b01, 0b100]

Helper functions

The following helper functions can be used to encode and decode Pauli strings.

import numpy as np
from cuquantum.pauliprop.experimental import get_num_packed_integers

def encode_pauli_string(pauli_string, num_qubits, xz_bits_array, term_index):
    """
    Encode a Pauli string into an XZ bits array.

    Args:
        pauli_string: String of Pauli operators (e.g., "IXYZ").
            The first character is qubit 0, second is qubit 1, etc.
        num_qubits: Number of qubits in the system.
        xz_bits_array: Output array of shape (num_terms, 2*xz_size) to write into.
        term_index: Index into xz_bits_array for this term.
    """
    assert len(pauli_string) <= num_qubits
    num_ints = (num_qubits + 63) // 64  # Number of uint64s needed

    for i, pauli in enumerate(pauli_string):
        int_idx = i // 64  # Which uint64
        bit_idx = i % 64   # Which bit within that uint64
        if pauli == 'X':
            xz_bits_array[term_index, int_idx] |= np.uint64(1) << np.uint64(bit_idx)
        elif pauli == 'Y':
            xz_bits_array[term_index, int_idx] |= np.uint64(1) << np.uint64(bit_idx)
            xz_bits_array[term_index, num_ints + int_idx] |= np.uint64(1) << np.uint64(bit_idx)
        elif pauli == 'Z':
            xz_bits_array[term_index, num_ints + int_idx] |= np.uint64(1) << np.uint64(bit_idx)


def decode_pauli_string(xz_bits_array, term_index, num_qubits):
    """
    Decode an XZ bits array entry into a Pauli string.

    Args:
        xz_bits_array: Input array of shape (num_terms, 2*xz_size) to read from.
        term_index: Index into xz_bits_array for this term.
        num_qubits: Number of qubits.

    Returns:
        String of Pauli operators (e.g., "IXYZ").
    """
    pauli_string = ""
    num_ints = (num_qubits + 63) // 64

    for i in range(num_qubits):
        int_idx = i // 64
        bit_idx = i % 64
        x_bit = (xz_bits_array[term_index, int_idx] >> bit_idx) & 1
        z_bit = (xz_bits_array[term_index, num_ints + int_idx] >> bit_idx) & 1
        if x_bit and z_bit:
            pauli_string += "Y"
        elif x_bit:
            pauli_string += "X"
        elif z_bit:
            pauli_string += "Z"
        else:
            pauli_string += "I"
    return pauli_string

Complete initialization example

Here is a complete example showing how to create a PauliExpansion with multiple Pauli strings:

import cupy as cp
from cuquantum.pauliprop.experimental import (
    LibraryHandle,
    PauliExpansion,
    get_num_packed_integers,
)

# System parameters
num_qubits = 4
pauli_terms = [
    ("ZIII", 1.0),      # Z on qubit 0
    ("IZII", 0.5),      # Z on qubit 1
    ("XXII", 0.25),     # XX on qubits 0,1
    ("YYYY", 0.1j),     # YYYY (complex coefficient)
]

# Calculate buffer sizes
xz_size = get_num_packed_integers(num_qubits)  # Returns 1 for num_qubits <= 64
num_terms = len(pauli_terms)
max_capacity = 1000  # Allow room for growth during gate application

# Allocate buffers on GPU
xz_bits = cp.zeros((max_capacity, 2 * xz_size), dtype=cp.uint64)
coeffs = cp.zeros((max_capacity,), dtype=cp.complex128) # complex for YYYY coef

# Encode each Pauli string into the buffer
xz_bits_host = cp.asnumpy(xz_bits)  # Work on CPU for encoding
for i, (pauli_str, coef) in enumerate(pauli_terms):
    encode_pauli_string(pauli_str, num_qubits, xz_bits_host, term_index=i)
    coeffs[i] = coef
xz_bits = cp.asarray(xz_bits_host)  # Transfer back to GPU

# Create the PauliExpansion
handle = LibraryHandle()
expansion = PauliExpansion(
    handle,
    num_qubits,
    num_terms=num_terms,
    xz_bits=xz_bits,
    coeffs=coeffs,
    sort_order="little_endian_bitwise",
    has_duplicates=False,
)

print(f"Created expansion with {expansion.num_terms} terms")

Stream semantics#

API functions that perform GPU operations accept a stream parameter. If not provided, the current stream of the array package (CuPy, PyTorch, etc.) is used. Operations are asynchronous by default when PauliExpansionOptions.blocking is False.

For blocking operations, set blocking=True in PauliExpansionOptions:

options = PauliExpansionOptions(blocking=True)
expansion = PauliExpansion(handle, num_qubits, num_terms, xz, coefs, options=options)

Sort order#

The terms within a Pauli expansion can be arbitrarily ordered, or otherwise sorted, with consequences for performance and term access. The order can be declared or requested using the sort_order parameter, as accepted by several API functions. Possible values of sort_order are:

None (or SortOrder.NONE): The expansion is unsorted, with terms in arbitrary order.
"internal" (or SortOrder.INTERNAL): The library may choose any implementation-defined sort order optimal for performance. This is recommended when sorting is desired for performance benefits, but a specific ordering is not required. The actual order is not guaranteed stable across library versions.

Note

"internal" cannot be used when creating a PauliExpansion directly, since the user cannot provide data in an implementation-defined order.
"little_endian_bitwise" (or SortOrder.LITTLE_ENDIAN_BITWISE): A well-defined canonical ordering based on the (X, Z) bit representation of Pauli strings. Use this when a reproducible, user-inspectable ordering is required.

When querying the sort order via PauliExpansion.sort_order, the returned value may be stronger than what was originally requested. For example, if None was requested during an operation, the result may report None, "internal", or "little_endian_bitwise".

Basic Workflow#

A typical workflow in the Heisenberg picture, whereby an observable is evolved through the adjoint circuit, involves:

Creating a LibraryHandle to manage library resources.
Constructing a PauliExpansion representing the initial observable.
Applying quantum operators to propagate the Pauli expansion through the adjoint circuit.
Computing traces or expectation values.

import cupy as cp
from cuquantum.pauliprop.experimental import (
    LibraryHandle,
    PauliExpansion,
    PauliExpansionOptions,
    PauliRotationGate,
    Truncation,
    get_num_packed_integers,
)

# Step 1: Create library handle
handle = LibraryHandle()

# Step 2: Prepare buffers for the initial observable
num_qubits = 4
ints_per_string = get_num_packed_integers(num_qubits)

xz_bits = cp.zeros((1000, 2 * ints_per_string), dtype=cp.uint64)
coeffs = cp.zeros((1000,), dtype=cp.float64)

# Encode Z_0 observable
xz_bits[0, ints_per_string] = 1  # Set Z bit for qubit 0
coeffs[0] = 1.0

# Step 3: Create a PauliExpansion
options = PauliExpansionOptions(blocking=True)
expansion = PauliExpansion(
    handle,
    num_qubits,
    num_terms=1,
    xz_bits=xz_bits,
    coeffs=coeffs,
    sort_order="little_endian_bitwise",
    options=options,
)

# Step 4: Apply gates (back-propagation through adjoint circuit)
gate = PauliRotationGate(angle=0.1, pauli_string="X", qubit_indices=(0,))
expansion = expansion.apply_gate(gate, adjoint=True)

# Step 5: Compute expectation value with zero state
result = expansion.trace_with_zero_state()
print(f"Expectation value: {result}")

Quantum Operators#

The API supports several types of quantum operators that can be applied to Pauli expansions.

Pauli rotation gates#

PauliRotationGate represents a rotation gate exp(-i * angle/2 * P) where P is a Pauli string. These gates can at most double the number of terms in an expansion.

from cuquantum.pauliprop.experimental import PauliRotationGate

# Single-qubit X rotation
rx = PauliRotationGate(angle=0.5, pauli_string="X", qubit_indices=(0,))

# Two-qubit ZZ rotation
rzz = PauliRotationGate(angle=-np.pi/2, pauli_string="ZZ", qubit_indices=(0, 1))

# Multi-qubit Pauli rotation
rp = PauliRotationGate(angle=0.1, pauli_string=["X", "Y", "Z"], qubit_indices=(0, 1, 2))

Clifford gates#

CliffordGate represents Clifford gates which map Pauli strings one-to-one and do not change the expansion size. Supported gates include: I, X, Y, Z, H, S, CX, CY, CZ, SWAP, iSWAP, SQRTX, SQRTY, SQRTZ.

from cuquantum.pauliprop.experimental import CliffordGate

# Hadamard gate on qubit 0
h = CliffordGate("H", qubit_indices=(0,))

# CNOT gate with control on qubit 0 and target on qubit 1
cx = CliffordGate("CX", qubit_indices=(0, 1))

# SWAP gate
swap = CliffordGate("SWAP", qubit_indices=(0, 1))

Pauli noise channels#

PauliNoiseChannel represents inhomogeneous Pauli noise channels upon 1 or 2 qubits. These channels affect only the coefficients of the Pauli expansion and do not change the number of terms.

from cuquantum.pauliprop.experimental import PauliNoiseChannel

# Single-qubit depolarizing channel
depol = PauliNoiseChannel(
    qubit_indices=(0,),
    noise_probabilities={"I": 0.99, "X": 0.01/3, "Y": 0.01/3, "Z": 0.01/3}
)

# Two-qubit correlated noise
two_qubit_noise = PauliNoiseChannel(
    qubit_indices=(0, 1),
    noise_probabilities={"II": 0.98, "XX": 0.01, "ZZ": 0.01}
)

Amplitude damping channels#

AmplitudeDampingChannel represents generalised amplitude damping channels. Like Pauli rotations, these can at most double the expansion size.

from cuquantum.pauliprop.experimental import AmplitudeDampingChannel

# Standard amplitude damping (dissipation to ground state)
damping = AmplitudeDampingChannel(
    damping_probability=0.1,
    excitation_probability=0.0,
    qubit_index=0
)

# Generalised amplitude damping
gad = AmplitudeDampingChannel(
    damping_probability=0.1,
    excitation_probability=0.05,
    qubit_index=0
)

Truncation#

During Pauli propagation, the number of terms in an expansion can grow exponentially. The Truncation dataclass allows specifying truncation strategies to manage this growth by discarding terms which satisfy some criterion. Multiple truncation criteria can be combined; terms are removed if they fail any criterion.

from cuquantum.pauliprop.experimental import Truncation

# Coefficient-based truncation: remove terms with |coef| < cutoff
truncation = Truncation(pauli_coeff_cutoff=1e-10)

# Weight-based truncation: remove terms with Pauli weight > cutoff
truncation = Truncation(pauli_weight_cutoff=10)

# Combined truncation strategies
truncation = Truncation(pauli_coeff_cutoff=1e-10, pauli_weight_cutoff=10)

Truncation can be applied during gate application or as a standalone operation:

# Apply truncation during gate application
expansion = expansion.apply_gate(gate, truncation=truncation)

# Apply truncation as standalone operation
expansion = expansion.truncate(truncation=truncation)

Computing Traces#

The final output of a Pauli propagation simulation is typically a scalar produced via a trace operation.

Trace with zero state#

The most common operation computes the expectation value with respect to the computational zero state:

# Computes <0|rho|0> where rho is the Pauli expansion
expectation = expansion.trace_with_zero_state()

Product trace#

For more general quantities, compute the trace of the product of two expansions:

# Computes Tr(A * B)
trace = expansion_a.product_trace(expansion_b)

# Computes Tr(A^† * B)
trace = expansion_a.product_trace(expansion_b, adjoint=True)

Rehearsal Mode#

The API supports an optional “rehearsal” mode that allows querying workspace requirements and the maximum number of terms before performing actual computations. Rehearsal is not required for typical usage—the API will automatically allocate resources as needed. However, rehearsal can be beneficial for advanced use cases such as:

Pre-allocating resources to avoid repeated memory allocations during long simulations
Using custom memory allocators to reduce memory pool fragmentation
Determining upfront if a simulation will fit in available GPU memory

Creating a rehearsal expansion#

Create a rehearsal-only expansion using PauliExpansion.empty():

from cuquantum.pauliprop.experimental import PauliExpansion, RehearsalInfo

# Create a rehearsal expansion (uses minimal internal buffers)
rehearsal = PauliExpansion.empty(
    handle,
    num_qubits=127,
    num_terms=250000,  # Expected maximum terms
    dtype="float64",
)

Rehearsing operations#

Rehearse operations to get resource requirements:

# Rehearse gate applications
max_info = None
for gate in reversed(circuit):
    info = rehearsal.apply_gate(gate, adjoint=True, rehearse=True)
    max_info = info if max_info is None else max_info | info

# Rehearse trace operation
trace_info = rehearsal.trace_with_zero_state(rehearse=True)
max_info = max_info | trace_info

print(f"Max terms required: {max_info.num_terms_required}")
print(f"Device workspace: {max_info.device_scratch_workspace_required} bytes")

The RehearsalInfo and GateApplicationRehearsalInfo objects support the | operator to combine requirements by taking the maximum of each field.

Converting to real expansion#

After rehearsal, create a real expansion using PauliExpansion.from_empty():

# Allocate buffers based on rehearsed requirements
capacity = max_info.num_terms_required
xz = cp.zeros((capacity, 2 * ints_per_string), dtype=cp.uint64)
coef = cp.zeros((capacity,), dtype=cp.float64)

# Initialize with observable data
xz[0] = encode_pauli_string(...)
coef[0] = 1.0

# Create real expansion from rehearsal
expansion = rehearsal.from_empty(xz, coef, num_terms=1)

External Memory Management#

For advanced use cases, you can provide a custom memory allocator to control how workspace memory is allocated. This is particularly useful when:

Running multiple simulations in sequence to avoid memory pool fragmentation
Pre-allocating a fixed workspace buffer
Integrating with external memory management systems

The allocator must implement the nvmath.memory.BaseCUDAMemoryManagerAsync protocol:

from cuquantum import MemoryPointer

class PreallocatedAllocator:
    def __init__(self, device_id: int, total_size: int):
        self.device_id = device_id
        with cp.cuda.Device(device_id):
            self._buffer = cp.empty(total_size, dtype=cp.uint8)
        self._ptr = self._buffer.data.ptr
        self.total_size = total_size

    def memalloc_async(self, size: int, stream) -> MemoryPointer:
        if size > self.total_size:
            raise MemoryError(f"Requested {size} bytes, only {self.total_size} available")
        return MemoryPointer(self._ptr, size, finalizer=None)

# Use the allocator
allocator = PreallocatedAllocator(device_id=0, total_size=workspace_bytes)
options = PauliExpansionOptions(allocator=allocator, memory_limit=workspace_bytes)
expansion = PauliExpansion(handle, ..., options=options)

Logging#

The API supports Python’s standard logging framework for debugging and performance monitoring. Pass a logger to LibraryHandle to enable logging:

import logging

# Configure logging
logging.basicConfig(level=logging.DEBUG, format="%(message)s")
logger = logging.getLogger("pauliprop")
logger.setLevel(logging.DEBUG)

# Create handle with logger
handle = LibraryHandle(logger=logger)

The logger will output information about:

Library handle and expansion creation
Gate application timing and term counts
Workspace allocation
Trace computation results

Usage Examples#

Kicked Ising model#

The following example demonstrates a complete Pauli propagation workflow using the kicked Ising model on a 127-qubit heavy-hex topology.

First, we set up imports and circuit topology data:

from typing import Sequence

import cupy as cp
import numpy as np

from cuquantum.pauliprop.experimental import (
    LibraryHandle,
    PauliExpansion,
    PauliExpansionOptions,
    PauliRotationGate,
    Truncation,
    get_num_packed_integers,
)

# Circuit constants (IBM heavy-hex kicked Ising)
NUM_CIRCUIT_QUBITS = 127
NUM_ROTATIONS_PER_LAYER = 48
PI = np.pi
ZZ_ROTATION_ANGLE = -PI / 2.0

ZZ_QUBITS_RED = np.array([
    [  2,   1],  [ 33,  39], [ 59,  60], [ 66,  67], [ 72,  81], [118, 119],
    [ 21,  20],  [ 26,  25], [ 13,  12], [ 31,  32], [ 70,  74], [122, 123],
    [ 96,  97],  [ 57,  56], [ 63,  64], [107, 108], [103, 104], [ 46,  45],
    [ 28,  35],  [  7,   6], [ 79,  78], [  5,   4], [109, 114], [ 62,  61],
    [ 58,  71],  [ 37,  52], [ 76,  77], [  0,  14], [ 36,  51], [106, 105],
    [ 73,  85],  [ 88,  87], [ 68,  55], [116, 115], [ 94,  95], [100, 110],
    [ 17,  30],  [ 92, 102], [ 50,  49], [ 83,  84], [ 48,  47], [ 98,  99],
    [  8,   9],  [121, 120], [ 23,  24], [ 44,  43], [ 22,  15], [ 53,  41]
], dtype=np.int32)

ZZ_QUBITS_BLUE = np.array([
    [ 53,  60], [123, 124], [ 21,  22], [ 11,  12], [ 67,  68], [  2,   3],
    [ 66,  65], [122, 121], [110, 118], [  6,   5], [ 94,  90], [ 28,  29],
    [ 14,  18], [ 63,  62], [111, 104], [100,  99], [ 45,  44], [  4,  15],
    [ 20,  19], [ 57,  58], [ 77,  71], [ 76,  75], [ 26,  27], [ 16,   8],
    [ 35,  47], [ 31,  30], [ 48,  49], [ 69,  70], [125, 126], [ 89,  74],
    [ 80,  79], [116, 117], [114, 113], [ 10,   9], [106,  93], [101, 102],
    [ 92,  83], [ 98,  91], [ 82,  81], [ 54,  64], [ 96, 109], [ 85,  84],
    [ 87,  86], [108, 112], [ 34,  24], [ 42,  43], [ 40,  41], [ 39,  38]
], dtype=np.int32)

ZZ_QUBITS_GREEN = np.array([
    [ 10,  11], [ 54,  45], [111, 122], [ 64,  65], [ 60,  61], [103, 102],
    [ 72,  62], [  4,   3], [ 33,  20], [ 58,  59], [ 26,  16], [ 28,  27],
    [  8,   7], [104, 105], [ 73,  66], [ 87,  93], [ 85,  86], [ 55,  49],
    [ 68,  69], [ 89,  88], [ 80,  81], [117, 118], [101, 100], [114, 115],
    [ 96,  95], [ 29,  30], [106, 107], [ 83,  82], [ 91,  79], [  0,   1],
    [ 56,  52], [ 90,  75], [126, 112], [ 36,  32], [ 46,  47], [ 77,  78],
    [ 97,  98], [ 17,  12], [119, 120], [ 22,  23], [ 24,  25], [ 43,  34],
    [ 42,  41], [ 40,  39], [ 37,  38], [125, 124], [ 50,  51], [ 18,  19]
], dtype=np.int32)


def get_pauli_string_as_packed_integers(paulis: Sequence[str], qubits: Sequence[int], num_qubits: int) -> np.ndarray:
    num_packed_ints = get_num_packed_integers(num_qubits)
    out = np.zeros(num_packed_ints * 2, dtype=np.uint64)
    x_ptr = out[:num_packed_ints]
    z_ptr = out[num_packed_ints:]
    for pauli, qubit in zip(paulis, qubits):
        int_ind = qubit // 64
        bit_ind = qubit % 64
        if pauli in ("X", "Y"):
            x_ptr[int_ind] |= 1 << bit_ind
        if pauli in ("Z", "Y"):
            z_ptr[int_ind] |= 1 << bit_ind
    return out


def get_x_rotation_layer(angle: float) -> list[PauliRotationGate]:
    return [PauliRotationGate(angle, ["X"], [i]) for i in range(NUM_CIRCUIT_QUBITS)]


def get_zz_rotation_layer(topology: np.ndarray) -> list[PauliRotationGate]:
    return [PauliRotationGate(ZZ_ROTATION_ANGLE, ["Z", "Z"], pair.tolist()) for pair in topology]


def get_ibm_heavy_hex_ising_circuit(x_rotation_angle: float, num_trotter_steps: int) -> list[PauliRotationGate]:
    circuit: list[PauliRotationGate] = []
    for _ in range(num_trotter_steps):
        circuit.extend(get_x_rotation_layer(x_rotation_angle))
        circuit.extend(get_zz_rotation_layer(ZZ_QUBITS_RED))
        circuit.extend(get_zz_rotation_layer(ZZ_QUBITS_BLUE))
        circuit.extend(get_zz_rotation_layer(ZZ_QUBITS_GREEN))
    return circuit

Then in the main function, we create the library handle and observable:

    handle = LibraryHandle()

    # Step 2: Create the initial observable (Z_62 with coefficient 1.0).
    num_packed = get_num_packed_integers(NUM_CIRCUIT_QUBITS)
    xz = cp.zeros((1, 2 * num_packed), dtype=cp.uint64)
    coefs = cp.zeros((1,), dtype=cp.float64)
    xz[0] = cp.asarray(get_pauli_string_as_packed_integers(["Z"], [62], NUM_CIRCUIT_QUBITS))
    coefs[0] = 1.0

    # Step 3: Create a PauliExpansion with the observable.
    options = PauliExpansionOptions(memory_limit="80%", blocking=True)
    expansion = PauliExpansion(
        handle,
        NUM_CIRCUIT_QUBITS,
        1,
        xz,
        coefs,
        options=options,
    )

Next, we define truncation strategy, build the circuit, and propagate:

    truncation = Truncation(pauli_coeff_cutoff=1e-4, pauli_weight_cutoff=8)
    num_gates_between_truncations = 10

    # Step 5: Build the quantum circuit (kicked Ising model).
    x_rotation_angle = PI / 4.0
    num_trotter_steps = 20
    circuit = get_ibm_heavy_hex_ising_circuit(x_rotation_angle, num_trotter_steps)

    print(f"Circuit: 127-qubit IBM heavy-hex Ising")
    print(f"  Trotter steps: {num_trotter_steps}")
    print(f"  Total gates:   {len(circuit)}")
    print(f"  Rx angle:      {x_rotation_angle} (pi/4)")
    print()

    start_event = cp.cuda.Event()
    end_event = cp.cuda.Event()
    start_event.record()
    max_num_terms = 0

    # Step 6: Back-propagate through the adjoint circuit.
    for gate_index in range(len(circuit) - 1, -1, -1):
        gate = circuit[gate_index]

        # Apply truncation periodically to control term growth.
        active_truncation = truncation if (gate_index % num_gates_between_truncations == 0) else None
        expansion = expansion.apply_gate(
            gate,
            truncation=active_truncation,
            adjoint=True,
            sort_order=None,
            keep_duplicates=False,
        )

        # Track maximum output term count after each gate.
        max_num_terms = max(max_num_terms, expansion.num_terms)

    # Step 7: Compute the expectation value <Z_62> with respect to |0...0>.
    expec = expansion.trace_with_zero_state()

    end_event.record()
    end_event.synchronize()
    duration = cp.cuda.get_elapsed_time(start_event, end_event) / 1000.0  # seconds

Finally, we print the results:

    print(f"Expectation value:       {expec}")
    print(f"Final number of terms:   {expansion.num_terms}")
    print(f"Maximum number of terms: {max_num_terms}")
    print(f"Runtime (s):             {duration}")
    print()


if __name__ == "__main__":
    main()

The full example is available at kicked_ising_example.py.

Kicked Ising model with rehearsal#

The following example extends the kicked Ising model to demonstrate the rehearsal feature. Rehearsal allows querying workspace requirements and maximum term counts before performing actual computations, which is useful for pre-allocating resources in memory-constrained environments.

The rehearsal phase estimates resource requirements:

    max_rehearsal_info = None
    for gate_index in range(len(circuit) - 1, -1, -1):
        gate = circuit[gate_index]
        active_truncation = truncation if (gate_index % num_gates_between_truncations == 0) else None
        rehearsal_info = expansion.apply_gate(
            gate,
            truncation=active_truncation,
            adjoint=True,
            keep_duplicates=False,
        )
        max_rehearsal_info = max_rehearsal_info | rehearsal_info if max_rehearsal_info is not None else rehearsal_info

    # Step 6: Rehearse the trace operation as well.
    trace_rehearsal_info = expansion.trace_with_zero_state(rehearse=True)
    max_rehearsal_info = max_rehearsal_info | trace_rehearsal_info

After rehearsal, we allocate buffers based on the estimated requirements:

    ints_per_string = get_num_packed_integers(NUM_CIRCUIT_QUBITS)
    capacity = max_rehearsal_info.num_terms_required
    xz_shape = (capacity, 2 * ints_per_string)
    coef_shape = (capacity,)

    in_xz = cp.zeros(xz_shape, dtype=cp.uint64, order="C")
    in_coef = cp.zeros(coef_shape, dtype=cp.float64)
    out_xz = cp.empty_like(in_xz)
    out_coef = cp.empty_like(in_coef)

    # Step 8: Encode the observable (Z_62) into the input buffer.
    in_xz[0] = cp.asarray(get_pauli_string_as_packed_integers(["Z"], [62], NUM_CIRCUIT_QUBITS))
    in_coef[0] = 1.0

    # Step 9: Create a preallocated workspace allocator based on rehearsed requirements.
    workspace_bytes = max_rehearsal_info.device_scratch_workspace_required
    workspace_allocator = (
        PreAllocatedWorkspaceAllocator(device_id=0, total_size=workspace_bytes)
        if workspace_bytes > 0 else None
    )
        
    options_sized = PauliExpansionOptions(
        allocator=workspace_allocator,
        memory_limit=workspace_bytes or "80%",
        blocking=True,
    )

    # Step 10: Create real expansions from buffers for ping-pong pattern.
    # Use sort_order="little_endian_bitwise" because a Pauli expansion with a single term is always sorted.
    in_expansion = expansion.from_empty(
        in_xz, in_coef,
        num_terms=1,
        sort_order="little_endian_bitwise",
        has_duplicates=False,
        options=options_sized,
    )

    out_expansion = PauliExpansion(
        handle,
        NUM_CIRCUIT_QUBITS,
        0,
        out_xz,
        out_coef,
        sort_order=None,
        has_duplicates=True,
        options=options_sized,
    )

The full example is available at kicked_ising_example_rehearsed.py.

Logging example#

The following example demonstrates how to enable and use logging with the Pauli propagation API.

Configure logging before importing pauliprop:

import logging
import cupy as cp

# Step 1: Configure logging BEFORE importing pauliprop.
# Use DEBUG for verbose output, INFO for milestone-only logging.
logging.basicConfig(
    level=logging.DEBUG,
    format='%(levelname)-8s %(message)s'
)

from cuquantum.pauliprop.experimental import (
    LibraryHandle,
    PauliExpansion,
    PauliRotationGate,
    get_num_packed_integers,
)

The main workflow with logging enabled:

    handle = LibraryHandle()

    # Step 3: Prepare data for a simple Pauli expansion.
    num_qubits = 4
    num_terms = 1
    ints_per_string = get_num_packed_integers(num_qubits)

    xz_bits = cp.zeros((num_terms, 2 * ints_per_string), dtype=cp.uint64)
    coeffs = cp.ones(num_terms, dtype=cp.complex128)

    # Step 4: Create a PauliExpansion (logs INFO on creation).
    expansion = PauliExpansion(handle, num_qubits, num_terms, xz_bits, coeffs)

    # Step 5: Apply a gate (logs INFO on start/completion, DEBUG for details).
    gate = PauliRotationGate(angle=0.5, pauli_string="X", qubit_indices=[0])
    result = expansion.apply_gate(gate, sort_order=None)

    # Step 6: Compute trace (logs INFO on start/completion).
    trace = result.trace_with_zero_state()
    print(f"\nTrace result: {trace}")

API Reference#

Handles#

LibraryHandle([device_id, logger])

Pauli Expansion#

`PauliExpansion`(library_handle, num_qubits, ...)	A Pauli operator expansion.
`PauliExpansionView`(pauli_expansion, ...[, ...])	A view on a contiguous range of Pauli terms inside a Pauli operator expansion.
`get_num_packed_integers`(int32_t num_qubits)	Returns the number of packed integers of cupaulipropPackedIntegerType_t needed to represent the X bits (or equivalently, the Z bits) of a single Pauli string.

Gates and Channels#

`CliffordGate`(name, qubit_indices)	A Clifford gate (I, X, Y, Z, H, S, CX, CY, CZ, SWAP, iSWAP, etc.).
`PauliRotationGate`(angle, pauli_string, ...)	A Pauli rotation gate exp(-i * angle/2 * P) where P is a Pauli string.
`PauliNoiseChannel`(qubit_indices, ...)	A Pauli noise channel acting on 1 or 2 qubits.
`AmplitudeDampingChannel`(damping_probability, ...)	An amplitude damping channel with damping and excitation probabilities.

Configuration#

`PauliExpansionOptions`([allocator, ...])	Options controlling Pauli expansion construction and execution.
`Truncation`([pauli_weight_cutoff, ...])	Truncation strategy for Pauli expansion operations.
`RehearsalInfo`(...)	Information about the required resources for a rehearsed operation.
`GateApplicationRehearsalInfo`(...)	Information about the required resources for a gate application rehearsal.

SortOrder(value)

See cupaulipropSortOrder_t.