Pauli Propagation APIs#
The experimental pythonic API in cuquantum. provides a high-level interface for
accelerating Pauli propagation simulations using the cuPauliProp library.
See also
For a general introduction to Pauli propagation, the Pauli basis representation, and the underlying algorithms, see the cuPauliProp C API Overview.
Note
All APIs in this module are experimental and may be subject to change in a future release. Kindly share your feedback with us on NVIDIA/cuQuantum GitHub Discussions!
Introduction#
Pauli propagation is a technique for simulating quantum circuits by tracking how Pauli strings (tensor products of Pauli operators) evolve under the action of quantum gates and channels. This approach can be particularly efficient for certain classes of circuits and observables, offering an alternative to full state vector simulation.
For a detailed introduction to Pauli propagation concepts, the Pauli basis representation, supported gates and truncation strategies, please refer to the cuPauliProp C API Overview.
The cuquantum. module provides:
LibraryHandle: Manages library resources and GPU device context.PauliExpansion: A container for representing quantum operators as a sum of Pauli strings with coefficients.PauliExpansionView: A non-owning view on a contiguous range of Pauli terms within aPauliExpansion.Quantum operator classes for applying operations to Pauli expansions:
PauliRotationGate: Pauli rotation gatesexp(-i * angle/2 * P).CliffordGate: Clifford gates (H, S, CX, CZ, SWAP, etc.).PauliNoiseChannel: Pauli noise channels for 1 or 2 qubits.AmplitudeDampingChannel: Generalised amplitude damping channels.
Truncation: Truncation strategies to manage the growth of terms during propagation.
API Design#
This section explains the key design concepts of the Pauli propagation Python API.
PauliExpansion, views, and out-of-place operations#
A PauliExpansion is the central data structure which represents a quantum operator, such as an observable or density matrix,
as a weighted sum of Pauli strings. It wraps user-provided GPU memory buffers for storing the Pauli strings (as packed integers, described
in a subsequent section) and their coefficients.
A PauliExpansionView is a lightweight, read-only handle to a contiguous subset of terms (each, a Pauli string and corresponding coefficient)
within an expansion. Views have two key properties:
Contiguous subset: A view represents a range
[start_index, end_index)of terms from its parent expansion. This enables advanced memory management patterns such as hybridising depth-first and breadth-first evaluation of Pauli paths.Read-only and out-of-place: Views do not modify the underlying expansion. All computational methods on views read from the view and write results to a separate output expansion. The only exception is
PauliExpansion.populate_from(), which copies data from another expansion (or view) into the target expansion’s buffers in-place.
To avoid repeated memory allocations, output expansions can be provided explicitly via the expansion_out
parameter. This enables a “ping-pong” pattern where two expansions alternate as input and output:
for gate in reversed(circuit):
in_expansion.apply_gate(gate, expansion_out=out_expansion, adjoint=True)
in_expansion, out_expansion = out_expansion, in_expansion # swap
If expansion_out is not provided, the API automatically allocates and returns a new expansion with sufficient capacity:
out_expansion = in_expansion.apply_gate(gate, adjoint=True)
Creating a view is straightforward:
# Create a view covering all terms
view = expansion.view()
# Create a view covering a subset of terms
partial_view = expansion.view(start_index=0, end_index=100)
For convenience, PauliExpansion provides wrapper methods that automatically create a full view
and delegate to the corresponding PauliExpansionView method:
# These are equivalent:
result = expansion.apply_gate(gate)
result = expansion.view().apply_gate(gate)
Memory and buffer management#
Quantum operations (such as Pauli rotations) can increase the number of terms in an expansion (at most doubling it). Users must ensure output expansions have sufficient capacity. The API provides two approaches:
Automatic allocation: If
expansion_outis not provided, a new expansion is allocated with sufficient capacity.Pre-allocation: Users can pre-allocate buffers sized for worst-case growth.
The necessary sizes of the pre-allocated buffers can be obtained from the number of simulated qubits and the anticipated maximum number of terms:
from cuquantum.pauliprop.experimental import get_num_packed_integers
# Calculate buffer dimensions
ints_per_string = get_num_packed_integers(num_qubits)
max_terms = 100000
# Allocate buffers
xz_bits = cp.zeros((max_terms, 2 * ints_per_string), dtype=cp.uint64)
coeffs = cp.zeros((max_terms,), dtype=cp.complex128) # (using complex coefs)
Pauli string encoding#
Pauli strings must be encoded into packed integer arrays before being stored in a PauliExpansion.
This section explains the encoding format and provides helper functions for converting between human-readable
Pauli strings and their packed integer representation.
Encoding format
Each Pauli string is represented using a sequence of bits: “X bits” followed by corresponding “Z bits”. A single X bit and a single Z bit together encode a single Pauli operator (I, X, Y or Z) via the below mapping.
Pauli |
X bit |
Z bit |
|---|---|---|
I (Identity) |
0 |
0 |
X |
1 |
0 |
Y |
1 |
1 |
Z |
0 |
1 |
The index of an X bit corresponds to the target qubit of the represented Pauli operator. The X bits for all qubits are
packed together into 64-bit unsigned integers (uint64), and are followed by the similarly packed Z bits.
For a system with num_qubits qubits:
The function
get_num_packed_integers()returns the number ofuint64instances needed to store either the X bits (or equivalently, the Z bits) for one Pauli string.The total storage for one Pauli string is
2 * get_num_packed_integers(num_qubits)integers: the first half stores X bits, the second half stores Z bits.
The encoding is little-endian: qubit 0 corresponds to bit 0 of the first uint64, qubit 63
to bit 63 of the first uint64, qubit 64 to bit 0 of the second uint64, and so on.
Example: 3-qubit system
For 3 qubits, get_num_packed_integers(3) returns 1, so each Pauli string is encoded via 2 uint64 values.
# Pauli string "XYZ" (X on qubit 0, Y on qubit 1, Z on qubit 2)
# Qubit 0: X -> X bit=1, Z bit=0
# Qubit 1: Y -> X bit=1, Z bit=1
# Qubit 2: Z -> X bit=0, Z bit=1
x_bits = [0b011] # bits: qubit2=0, qubit1=1, qubit0=1
z_bits = [0b110] # bits: qubit2=1, qubit1=1, qubit0=0
Example: 67-qubit system
For 67 qubits, get_num_packed_integers(67) returns 2, so each Pauli string is encoded via 4 uint64 values.
# Pauli string "Z0 X1 Y66" (Z on qubit 0, X on qubit 1, Y on qubit 66)
# Qubit 0: Z -> X bit=0, Z bit=1
# Qubit 1: X -> X bit=1, Z bit=0
# Qubit 66: Y -> X bit=1, Z bit=1
x_bits = [0b10, 0b100]
z_bits = [0b01, 0b100]
Helper functions
The following helper functions can be used to encode and decode Pauli strings.
import numpy as np
from cuquantum.pauliprop.experimental import get_num_packed_integers
def encode_pauli_string(pauli_string, num_qubits, xz_bits_array, term_index):
"""
Encode a Pauli string into an XZ bits array.
Args:
pauli_string: String of Pauli operators (e.g., "IXYZ").
The first character is qubit 0, second is qubit 1, etc.
num_qubits: Number of qubits in the system.
xz_bits_array: Output array of shape (num_terms, 2*xz_size) to write into.
term_index: Index into xz_bits_array for this term.
"""
assert len(pauli_string) <= num_qubits
num_ints = (num_qubits + 63) // 64 # Number of uint64s needed
for i, pauli in enumerate(pauli_string):
int_idx = i // 64 # Which uint64
bit_idx = i % 64 # Which bit within that uint64
if pauli == 'X':
xz_bits_array[term_index, int_idx] |= np.uint64(1) << np.uint64(bit_idx)
elif pauli == 'Y':
xz_bits_array[term_index, int_idx] |= np.uint64(1) << np.uint64(bit_idx)
xz_bits_array[term_index, num_ints + int_idx] |= np.uint64(1) << np.uint64(bit_idx)
elif pauli == 'Z':
xz_bits_array[term_index, num_ints + int_idx] |= np.uint64(1) << np.uint64(bit_idx)
def decode_pauli_string(xz_bits_array, term_index, num_qubits):
"""
Decode an XZ bits array entry into a Pauli string.
Args:
xz_bits_array: Input array of shape (num_terms, 2*xz_size) to read from.
term_index: Index into xz_bits_array for this term.
num_qubits: Number of qubits.
Returns:
String of Pauli operators (e.g., "IXYZ").
"""
pauli_string = ""
num_ints = (num_qubits + 63) // 64
for i in range(num_qubits):
int_idx = i // 64
bit_idx = i % 64
x_bit = (xz_bits_array[term_index, int_idx] >> bit_idx) & 1
z_bit = (xz_bits_array[term_index, num_ints + int_idx] >> bit_idx) & 1
if x_bit and z_bit:
pauli_string += "Y"
elif x_bit:
pauli_string += "X"
elif z_bit:
pauli_string += "Z"
else:
pauli_string += "I"
return pauli_string
Complete initialization example
Here is a complete example showing how to create a PauliExpansion with multiple Pauli strings:
import cupy as cp
from cuquantum.pauliprop.experimental import (
LibraryHandle,
PauliExpansion,
get_num_packed_integers,
)
# System parameters
num_qubits = 4
pauli_terms = [
("ZIII", 1.0), # Z on qubit 0
("IZII", 0.5), # Z on qubit 1
("XXII", 0.25), # XX on qubits 0,1
("YYYY", 0.1j), # YYYY (complex coefficient)
]
# Calculate buffer sizes
xz_size = get_num_packed_integers(num_qubits) # Returns 1 for num_qubits <= 64
num_terms = len(pauli_terms)
max_capacity = 1000 # Allow room for growth during gate application
# Allocate buffers on GPU
xz_bits = cp.zeros((max_capacity, 2 * xz_size), dtype=cp.uint64)
coeffs = cp.zeros((max_capacity,), dtype=cp.complex128) # complex for YYYY coef
# Encode each Pauli string into the buffer
xz_bits_host = cp.asnumpy(xz_bits) # Work on CPU for encoding
for i, (pauli_str, coef) in enumerate(pauli_terms):
encode_pauli_string(pauli_str, num_qubits, xz_bits_host, term_index=i)
coeffs[i] = coef
xz_bits = cp.asarray(xz_bits_host) # Transfer back to GPU
# Create the PauliExpansion
handle = LibraryHandle()
expansion = PauliExpansion(
handle,
num_qubits,
num_terms=num_terms,
xz_bits=xz_bits,
coeffs=coeffs,
sort_order="little_endian_bitwise",
has_duplicates=False,
)
print(f"Created expansion with {expansion.num_terms} terms")
Stream semantics#
API functions that perform GPU operations accept a stream parameter. If not provided, the current
stream of the array package (CuPy, PyTorch, etc.) is used. Operations are asynchronous by default
when PauliExpansionOptions.blocking is False.
For blocking operations, set blocking=True in PauliExpansionOptions:
options = PauliExpansionOptions(blocking=True)
expansion = PauliExpansion(handle, num_qubits, num_terms, xz, coefs, options=options)
Sort order#
The terms within a Pauli expansion can be arbitrarily ordered, or otherwise sorted, with consequences for performance
and term access. The order can be declared or requested using the sort_order parameter, as accepted by several API functions.
Possible values of sort_order are:
None(orSortOrder.NONE): The expansion is unsorted, with terms in arbitrary order."internal"(orSortOrder.INTERNAL): The library may choose any implementation-defined sort order optimal for performance. This is recommended when sorting is desired for performance benefits, but a specific ordering is not required. The actual order is not guaranteed stable across library versions.Note
"internal"cannot be used when creating aPauliExpansiondirectly, since the user cannot provide data in an implementation-defined order."little_endian_bitwise"(orSortOrder.LITTLE_ENDIAN_BITWISE): A well-defined canonical ordering based on the (X, Z) bit representation of Pauli strings. Use this when a reproducible, user-inspectable ordering is required.
When querying the sort order via PauliExpansion.sort_order, the returned value may be stronger
than what was originally requested. For example, if None was requested during an operation, the result
may report None, "internal", or "little_endian_bitwise".
Basic Workflow#
A typical workflow in the Heisenberg picture, whereby an observable is evolved through the adjoint circuit, involves:
Creating a
LibraryHandleto manage library resources.Constructing a
PauliExpansionrepresenting the initial observable.Applying quantum operators to propagate the Pauli expansion through the adjoint circuit.
Computing traces or expectation values.
import cupy as cp
from cuquantum.pauliprop.experimental import (
LibraryHandle,
PauliExpansion,
PauliExpansionOptions,
PauliRotationGate,
Truncation,
get_num_packed_integers,
)
# Step 1: Create library handle
handle = LibraryHandle()
# Step 2: Prepare buffers for the initial observable
num_qubits = 4
ints_per_string = get_num_packed_integers(num_qubits)
xz_bits = cp.zeros((1000, 2 * ints_per_string), dtype=cp.uint64)
coeffs = cp.zeros((1000,), dtype=cp.float64)
# Encode Z_0 observable
xz_bits[0, ints_per_string] = 1 # Set Z bit for qubit 0
coeffs[0] = 1.0
# Step 3: Create a PauliExpansion
options = PauliExpansionOptions(blocking=True)
expansion = PauliExpansion(
handle,
num_qubits,
num_terms=1,
xz_bits=xz_bits,
coeffs=coeffs,
sort_order="little_endian_bitwise",
options=options,
)
# Step 4: Apply gates (back-propagation through adjoint circuit)
gate = PauliRotationGate(angle=0.1, pauli_string="X", qubit_indices=(0,))
expansion = expansion.apply_gate(gate, adjoint=True)
# Step 5: Compute expectation value with zero state
result = expansion.trace_with_zero_state()
print(f"Expectation value: {result}")
Quantum Operators#
The API supports several types of quantum operators that can be applied to Pauli expansions.
Pauli rotation gates#
PauliRotationGate represents a rotation gate exp(-i * angle/2 * P) where P is a Pauli string.
These gates can at most double the number of terms in an expansion.
from cuquantum.pauliprop.experimental import PauliRotationGate
# Single-qubit X rotation
rx = PauliRotationGate(angle=0.5, pauli_string="X", qubit_indices=(0,))
# Two-qubit ZZ rotation
rzz = PauliRotationGate(angle=-np.pi/2, pauli_string="ZZ", qubit_indices=(0, 1))
# Multi-qubit Pauli rotation
rp = PauliRotationGate(angle=0.1, pauli_string=["X", "Y", "Z"], qubit_indices=(0, 1, 2))
Clifford gates#
CliffordGate represents Clifford gates which map Pauli strings one-to-one and do not change
the expansion size. Supported gates include: I, X, Y, Z, H, S, CX, CY, CZ, SWAP, iSWAP, SQRTX, SQRTY, SQRTZ.
from cuquantum.pauliprop.experimental import CliffordGate
# Hadamard gate on qubit 0
h = CliffordGate("H", qubit_indices=(0,))
# CNOT gate with control on qubit 0 and target on qubit 1
cx = CliffordGate("CX", qubit_indices=(0, 1))
# SWAP gate
swap = CliffordGate("SWAP", qubit_indices=(0, 1))
Pauli noise channels#
PauliNoiseChannel represents inhomogeneous Pauli noise channels upon 1 or 2 qubits. These channels
affect only the coefficients of the Pauli expansion and do not change the number of terms.
from cuquantum.pauliprop.experimental import PauliNoiseChannel
# Single-qubit depolarizing channel
depol = PauliNoiseChannel(
qubit_indices=(0,),
noise_probabilities={"I": 0.99, "X": 0.01/3, "Y": 0.01/3, "Z": 0.01/3}
)
# Two-qubit correlated noise
two_qubit_noise = PauliNoiseChannel(
qubit_indices=(0, 1),
noise_probabilities={"II": 0.98, "XX": 0.01, "ZZ": 0.01}
)
Amplitude damping channels#
AmplitudeDampingChannel represents generalised amplitude damping channels. Like Pauli rotations,
these can at most double the expansion size.
from cuquantum.pauliprop.experimental import AmplitudeDampingChannel
# Standard amplitude damping (dissipation to ground state)
damping = AmplitudeDampingChannel(
damping_probability=0.1,
excitation_probability=0.0,
qubit_index=0
)
# Generalised amplitude damping
gad = AmplitudeDampingChannel(
damping_probability=0.1,
excitation_probability=0.05,
qubit_index=0
)
Truncation#
During Pauli propagation, the number of terms in an expansion can grow exponentially. The Truncation dataclass
allows specifying truncation strategies to manage this growth by discarding terms which satisfy some criterion.
Multiple truncation criteria can be combined; terms are removed if they fail any criterion.
from cuquantum.pauliprop.experimental import Truncation
# Coefficient-based truncation: remove terms with |coef| < cutoff
truncation = Truncation(pauli_coeff_cutoff=1e-10)
# Weight-based truncation: remove terms with Pauli weight > cutoff
truncation = Truncation(pauli_weight_cutoff=10)
# Combined truncation strategies
truncation = Truncation(pauli_coeff_cutoff=1e-10, pauli_weight_cutoff=10)
Truncation can be applied during gate application or as a standalone operation:
# Apply truncation during gate application
expansion = expansion.apply_gate(gate, truncation=truncation)
# Apply truncation as standalone operation
expansion = expansion.truncate(truncation=truncation)
Computing Traces#
The final output of a Pauli propagation simulation is typically a scalar produced via a trace operation.
Trace with zero state#
The most common operation computes the expectation value with respect to the computational zero state:
# Computes <0|rho|0> where rho is the Pauli expansion
expectation = expansion.trace_with_zero_state()
Product trace#
For more general quantities, compute the trace of the product of two expansions:
# Computes Tr(A * B)
trace = expansion_a.product_trace(expansion_b)
# Computes Tr(A^† * B)
trace = expansion_a.product_trace(expansion_b, adjoint=True)
Rehearsal Mode#
The API supports an optional “rehearsal” mode that allows querying workspace requirements and the maximum number of terms before performing actual computations. Rehearsal is not required for typical usage—the API will automatically allocate resources as needed. However, rehearsal can be beneficial for advanced use cases such as:
Pre-allocating resources to avoid repeated memory allocations during long simulations
Using custom memory allocators to reduce memory pool fragmentation
Determining upfront if a simulation will fit in available GPU memory
Creating a rehearsal expansion#
Create a rehearsal-only expansion using PauliExpansion.empty():
from cuquantum.pauliprop.experimental import PauliExpansion, RehearsalInfo
# Create a rehearsal expansion (uses minimal internal buffers)
rehearsal = PauliExpansion.empty(
handle,
num_qubits=127,
num_terms=250000, # Expected maximum terms
dtype="float64",
)
Rehearsing operations#
Rehearse operations to get resource requirements:
# Rehearse gate applications
max_info = None
for gate in reversed(circuit):
info = rehearsal.apply_gate(gate, adjoint=True, rehearse=True)
max_info = info if max_info is None else max_info | info
# Rehearse trace operation
trace_info = rehearsal.trace_with_zero_state(rehearse=True)
max_info = max_info | trace_info
print(f"Max terms required: {max_info.num_terms_required}")
print(f"Device workspace: {max_info.device_scratch_workspace_required} bytes")
The RehearsalInfo and GateApplicationRehearsalInfo objects support the | operator
to combine requirements by taking the maximum of each field.
Converting to real expansion#
After rehearsal, create a real expansion using PauliExpansion.from_empty():
# Allocate buffers based on rehearsed requirements
capacity = max_info.num_terms_required
xz = cp.zeros((capacity, 2 * ints_per_string), dtype=cp.uint64)
coef = cp.zeros((capacity,), dtype=cp.float64)
# Initialize with observable data
xz[0] = encode_pauli_string(...)
coef[0] = 1.0
# Create real expansion from rehearsal
expansion = rehearsal.from_empty(xz, coef, num_terms=1)
External Memory Management#
For advanced use cases, you can provide a custom memory allocator to control how workspace memory is allocated. This is particularly useful when:
Running multiple simulations in sequence to avoid memory pool fragmentation
Pre-allocating a fixed workspace buffer
Integrating with external memory management systems
The allocator must implement the nvmath.memory.BaseCUDAMemoryManagerAsync protocol:
from cuquantum import MemoryPointer
class PreallocatedAllocator:
def __init__(self, device_id: int, total_size: int):
self.device_id = device_id
with cp.cuda.Device(device_id):
self._buffer = cp.empty(total_size, dtype=cp.uint8)
self._ptr = self._buffer.data.ptr
self.total_size = total_size
def memalloc_async(self, size: int, stream) -> MemoryPointer:
if size > self.total_size:
raise MemoryError(f"Requested {size} bytes, only {self.total_size} available")
return MemoryPointer(self._ptr, size, finalizer=None)
# Use the allocator
allocator = PreallocatedAllocator(device_id=0, total_size=workspace_bytes)
options = PauliExpansionOptions(allocator=allocator, memory_limit=workspace_bytes)
expansion = PauliExpansion(handle, ..., options=options)
Logging#
The API supports Python’s standard logging framework for debugging and performance monitoring.
Pass a logger to LibraryHandle to enable logging:
import logging
# Configure logging
logging.basicConfig(level=logging.DEBUG, format="%(message)s")
logger = logging.getLogger("pauliprop")
logger.setLevel(logging.DEBUG)
# Create handle with logger
handle = LibraryHandle(logger=logger)
The logger will output information about:
Library handle and expansion creation
Gate application timing and term counts
Workspace allocation
Trace computation results
Usage Examples#
Kicked Ising model#
The following example demonstrates a complete Pauli propagation workflow using the kicked Ising model on a 127-qubit heavy-hex topology.
First, we set up imports and circuit topology data:
from typing import Sequence
import cupy as cp
import numpy as np
from cuquantum.pauliprop.experimental import (
LibraryHandle,
PauliExpansion,
PauliExpansionOptions,
PauliRotationGate,
Truncation,
get_num_packed_integers,
)
# Circuit constants (IBM heavy-hex kicked Ising)
NUM_CIRCUIT_QUBITS = 127
NUM_ROTATIONS_PER_LAYER = 48
PI = np.pi
ZZ_ROTATION_ANGLE = -PI / 2.0
ZZ_QUBITS_RED = np.array([
[ 2, 1], [ 33, 39], [ 59, 60], [ 66, 67], [ 72, 81], [118, 119],
[ 21, 20], [ 26, 25], [ 13, 12], [ 31, 32], [ 70, 74], [122, 123],
[ 96, 97], [ 57, 56], [ 63, 64], [107, 108], [103, 104], [ 46, 45],
[ 28, 35], [ 7, 6], [ 79, 78], [ 5, 4], [109, 114], [ 62, 61],
[ 58, 71], [ 37, 52], [ 76, 77], [ 0, 14], [ 36, 51], [106, 105],
[ 73, 85], [ 88, 87], [ 68, 55], [116, 115], [ 94, 95], [100, 110],
[ 17, 30], [ 92, 102], [ 50, 49], [ 83, 84], [ 48, 47], [ 98, 99],
[ 8, 9], [121, 120], [ 23, 24], [ 44, 43], [ 22, 15], [ 53, 41]
], dtype=np.int32)
ZZ_QUBITS_BLUE = np.array([
[ 53, 60], [123, 124], [ 21, 22], [ 11, 12], [ 67, 68], [ 2, 3],
[ 66, 65], [122, 121], [110, 118], [ 6, 5], [ 94, 90], [ 28, 29],
[ 14, 18], [ 63, 62], [111, 104], [100, 99], [ 45, 44], [ 4, 15],
[ 20, 19], [ 57, 58], [ 77, 71], [ 76, 75], [ 26, 27], [ 16, 8],
[ 35, 47], [ 31, 30], [ 48, 49], [ 69, 70], [125, 126], [ 89, 74],
[ 80, 79], [116, 117], [114, 113], [ 10, 9], [106, 93], [101, 102],
[ 92, 83], [ 98, 91], [ 82, 81], [ 54, 64], [ 96, 109], [ 85, 84],
[ 87, 86], [108, 112], [ 34, 24], [ 42, 43], [ 40, 41], [ 39, 38]
], dtype=np.int32)
ZZ_QUBITS_GREEN = np.array([
[ 10, 11], [ 54, 45], [111, 122], [ 64, 65], [ 60, 61], [103, 102],
[ 72, 62], [ 4, 3], [ 33, 20], [ 58, 59], [ 26, 16], [ 28, 27],
[ 8, 7], [104, 105], [ 73, 66], [ 87, 93], [ 85, 86], [ 55, 49],
[ 68, 69], [ 89, 88], [ 80, 81], [117, 118], [101, 100], [114, 115],
[ 96, 95], [ 29, 30], [106, 107], [ 83, 82], [ 91, 79], [ 0, 1],
[ 56, 52], [ 90, 75], [126, 112], [ 36, 32], [ 46, 47], [ 77, 78],
[ 97, 98], [ 17, 12], [119, 120], [ 22, 23], [ 24, 25], [ 43, 34],
[ 42, 41], [ 40, 39], [ 37, 38], [125, 124], [ 50, 51], [ 18, 19]
], dtype=np.int32)
def get_pauli_string_as_packed_integers(paulis: Sequence[str], qubits: Sequence[int], num_qubits: int) -> np.ndarray:
num_packed_ints = get_num_packed_integers(num_qubits)
out = np.zeros(num_packed_ints * 2, dtype=np.uint64)
x_ptr = out[:num_packed_ints]
z_ptr = out[num_packed_ints:]
for pauli, qubit in zip(paulis, qubits):
int_ind = qubit // 64
bit_ind = qubit % 64
if pauli in ("X", "Y"):
x_ptr[int_ind] |= 1 << bit_ind
if pauli in ("Z", "Y"):
z_ptr[int_ind] |= 1 << bit_ind
return out
def get_x_rotation_layer(angle: float) -> list[PauliRotationGate]:
return [PauliRotationGate(angle, ["X"], [i]) for i in range(NUM_CIRCUIT_QUBITS)]
def get_zz_rotation_layer(topology: np.ndarray) -> list[PauliRotationGate]:
return [PauliRotationGate(ZZ_ROTATION_ANGLE, ["Z", "Z"], pair.tolist()) for pair in topology]
def get_ibm_heavy_hex_ising_circuit(x_rotation_angle: float, num_trotter_steps: int) -> list[PauliRotationGate]:
circuit: list[PauliRotationGate] = []
for _ in range(num_trotter_steps):
circuit.extend(get_x_rotation_layer(x_rotation_angle))
circuit.extend(get_zz_rotation_layer(ZZ_QUBITS_RED))
circuit.extend(get_zz_rotation_layer(ZZ_QUBITS_BLUE))
circuit.extend(get_zz_rotation_layer(ZZ_QUBITS_GREEN))
return circuit
Then in the main function, we create the library handle and observable:
handle = LibraryHandle()
# Step 2: Create the initial observable (Z_62 with coefficient 1.0).
num_packed = get_num_packed_integers(NUM_CIRCUIT_QUBITS)
xz = cp.zeros((1, 2 * num_packed), dtype=cp.uint64)
coefs = cp.zeros((1,), dtype=cp.float64)
xz[0] = cp.asarray(get_pauli_string_as_packed_integers(["Z"], [62], NUM_CIRCUIT_QUBITS))
coefs[0] = 1.0
# Step 3: Create a PauliExpansion with the observable.
options = PauliExpansionOptions(memory_limit="80%", blocking=True)
expansion = PauliExpansion(
handle,
NUM_CIRCUIT_QUBITS,
1,
xz,
coefs,
options=options,
)
Next, we define truncation strategy, build the circuit, and propagate:
truncation = Truncation(pauli_coeff_cutoff=1e-4, pauli_weight_cutoff=8)
num_gates_between_truncations = 10
# Step 5: Build the quantum circuit (kicked Ising model).
x_rotation_angle = PI / 4.0
num_trotter_steps = 20
circuit = get_ibm_heavy_hex_ising_circuit(x_rotation_angle, num_trotter_steps)
print(f"Circuit: 127-qubit IBM heavy-hex Ising")
print(f" Trotter steps: {num_trotter_steps}")
print(f" Total gates: {len(circuit)}")
print(f" Rx angle: {x_rotation_angle} (pi/4)")
print()
start_event = cp.cuda.Event()
end_event = cp.cuda.Event()
start_event.record()
max_num_terms = 0
# Step 6: Back-propagate through the adjoint circuit.
for gate_index in range(len(circuit) - 1, -1, -1):
gate = circuit[gate_index]
# Apply truncation periodically to control term growth.
active_truncation = truncation if (gate_index % num_gates_between_truncations == 0) else None
expansion = expansion.apply_gate(
gate,
truncation=active_truncation,
adjoint=True,
sort_order=None,
keep_duplicates=False,
)
# Track maximum output term count after each gate.
max_num_terms = max(max_num_terms, expansion.num_terms)
# Step 7: Compute the expectation value <Z_62> with respect to |0...0>.
expec = expansion.trace_with_zero_state()
end_event.record()
end_event.synchronize()
duration = cp.cuda.get_elapsed_time(start_event, end_event) / 1000.0 # seconds
Finally, we print the results:
print(f"Expectation value: {expec}")
print(f"Final number of terms: {expansion.num_terms}")
print(f"Maximum number of terms: {max_num_terms}")
print(f"Runtime (s): {duration}")
print()
if __name__ == "__main__":
main()
The full example is available at kicked_ising_example.py.
Kicked Ising model with rehearsal#
The following example extends the kicked Ising model to demonstrate the rehearsal feature. Rehearsal allows querying workspace requirements and maximum term counts before performing actual computations, which is useful for pre-allocating resources in memory-constrained environments.
The rehearsal phase estimates resource requirements:
max_rehearsal_info = None
for gate_index in range(len(circuit) - 1, -1, -1):
gate = circuit[gate_index]
active_truncation = truncation if (gate_index % num_gates_between_truncations == 0) else None
rehearsal_info = expansion.apply_gate(
gate,
truncation=active_truncation,
adjoint=True,
keep_duplicates=False,
)
max_rehearsal_info = max_rehearsal_info | rehearsal_info if max_rehearsal_info is not None else rehearsal_info
# Step 6: Rehearse the trace operation as well.
trace_rehearsal_info = expansion.trace_with_zero_state(rehearse=True)
max_rehearsal_info = max_rehearsal_info | trace_rehearsal_info
After rehearsal, we allocate buffers based on the estimated requirements:
ints_per_string = get_num_packed_integers(NUM_CIRCUIT_QUBITS)
capacity = max_rehearsal_info.num_terms_required
xz_shape = (capacity, 2 * ints_per_string)
coef_shape = (capacity,)
in_xz = cp.zeros(xz_shape, dtype=cp.uint64, order="C")
in_coef = cp.zeros(coef_shape, dtype=cp.float64)
out_xz = cp.empty_like(in_xz)
out_coef = cp.empty_like(in_coef)
# Step 8: Encode the observable (Z_62) into the input buffer.
in_xz[0] = cp.asarray(get_pauli_string_as_packed_integers(["Z"], [62], NUM_CIRCUIT_QUBITS))
in_coef[0] = 1.0
# Step 9: Create a preallocated workspace allocator based on rehearsed requirements.
workspace_bytes = max_rehearsal_info.device_scratch_workspace_required
workspace_allocator = (
PreAllocatedWorkspaceAllocator(device_id=0, total_size=workspace_bytes)
if workspace_bytes > 0 else None
)
options_sized = PauliExpansionOptions(
allocator=workspace_allocator,
memory_limit=workspace_bytes or "80%",
blocking=True,
)
# Step 10: Create real expansions from buffers for ping-pong pattern.
# Use sort_order="little_endian_bitwise" because a Pauli expansion with a single term is always sorted.
in_expansion = expansion.from_empty(
in_xz, in_coef,
num_terms=1,
sort_order="little_endian_bitwise",
has_duplicates=False,
options=options_sized,
)
out_expansion = PauliExpansion(
handle,
NUM_CIRCUIT_QUBITS,
0,
out_xz,
out_coef,
sort_order=None,
has_duplicates=True,
options=options_sized,
)
The full example is available at kicked_ising_example_rehearsed.py.
Logging example#
The following example demonstrates how to enable and use logging with the Pauli propagation API.
Configure logging before importing pauliprop:
import logging
import cupy as cp
# Step 1: Configure logging BEFORE importing pauliprop.
# Use DEBUG for verbose output, INFO for milestone-only logging.
logging.basicConfig(
level=logging.DEBUG,
format='%(levelname)-8s %(message)s'
)
from cuquantum.pauliprop.experimental import (
LibraryHandle,
PauliExpansion,
PauliRotationGate,
get_num_packed_integers,
)
The main workflow with logging enabled:
handle = LibraryHandle()
# Step 3: Prepare data for a simple Pauli expansion.
num_qubits = 4
num_terms = 1
ints_per_string = get_num_packed_integers(num_qubits)
xz_bits = cp.zeros((num_terms, 2 * ints_per_string), dtype=cp.uint64)
coeffs = cp.ones(num_terms, dtype=cp.complex128)
# Step 4: Create a PauliExpansion (logs INFO on creation).
expansion = PauliExpansion(handle, num_qubits, num_terms, xz_bits, coeffs)
# Step 5: Apply a gate (logs INFO on start/completion, DEBUG for details).
gate = PauliRotationGate(angle=0.5, pauli_string="X", qubit_indices=[0])
result = expansion.apply_gate(gate, sort_order=None)
# Step 6: Compute trace (logs INFO on start/completion).
trace = result.trace_with_zero_state()
print(f"\nTrace result: {trace}")
API Reference#
Handles#
|
Pauli Expansion#
|
A Pauli operator expansion. |
|
A view on a contiguous range of Pauli terms inside a Pauli operator expansion. |
|
Returns the number of packed integers of cupaulipropPackedIntegerType_t needed to represent the X bits (or equivalently, the Z bits) of a single Pauli string. |
Gates and Channels#
|
A Clifford gate (I, X, Y, Z, H, S, CX, CY, CZ, SWAP, iSWAP, etc.). |
|
A Pauli rotation gate exp(-i * angle/2 * P) where P is a Pauli string. |
|
A Pauli noise channel acting on 1 or 2 qubits. |
|
An amplitude damping channel with damping and excitation probabilities. |
Configuration#
|
Options controlling Pauli expansion construction and execution. |
|
Truncation strategy for Pauli expansion operations. |
|
Information about the required resources for a rehearsed operation. |
Information about the required resources for a gate application rehearsal. |
|