Stabilizer APIs

The stabilizer module, cuquantum.stabilizer, provides a Python interface to the cuStabilizer library.

Frame simulation

The frame simulation API enables efficient simulation of noisy stabilizer circuits by tracking Pauli frame errors across multiple shots simultaneously. This is particularly useful for error correction research, surface code simulation, and other applications involving noisy quantum circuits.

Overview

The main classes in the stabilizer module are:

  • Options - Configuration parameters for the simulator

  • Circuit - Represents a quantum circuit

  • FrameSimulator - Performs frame simulation of noisy Clifford circuits

  • PauliTable - Container for Pauli frame data

  • PauliFrame - Individual Pauli frame representation

Options

The Options class provides configuration parameters for the frame simulator. Its design is similar to that of cuquantum.tensornet.NetworkOptions in cuTensorNet.

By default, the simulator manages the handle and allocator automatically. If you provide your own handle, you are responsible for destroying it after all related simulator objects are destroyed.

Usage:

from cuquantum.stabilizer import Options
import logging

# Use default options
options = Options()

# Or customize options
options = Options(
    device_id=0,
    handle=None,  # A handle will be created if not provided
    logger=logging.getLogger("my_logger"),
    allocator=None
)

Note

The logger parameter enables Python-level logging (memory allocations, simulation timing). For C library logging, use environment variables CUSTABILIZER_LOG_LEVEL and CUSTABILIZER_LOG_FILE. See Useful tips for details.

Circuit

The Circuit class wraps a quantum circuit defined in Stim-compatible format. The circuit owns the device buffer where the circuit data is stored.

Usage:

from cuquantum.stabilizer import Circuit, Options

# Create from string
circuit = Circuit("H 0\nCNOT 0 1\nM 0 1")

# Or from Stim circuit object
import stim
stim_circuit = stim.Circuit("H 0\nCNOT 0 1")
circuit = Circuit(stim_circuit)

# With options
options = Options(device_id=0)
circuit = Circuit("H 0\nCNOT 0 1", options=options)

FrameSimulator

The FrameSimulator class simulates noisy Clifford quantum circuits by propagating Pauli frames. It tracks Pauli frame errors across many shots by maintaining X and Z bit tables, with one bit per qubit and shot, together with the measurement outcomes. See more about frame simulation in Overview.
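The following NumPy sketch makes the X and Z bit tables concrete. It propagates frames through H and CNOT using the standard Clifford conjugation rules, then reads off Z-basis measurement flips. It is a toy model for intuition only. The helpers apply_h, apply_cnot, and measure_z are hypothetical and not part of cuquantum.stabilizer, and cuStabilizer itself works on bit-packed tables on the GPU.

```python
import numpy as np

# Toy model of Pauli frame propagation (not cuStabilizer's implementation).
# Rows are qubits, columns are shots; x[q, s] == 1 (z[q, s] == 1) means that
# shot s carries an X (Z) component on qubit q.

def apply_h(x, z, q):
    # H maps X <-> Z, so the X and Z bits of qubit q swap.
    x[q], z[q] = z[q].copy(), x[q].copy()

def apply_cnot(x, z, c, t):
    # CNOT spreads X components from control to target
    # and Z components from target to control.
    x[t] ^= x[c]
    z[c] ^= z[t]

def measure_z(x, q):
    # A Z-basis outcome is flipped exactly when the frame has an X component on q.
    return x[q].copy()

num_qubits, num_shots = 2, 8
x = np.zeros((num_qubits, num_shots), dtype=np.uint8)
z = np.zeros((num_qubits, num_shots), dtype=np.uint8)
x[0, :4] = 1             # X error on qubit 0 in shots 0-3
apply_cnot(x, z, 0, 1)   # the X error spreads to qubit 1
m0, m1 = measure_z(x, 0), measure_z(x, 1)
apply_h(x, z, 1)         # qubit 1's X component becomes a Z component
```

Each operation updates a whole row, so a single instruction acts on every shot at once. This is why frame simulation batches naturally across shots.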

Basic Usage:

from cuquantum.stabilizer import Circuit, FrameSimulator

# Create simulator
num_qubits = 2
num_shots = 1024
num_measurements = 2

sim = FrameSimulator(
    num_qubits=num_qubits,
    num_paulis=num_shots,
    num_measurements=num_measurements
)

# Apply circuit
circuit = Circuit("H 0\nCNOT 0 1\nM 0 1")
sim.apply(circuit)

# Get results
pauli_table = sim.get_pauli_table(bit_packed=False)
measurements = sim.get_measurement_bits(bit_packed=False)

With Random Seed:

# Control randomization
sim = FrameSimulator(
    num_qubits=2,
    num_paulis=1024,
    num_measurements=2,
    seed=42,  # This seed will be used to create a seed for each apply() call
    randomize_measurements=True
)

sim.apply(circuit, seed=123)  # Override seed for this call

Memory Ownership Semantics

The frame simulator allocates required memory when input tables are not provided. It also supports user-provided arrays for X, Z, and measurement tables.

In general, the inputs and outputs satisfy the following rules:

  1. The package of the output arrays (numpy or cupy) is the same as the package of the most recently provided input arrays.

  2. GPU input arrays must reside on the device given by Options.device_id (default: 0).

  3. If no inputs are provided, the output package can be specified with the package argument of the FrameSimulator constructor. If both package and inputs are provided, the package of the inputs takes precedence.
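The output-package rules can be restated as a small decision function. This is a hypothetical sketch for reasoning about the rules, not part of cuquantum.stabilizer. In particular, it assumes that "most recently provided" covers both constructor arguments and later FrameSimulator.set_input_tables() calls.

```python
from typing import Optional, Sequence

def resolve_output_package(input_packages: Sequence[str],
                           package: Optional[str] = None) -> Optional[str]:
    """Hypothetical restatement of rules 1 and 3 above.

    input_packages lists the packages of input arrays in the order they were
    provided (constructor first, then any set_input_tables() calls).
    """
    if input_packages:
        # Rule 1, plus rule 3's precedence: the most recent input wins over `package`.
        return input_packages[-1]
    # Rule 3: with no inputs, the `package` constructor argument decides.
    # None stands for the library default, which this page does not specify.
    return package

print(resolve_output_package([], "cupy"))                 # cupy
print(resolve_output_package(["numpy"], "cupy"))          # numpy
print(resolve_output_package(["cupy", "numpy"], "cupy"))  # numpy
```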

Ownership Models

Simulator-Owned Tables (Default)

# Simulator allocates and owns all tables
sim = FrameSimulator(num_qubits=2, num_paulis=1024, num_measurements=2)

In this mode:

  • Memory is allocated on the GPU

  • Memory is automatically freed when the simulator is destroyed

  • The simulator initializes tables to zero

Note

Unlike Stim, cuStabilizer initializes the tables to zero even if randomize_measurements is True. You can use the RZ gate to apply randomization to qubits when needed.

User-Provided Tables - Bit-Packed and on GPU

If the user provides bit-packed CuPy arrays, the simulator uses those arrays directly and modifies them in place.

import cupy as cp

num_qubits = 2
num_shots = 1024
stride = ((num_shots + 31) // 32) * 4  # Must be multiple of 4 bytes

x_table = cp.zeros((num_qubits, stride), dtype=cp.uint8)
z_table = cp.zeros((num_qubits, stride), dtype=cp.uint8)

sim = FrameSimulator(
    num_qubits=num_qubits,
    num_paulis=num_shots,
    num_measurements=2,
    x_table=x_table,
    z_table=z_table,
    bit_packed=True
)

In this mode:

  • The user must keep the arrays alive (not freed) whenever apply() or get_pauli_table() is called.

  • Changes to the simulator state are reflected in the user-provided arrays.

  • The arrays may be reused for subsequent simulations of other circuits that have an appropriate number of qubits and measurements.

User-Provided Tables - Unpacked or on CPU

If the user provides unpacked arrays (NumPy or CuPy) or arrays that reside on the CPU, the simulator copies them to the GPU, converts them to bit-packed format if needed, and owns the converted tables:

import numpy as np
from cuquantum.stabilizer import FrameSimulator

num_qubits = 2
num_shots = 1024

# Unpacked format: one bit per element
x_table = np.zeros((num_qubits, num_shots), dtype=np.uint8)
z_table = np.zeros((num_qubits, num_shots), dtype=np.uint8)

sim = FrameSimulator(
    num_qubits=num_qubits,
    num_paulis=num_shots,
    num_measurements=2,
    x_table=x_table,
    z_table=z_table,
    bit_packed=False  # Simulator will convert to bit-packed
)

In this mode:

  • There is a conversion overhead from unpacked to bit-packed format.

  • The original arrays can be safely modified or deleted after construction, because the simulator works on its own converted copy.

Setting Tables After Construction

You can also set or update tables after construction using FrameSimulator.set_input_tables():

# Create with default memory
sim = FrameSimulator(num_qubits=2, num_paulis=1024, num_measurements=2)

# Later, attach new tables
import cupy as cp
stride = ((1024 + 31) // 32) * 4
x_table = cp.zeros((2, stride), dtype=cp.uint8)
z_table = cp.zeros((2, stride), dtype=cp.uint8)

sim.set_input_tables(x=x_table, z=z_table, bit_packed=True)

View vs Copy Semantics

The return behavior of FrameSimulator.get_pauli_xz_bits() and FrameSimulator.get_measurement_bits() depends on the requested format and the output package. Here, S is the size in bytes of the bit-packed table being returned:

  • package: cupy, bit-packed: True - Returns views into the simulator state (size S, no copy)

  • package: cupy, bit-packed: False - Returns copies (on-device operation of size S*8)

  • package: numpy, bit-packed: True - Returns copies (device->host transfer of size S)

  • package: numpy, bit-packed: False - Returns copies (device->host transfer of size S*8)

import cupy as cp
from cuquantum.stabilizer import FrameSimulator

# Create with CuPy bit-packed tables
stride = ((1024 + 31) // 32) * 4
x_table = cp.zeros((2, stride), dtype=cp.uint8)
z_table = cp.zeros((2, stride), dtype=cp.uint8)

sim = FrameSimulator(2, 1024, 2, x_table=x_table, z_table=z_table, bit_packed=True)

# Returns the same memory (view)
x_out, z_out = sim.get_pauli_xz_bits(bit_packed=True)
assert x_out.data.ptr == x_table.data.ptr  # Same memory!

# Unpacked returns a copy of the data
x_unpacked, z_unpacked = sim.get_pauli_xz_bits(bit_packed=False)
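To estimate the cost of each case, S can be computed from the stride formula used throughout this page. The helper table_sizes below is hypothetical. It assumes that S counts the bytes of a single bit-packed table (X, Z, or measurement), including row padding.

```python
def table_sizes(num_rows: int, num_shots: int) -> dict:
    """Bytes used by one bit table with num_rows rows (qubits or measurements)."""
    # Bit-packed rows are padded to a multiple of 32 bits, i.e. 4 bytes.
    stride = ((num_shots + 31) // 32) * 4
    packed = num_rows * stride       # S in the list above
    unpacked = num_rows * num_shots  # one uint8 per bit; equals 8*S when num_shots % 32 == 0
    return {"stride": stride, "packed": packed, "unpacked": unpacked}

big = table_sizes(num_rows=100, num_shots=100_000)
odd = table_sizes(num_rows=1, num_shots=1000)
print(big)  # {'stride': 12500, 'packed': 1250000, 'unpacked': 10000000}
print(odd)  # {'stride': 128, 'packed': 128, 'unpacked': 1000}
```

For 100 qubits and 100,000 shots, a bit-packed X table takes about 1.25 MB, while its unpacked copy takes 10 MB. The unpacked and NumPy paths therefore move or produce roughly eight times more data.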

Bit-Packed Format

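The bit-packed format stores one bit per shot in 32-bit words, so the row stride must be a multiple of 4 bytes (32 bits).

You can convert between the formats using numpy.packbits() and numpy.unpackbits(). Note that numpy.packbits() produces ceil(num_shots / 8) bytes per row. When num_shots is not a multiple of 32, each row therefore needs zero padding up to the stride.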

import numpy as np
from cuquantum.stabilizer import FrameSimulator

num_shots = 1024
stride = ((num_shots + 31) // 32) * 4  # = 128 bytes for 1024 shots
num_measurements = 5
unpacked_m = np.zeros((num_measurements, num_shots), dtype=np.uint8)
packed_m = np.packbits(unpacked_m, axis=1)

sim = FrameSimulator(num_qubits=16, num_paulis=num_shots,
                     num_measurements=num_measurements, measurement_table=packed_m,
                     bit_packed=True)
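When num_shots is not a multiple of 32, np.packbits alone does not produce rows of the required stride. Below is a minimal NumPy sketch of a padding helper. It assumes that shot i maps to bit position i in numpy.packbits order, using NumPy's default "big" bit order as the example above does. If cuStabilizer expects a different bit order within each byte, pass bitorder="little" instead; check the library documentation before relying on either. pack_table and unpack_table are hypothetical helpers.

```python
import numpy as np

def pack_table(bits, bitorder="big"):
    """Pack a (rows, num_shots) 0/1 array into rows padded to a multiple of 4 bytes."""
    rows, num_shots = bits.shape
    stride = ((num_shots + 31) // 32) * 4
    # packbits yields ceil(num_shots / 8) bytes per row.
    packed = np.packbits(np.asarray(bits, dtype=np.uint8), axis=1, bitorder=bitorder)
    out = np.zeros((rows, stride), dtype=np.uint8)  # zero padding up to the stride
    out[:, :packed.shape[1]] = packed
    return out

def unpack_table(packed, num_shots, bitorder="big"):
    """Inverse of pack_table; count= drops the padding bits past num_shots."""
    return np.unpackbits(packed, axis=1, count=num_shots, bitorder=bitorder)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(5, 1000), dtype=np.uint8)
packed = pack_table(bits)              # 125 data bytes + 3 padding bytes per row
restored = unpack_table(packed, 1000)
```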

Working with Results

PauliTable API

# Get full Pauli table
pauli_table = sim.get_pauli_table(bit_packed=False)

# Access individual Pauli frames
first_frame = pauli_table[0]  # PauliFrame for first shot
print(first_frame)  # Prints string like "XYZI..."

# Iterate over all frames
for frame in pauli_table:
    print(frame)

Raw Arrays

# Get X and Z bits separately
x_bits, z_bits = sim.get_pauli_xz_bits(bit_packed=False)
# Shape: (num_qubits, num_paulis)

# Get measurement outcomes
measurements = sim.get_measurement_bits(bit_packed=False)
# Shape: (num_measurements, num_paulis)

# Calculate properties efficiently on GPU (assumes the output package is cupy)
import cupy as cp
pauli_weight = cp.sum(x_bits | z_bits, axis=0)  # Number of non-identity qubits per frame
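The mapping from a qubit's X/Z bit pair to its Pauli letter follows the usual symplectic convention: I, X, Z, or Y when both bits are set. The NumPy sketch below applies this mapping to unpacked tables, as a cross-check of PauliTable output and of the weight computation above. frame_strings and pauli_weights are hypothetical helpers. The sketch assumes PauliFrame prints letters in qubit order using this convention. For CuPy output, convert first with cupy.asnumpy().

```python
import numpy as np

# Index 2*x + z selects the letter: (0,0) I, (0,1) Z, (1,0) X, (1,1) Y.
LETTERS = np.array(["I", "Z", "X", "Y"])

def frame_strings(x_bits, z_bits):
    """One Pauli string per frame from unpacked (num_qubits, num_paulis) tables."""
    idx = 2 * np.asarray(x_bits, dtype=np.int64) + np.asarray(z_bits, dtype=np.int64)
    return ["".join(column) for column in LETTERS[idx].T]

def pauli_weights(x_bits, z_bits):
    """Number of non-identity qubits per frame (x OR z set)."""
    return (np.asarray(x_bits, dtype=bool) | np.asarray(z_bits, dtype=bool)).sum(axis=0)

x = np.array([[1, 0, 0], [1, 1, 0]], dtype=np.uint8)  # 2 qubits, 3 frames
z = np.array([[0, 1, 0], [1, 0, 0]], dtype=np.uint8)
print(frame_strings(x, z))  # ['XY', 'ZX', 'II']
print(pauli_weights(x, z))  # [2 2 0]
```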

Detector instructions

This functionality is experimental and may change in the future.

cuStabilizer supports DETECTOR instructions and stores the detector outcomes in the measurement table, in the rows that follow the measurement results. Currently, this functionality is supported only when the measurement table is provided through FrameSimulator.set_input_tables().

import cupy as cp
from cuquantum.stabilizer import Circuit, FrameSimulator

circuit = Circuit("""
    DEPOLARIZE1(0.1) 0 1
    CNOT 0 1
    M 0 1
    DETECTOR rec[-1] rec[-2]
    """)

num_measurements = 2
num_detectors = 1
# One row per measurement, followed by one row per detector; 1024 shots / 8 = 128 bytes per row
m_table = cp.zeros((num_measurements + num_detectors, 1024 // 8), dtype=cp.uint8)
sim = FrameSimulator(num_qubits=2, num_paulis=1024,
                    num_measurements=num_measurements,
                    num_detectors=num_detectors)
sim.set_input_tables(m=m_table, bit_packed=True)
sim.apply(circuit)
print(m_table.shape)
measurements = m_table[:num_measurements]
detector_outcomes = m_table[num_measurements:]
for m in range(num_measurements):
    print(f"M {m}", measurements[m].tolist())
for d in range(num_detectors):
    print(f"DET {d}", detector_outcomes[d].tolist())

assert cp.all(detector_outcomes[0] == cp.bitwise_xor(*measurements))
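The assertion above compares packed bytes directly. This works because XOR acts bit by bit, so it commutes with packing. Below is a NumPy check of that property. Random bits stand in for simulator output, and no cuStabilizer calls are involved.

```python
import numpy as np

rng = np.random.default_rng(1)
num_shots = 1024
# Random stand-in for two measurement rows, one bit per shot.
m_unpacked = rng.integers(0, 2, size=(2, num_shots), dtype=np.uint8)

# The detector is the parity of its two records, computed per shot.
det_unpacked = m_unpacked[0] ^ m_unpacked[1]

# Packing only regroups bits into bytes, so XORing packed bytes gives the
# packed per-shot parity, whichever bit order the packing uses.
for order in ("big", "little"):
    m_packed = np.packbits(m_unpacked, axis=1, bitorder=order)
    det_packed = np.packbits(det_unpacked, bitorder=order)
    assert np.array_equal(det_packed, np.bitwise_xor(m_packed[0], m_packed[1]))
```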

Shot mask instructions

Shot mask instructions update the shot mask, which controls which shots are affected by subsequent circuit instructions. For example, any instruction that comes after SHOT_MASK_SET(0, 5) affects only shots 0-4, until the mask is changed again.

As described in Overview, cuStabilizer frame simulation applies circuit instructions to two bit tables of num_qubits * num_shots bits each. A circuit instruction like X_ERROR(1) 0 affects the num_shots bits in the row that corresponds to qubit 0 in the X bit table. The shot mask lets you control which of those shots participate in the instruction. Note that an instruction does not necessarily change every enabled shot. For example, an X_ERROR(0.5) gate changes only about half of the enabled shots.

Each shot consists of 2*num_qubits bits that represent its Pauli frame. Masking adds one more bit per shot that indicates whether the frame is “enabled” or “disabled”. The bitstring of length num_shots formed by these bits is called the shot mask. The shot mask instructions modify the mask by setting or XORing its bits.

Two masking instructions are supported:

  • SHOT_MASK_SET(start, end) / MASK_SET(start, end): set the mask to enable shots in the half-open interval [start, end) and disable all other shots.

  • SHOT_MASK_XOR(start, end) / MASK_XOR(start, end): toggle the mask for shots in the half-open interval [start, end).

The SHOT_MASK_* and MASK_* names are equivalent. The SHOT_MASK_* spelling is intended to make it explicit that the mask range is expressed in shot indices.
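The two masking instructions can be modeled on a boolean array. The following is a hypothetical NumPy sketch of their semantics on a single qubit's X row, not cuStabilizer's GPU implementation. It assumes all shots start enabled, consistent with unmasked instructions acting on every shot.

```python
import numpy as np

class ShotMaskModel:
    """Toy model of SHOT_MASK_SET / SHOT_MASK_XOR acting on one qubit's X row."""

    def __init__(self, num_shots, seed=0):
        self.mask = np.ones(num_shots, dtype=bool)  # assumption: all shots enabled initially
        self.x = np.zeros(num_shots, dtype=np.uint8)
        self.rng = np.random.default_rng(seed)

    def mask_set(self, start, end):
        # Enable shots in [start, end) and disable all other shots.
        self.mask[:] = False
        self.mask[start:end] = True

    def mask_xor(self, start, end):
        # Toggle the enabled state of shots in [start, end).
        self.mask[start:end] ^= True

    def x_error(self, p):
        # Each enabled shot flips with probability p; disabled shots are never touched.
        flips = (self.rng.random(self.mask.size) < p) & self.mask
        self.x ^= flips.astype(np.uint8)

model = ShotMaskModel(num_shots=16)
model.mask_set(0, 8)   # like SHOT_MASK_SET(0, 8): shots 0-7 enabled
model.mask_xor(4, 12)  # like SHOT_MASK_XOR(4, 12): shots 0-3 and 8-11 enabled
model.x_error(1.0)     # like X_ERROR(1) 0: flips every enabled shot

noisy = ShotMaskModel(num_shots=10_000, seed=1)
noisy.mask_set(0, 5_000)
noisy.x_error(0.5)     # about half of the 5,000 enabled shots flip
```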

Example

This example applies an X error to the first half of shots and a Z error to the second half:

import numpy as np
from cuquantum.stabilizer import Circuit, FrameSimulator

num_shots = 64
half = num_shots // 2

circuit = Circuit(
    f"SHOT_MASK_SET(0, {half})\n"
    "X_ERROR(1) 0\n"
    f"SHOT_MASK_SET({half}, {num_shots})\n"
    "Z_ERROR(1) 0\n"
)

sim = FrameSimulator(num_qubits=1, num_paulis=num_shots, num_measurements=0,
                     randomize_measurements=False, package="numpy")
sim.apply(circuit)

x, z = sim.get_pauli_xz_bits(bit_packed=False)
assert (x[0, :half] == 1).all() and (z[0, :half] == 0).all()
assert (x[0, half:] == 0).all() and (z[0, half:] == 1).all()

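When a measurement instruction follows a change to the shot mask, note that measurement indices are not shot-local. Each measured qubit gets its own row in the measurement table, shared by all shots, regardless of which shots are enabled. This determines where measurement results appear in the measurement table.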

For example, consider a circuit similar to the one above, but with a measurement instruction:

import numpy as np
from cuquantum.stabilizer import Circuit, FrameSimulator

num_shots = 16
half = num_shots // 2

circuit = Circuit(
    "X_ERROR(1) 0 1 2 3\n"
    f"SHOT_MASK_SET(0, {half})\n"
    "M 0 1\n"
    f"SHOT_MASK_SET({half}, {num_shots})\n"
    "M 2 3\n"
)

sim = FrameSimulator(num_qubits=4, num_paulis=num_shots, num_measurements=4,
                     randomize_measurements=False, package="numpy")
sim.apply(circuit)

measurements = sim.get_measurement_bits(bit_packed=False)

Note that num_measurements must equal the total number of measurement results in the circuit (one per measured qubit, here 4 for the two M instructions), regardless of how the shot mask changes.

The measurement table will be block-diagonal:

>>> print(measurements)
[[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]
 [0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]]

If you apply several sub-circuits to disjoint shot ranges and want to measure all shots afterwards, first re-enable all shots (for example, with SHOT_MASK_SET(0, num_shots)) and then measure with a single measurement instruction.
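The following NumPy model reproduces both the block-diagonal layout above and the layout produced by measuring all shots at once. It is a hypothetical sketch of the row bookkeeping only, and measurement_table is not a library function. It assumes that every shot carries an X error on the listed qubits, as X_ERROR(1) 0 1 2 3 does here, and it writes 0 for disabled shots, matching the printed output. In circuit form, the re-enabled variant is SHOT_MASK_SET(0, 16) followed by M 0 1 2 3.

```python
import numpy as np

def measurement_table(num_shots, flipped_qubits, steps):
    """Toy layout model: steps is a list of (start, end, qubits).

    Each step sets the shot mask to [start, end) and then measures `qubits`.
    Every measured qubit gets its own row covering all shots; disabled shots hold 0.
    """
    rows = []
    for start, end, qubits in steps:
        mask = np.zeros(num_shots, dtype=np.uint8)
        mask[start:end] = 1
        for q in qubits:
            outcome = np.full(num_shots, 1 if q in flipped_qubits else 0, dtype=np.uint8)
            rows.append(outcome & mask)
    return np.array(rows)

n, half = 16, 8
# Two masked M instructions, as in the example above: block-diagonal table.
block = measurement_table(n, {0, 1, 2, 3}, [(0, half, [0, 1]), (half, n, [2, 3])])
# All shots re-enabled, then a single M 0 1 2 3: every row covers every shot.
full = measurement_table(n, {0, 1, 2, 3}, [(0, n, [0, 1, 2, 3])])
```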

Complete Example

Here’s a complete example that runs the frame simulator on a rotated surface code memory circuit generated by Stim:

import stim
from cuquantum.stabilizer import Circuit, FrameSimulator, Options

# Generate a surface code circuit
distance = 5
rounds = 5
circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=distance,
    rounds=rounds,
    after_clifford_depolarization=0.001
)

# Create simulator
num_shots = 10000
sim = FrameSimulator(
    num_qubits=circuit.num_qubits,
    num_paulis=num_shots,
    num_measurements=circuit.num_measurements,
    num_detectors=circuit.num_detectors,
    randomize_measurements=False,
    seed=42,
    package="cupy"  # Use GPU arrays for output
)

# Convert and apply circuit
cuda_circuit = Circuit(circuit)
sim.apply(cuda_circuit)

# Analyze results
pauli_table = sim.get_pauli_table(bit_packed=False)
measurements = sim.get_measurement_bits(bit_packed=False)

# Process results on GPU
import cupy as cp
x_bits, z_bits = sim.get_pauli_xz_bits(bit_packed=False)
total_errors = cp.sum(x_bits | z_bits)
print(f"Total Pauli errors across all frames: {total_errors}")

# Examine individual frames
print("First 5 Pauli frames:")
for i in range(5):
    print(f"Frame {i}: {pauli_table[i]}")

Performance Tips

  1. Use CuPy arrays for best performance - keeps data on GPU

  2. Provide bit-packed tables when possible - avoids conversion overhead

  3. Reuse simulators across multiple circuits of the same size

  4. Batch multiple shots - GPU efficiency increases with larger num_shots

  5. Use bit_packed=True when you don’t need individual element access

# Good: Large batch, stays on GPU
sim = FrameSimulator(100, 100000, 50, package="cupy")

# Less efficient: Small batch, transfers to CPU
sim = FrameSimulator(100, 10, 50, package="numpy")

API Reference

Main Classes

Circuit(circuit[, stream, options])

Represents a quantum circuit for the frame simulator.

FrameSimulator(num_qubits, num_paulis[, ...])

Simulates quantum circuits using the stabilizer frame formalism.

PauliTable(x_table, z_table, num_paulis, ...)

Holds Pauli frame table data.

PauliFrame(x_bits, z_bits[, num_qubits, ...])

A weight-less Pauli string.

Options([device_id, handle, logger, allocator])

A data class for providing options to the Frame Simulator.