Stabilizer APIs#
The stabilizer module, cuquantum.stabilizer, provides a Python-friendly
interface for users to leverage the cuStabilizer library.
Frame simulation#
The frame simulation API enables efficient simulation of noisy stabilizer circuits by tracking Pauli frame errors across multiple shots simultaneously. This is particularly useful for error correction research, surface code simulation, and other applications involving noisy quantum circuits.
Overview#
The main classes in the stabilizer module are:
Options - Configuration parameters for the simulator
Circuit - Represents a quantum circuit
FrameSimulator - Performs frame simulation of noisy Clifford circuits
PauliTable - Container for Pauli frame data
PauliFrame - Individual Pauli frame representation
Options#
The Options class provides configuration parameters for the frame simulator.
It follows a similar design as cuquantum. from cuTensorNet.
By default, the simulator manages the handle and allocator automatically. If you provide your own handle, you are responsible for destroying it after all related simulator objects are destroyed.
Usage:
from cuquantum.stabilizer import Options
import logging
# Use default options
options = Options()
# Or customize options
options = Options(
device_id=0,
handle=None, # A handle will be created if not provided
logger=logging.getLogger("my_logger"),
allocator=None
)
Note
The logger parameter enables Python-level logging (memory allocations, simulation timing).
For C library logging, use environment variables CUSTABILIZER_LOG_LEVEL and CUSTABILIZER_LOG_FILE.
See Useful tips for details.
Circuit#
The Circuit class wraps a quantum circuit defined in Stim-compatible format.
The circuit owns the device buffer where the circuit data is stored.
Usage:
from cuquantum.stabilizer import Circuit
# Create from string
circuit = Circuit("H 0\nCNOT 0 1\nM 0 1")
# Or from Stim circuit object
import stim
stim_circuit = stim.Circuit("H 0\nCNOT 0 1")
circuit = Circuit(stim_circuit)
# With options
options = Options(device_id=0)
circuit = Circuit("H 0\nCNOT 0 1", options=options)
FrameSimulator#
The FrameSimulator class simulates noise in Clifford quantum circuits
by propagating Pauli frames. It tracks Pauli frame errors across
multiple shots by maintaining X and Z bit tables for each qubit and shot, along
with measurement outcomes. See more about frame simulation in Overview.
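The bookkeeping described above can be illustrated with a library-independent toy model. This is a minimal sketch, not the cuStabilizer implementation: it assumes one Python integer per qubit, where bit s belongs to shot s, and uses the standard Clifford update rules for H and CNOT. The helper names apply_h and apply_cnot are hypothetical.

```python
def apply_h(x, z, q):
    """Hadamard exchanges the X and Z components on qubit q for every shot."""
    x[q], z[q] = z[q], x[q]


def apply_cnot(x, z, control, target):
    """CNOT copies X from control to target and Z from target to control."""
    x[target] ^= x[control]
    z[control] ^= z[target]


# Two qubits, four shots. Bit s of each integer belongs to shot s.
x = [0b0001, 0b0000]  # shot 0 starts with an X error on qubit 0
z = [0b0000, 0b0010]  # shot 1 starts with a Z error on qubit 1

apply_cnot(x, z, control=0, target=1)
after_cnot = (list(x), list(z))  # X spread to qubit 1, Z spread to qubit 0

apply_h(x, z, 0)
after_h = (list(x), list(z))     # X and Z rows of qubit 0 are swapped
```

Because each gate is a handful of bitwise operations on whole rows, all shots are updated at once, which is what makes the frame representation attractive for large shot counts.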
Basic Usage:
from cuquantum.stabilizer import Circuit, FrameSimulator
# Create simulator
num_qubits = 2
num_shots = 1024
num_measurements = 2
sim = FrameSimulator(
num_qubits=num_qubits,
num_paulis=num_shots,
num_measurements=num_measurements
)
# Apply circuit
circuit = Circuit("H 0\nCNOT 0 1\nM 0 1")
sim.apply(circuit)
# Get results
pauli_table = sim.get_pauli_table(bit_packed=False)
measurements = sim.get_measurement_bits(bit_packed=False)
With Random Seed:
# Control randomization
sim = FrameSimulator(
num_qubits=2,
num_paulis=1024,
num_measurements=2,
seed=42, # This seed will be used to create a seed for each apply() call
randomize_measurements=True
)
sim.apply(circuit, seed=123) # Override seed for this call
Memory Ownership Semantics#
The frame simulator allocates required memory when input tables are not provided. It also supports user-provided arrays for X, Z, and measurement tables.
In general, the inputs and outputs satisfy the following rules:
The output package (numpy, cupy) of arrays is the same as the last provided input package.
The inputs must be located on the GPU given by Options.device_id (default id: 0).
If no inputs are provided, the output package can be specified by the package argument to the FrameSimulator constructor.
If both package and inputs are provided, the package of inputs takes precedence.
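The precedence rules above can be written down as a small decision function. This is only a sketch of the stated rules; resolve_output_package is a hypothetical helper, not part of the library, and it returns None when neither inputs nor a package argument are given because the library default is not stated in this section.

```python
def resolve_output_package(constructor_package, input_packages):
    """Return the package name the outputs would use under the rules above.

    input_packages lists the package of each provided input in the order
    the inputs were supplied; the last provided input wins.
    """
    if input_packages:
        return input_packages[-1]
    if constructor_package is not None:
        return constructor_package
    return None  # assumption: fall back to whatever the library default is


case_inputs_win = resolve_output_package("numpy", ["cupy"])
case_constructor = resolve_output_package("cupy", [])
case_last_input = resolve_output_package(None, ["numpy", "cupy"])
```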
Ownership Models#
Simulator-Owned Tables (Default)#
# Simulator allocates and owns all tables
sim = FrameSimulator(num_qubits=2, num_paulis=1024, num_measurements=2)
In this mode:
Memory is allocated on the GPU
Memory is automatically freed when the simulator is destroyed
The simulator initializes tables to zero
Note
When randomize_measurements=True and no input tables are provided, the Z
frame is initialized with Bernoulli(0.5) random samples. The X and
measurement tables are initialized to zero.
When input tables are provided, no initial randomization is performed.
Subsequent apply() calls will not randomize the frame again.
User-Provided Tables - Bit-Packed and on GPU#
If the user provides bit-packed CuPy arrays, the simulator uses those arrays directly and modifies them in place.
import cupy as cp
num_qubits = 2
num_shots = 1024
stride = ((num_shots + 31) // 32) * 4 # Must be multiple of 4 bytes
x_table = cp.zeros((num_qubits, stride), dtype=cp.uint8)
z_table = cp.zeros((num_qubits, stride), dtype=cp.uint8)
sim = FrameSimulator(
num_qubits=num_qubits,
num_paulis=num_shots,
num_measurements=2,
x_table=x_table,
z_table=z_table,
bit_packed=True
)
In this mode:
The user must ensure the arrays are valid when calling apply() or get_pauli_table().
Changes to the simulator state are reflected in the user-provided arrays
The arrays may be used for subsequent simulation of other circuits that have an appropriate number of qubits and measurements.
User-Provided Tables - Unpacked or on CPU#
If the user provides unpacked arrays (either NumPy or CuPy), the simulator converts them to bit-packed format and owns the converted tables:
import numpy as np
num_qubits = 2
num_shots = 1024
# Unpacked format: one bit per element
x_table = np.zeros((num_qubits, num_shots), dtype=np.uint8)
z_table = np.zeros((num_qubits, num_shots), dtype=np.uint8)
sim = FrameSimulator(
num_qubits=num_qubits,
num_paulis=num_shots,
num_measurements=2,
x_table=x_table,
z_table=z_table,
bit_packed=False # Simulator will convert to bit-packed
)
In this mode:
There is a conversion overhead from unpacked to bit-packed format.
Original arrays can be safely modified or deleted
Setting Tables After Construction#
You can also set or update tables after construction using FrameSimulator.set_input_tables():
# Create with default memory
sim = FrameSimulator(num_qubits=2, num_paulis=1024, num_measurements=2)
# Later, attach new tables
import cupy as cp
stride = ((1024 + 31) // 32) * 4
x_table = cp.zeros((2, stride), dtype=cp.uint8)
z_table = cp.zeros((2, stride), dtype=cp.uint8)
sim.set_input_tables(x=x_table, z=z_table, bit_packed=True)
View vs Copy Semantics#
The return behavior of FrameSimulator.get_pauli_xz_bits() and
FrameSimulator.get_measurement_bits() depends on the format and package:
package: cupy, bit-packed: True - Returns views into simulator state of size S
package: cupy, bit-packed: False - Returns copies (on-device operation of size S*8)
package: numpy, bit-packed: True - Returns copies (device->host transfer of size S)
package: numpy, bit-packed: False - Returns copies (device->host transfer of size S*8)
import cupy as cp
# Create with CuPy bit-packed tables
stride = ((1024 + 31) // 32) * 4
x_table = cp.zeros((2, stride), dtype=cp.uint8)
z_table = cp.zeros((2, stride), dtype=cp.uint8)
sim = FrameSimulator(2, 1024, 2, x_table=x_table, z_table=z_table, bit_packed=True)
# Returns the same memory (view)
x_out, z_out = sim.get_pauli_xz_bits(bit_packed=True)
assert x_out.data.ptr == x_table.data.ptr # Same memory!
# Unpacked returns a copy of the data
x_unpacked, z_unpacked = sim.get_pauli_xz_bits(bit_packed=False)
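The four cases listed above can be tabulated with a short helper, which is handy when estimating how much data a retrieval call produces. This is an illustrative sketch; retrieval_cost is a hypothetical function, and the labels "device copy" and "host copy" are shorthand introduced here.

```python
def retrieval_cost(package, bit_packed, packed_bytes):
    """Return (kind, result_bytes) for a table whose packed size is packed_bytes."""
    # Unpacking stores one bit per byte, so the result is 8x larger.
    result_bytes = packed_bytes if bit_packed else packed_bytes * 8
    if package == "cupy":
        kind = "view" if bit_packed else "device copy"
    elif package == "numpy":
        kind = "host copy"
    else:
        raise ValueError(f"unknown package: {package!r}")
    return kind, result_bytes


# Two qubits with a 128-byte stride give a 256-byte packed table.
packed_view = retrieval_cost("cupy", True, 256)
unpacked_host = retrieval_cost("numpy", False, 256)
```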
Bit-Packed Format#
The bit-packed format stores bits efficiently using 32-bit words, with a row stride that must be a multiple of 4 bytes (32 bits).
You can convert between the formats using numpy.packbits() and numpy.unpackbits().
import numpy as np
num_shots = 1024
stride = ((num_shots + 31) // 32) * 4 # = 128 bytes for 1024 shots
num_measurements = 5
unpacked_m = np.zeros((num_measurements, num_shots), dtype=np.uint8)
packed_m = np.packbits(unpacked_m, axis=1)
sim = FrameSimulator(num_qubits=16, num_paulis=num_shots,
num_measurements=num_measurements, measurement_table=packed_m,
bit_packed=True)
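When the shot count is not a multiple of 32, a row packed to whole bytes (which is what numpy.packbits() produces) is shorter than the required stride. The sketch below shows the stride arithmetic and one way to pad a row. It assumes zero padding is acceptable, which this section does not state; stride_bytes and pad_row are hypothetical helpers.

```python
def stride_bytes(num_shots):
    """Row length in bytes, rounded up to a whole number of 32-bit words."""
    return ((num_shots + 31) // 32) * 4


def pad_row(packed_row, num_shots):
    """Zero-pad a byte-packed row up to the stride (assumed padding value)."""
    stride = stride_bytes(num_shots)
    if len(packed_row) > stride:
        raise ValueError("row is longer than the stride")
    return bytes(packed_row) + bytes(stride - len(packed_row))


num_shots = 1000
byte_packed_len = (num_shots + 7) // 8        # 125 bytes after packing to bytes
padded = pad_row(bytes(byte_packed_len), num_shots)
```

For 1024 shots the two lengths coincide at 128 bytes, which is why the example above needs no padding.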
Working with Results#
PauliTable API#
# Get full Pauli table
pauli_table = sim.get_pauli_table(bit_packed=False)
# Access individual Pauli frames
first_frame = pauli_table[0] # PauliFrame for first shot
print(first_frame) # Prints string like "XYZI..."
# Iterate over all frames
for frame in pauli_table:
    print(frame)
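The printed string relates to the X and Z bits of a frame through the usual Pauli encoding. The following sketch assumes the standard convention (X bit only is X, Z bit only is Z, both is Y); frame_to_string is a hypothetical helper, not the library's PauliFrame implementation.

```python
# (x_bit, z_bit) -> Pauli letter, using the standard symplectic encoding
PAULI_LETTERS = {(0, 0): "I", (1, 0): "X", (1, 1): "Y", (0, 1): "Z"}


def frame_to_string(x_bits, z_bits):
    """Render one shot's X and Z bits (one entry per qubit) as a Pauli string."""
    return "".join(PAULI_LETTERS[(x, z)] for x, z in zip(x_bits, z_bits))


label = frame_to_string([1, 1, 0, 0], [0, 1, 1, 0])
```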
Raw Arrays#
# Get X and Z bits separately
x_bits, z_bits = sim.get_pauli_xz_bits(bit_packed=False)
# Shape: (num_qubits, num_paulis)
# Get measurement outcomes
measurements = sim.get_measurement_bits(bit_packed=False)
# Shape: (num_measurements, num_paulis)
# Calculate properties efficiently on GPU
import cupy as cp
pauli_weight = cp.sum(x_bits | z_bits, axis=0) # Weight per frame
Detector instructions#
This functionality is experimental and may change in the future.
cuStabilizer supports DETECTOR gates and
places the detector outcomes in the measurement table.
At this moment, this functionality is only supported by specifying
the measurement table through FrameSimulator.set_input_tables().
import cupy as cp
from cuquantum.stabilizer import Circuit, FrameSimulator
circuit = Circuit("""
DEPOLARIZE1(0.1) 0 1
CNOT 0 1
M 0 1
DETECTOR rec[-1] rec[-2]
""")
m_table = cp.zeros((4, 1024//8), dtype='uint8')
num_detectors = 1
num_measurements = 2
sim = FrameSimulator(num_qubits=2, num_paulis=1024,
num_measurements=num_measurements,
num_detectors=num_detectors)
sim.set_input_tables(m=m_table, bit_packed=True)
sim.apply(circuit)
print(m_table.shape)
measurements = m_table[:2]
detector_outcomes = m_table[2:]
for m in range(num_measurements):
    print(f"M {m}", measurements[m].tolist())
for d in range(num_detectors):
    print(f"DET {d}", detector_outcomes[d].tolist())
assert cp.all(detector_outcomes[0] == cp.bitwise_xor(*measurements))
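The final assertion above relies on a detector being the XOR of the measurement records it references. A plain-Python sketch of that relation follows. It assumes all referenced measurements precede the detector, so that rec[-1] is the last row of the measurement table; detector_bits is a hypothetical helper.

```python
def detector_bits(measurements, rec_offsets):
    """XOR the measurement rows selected by negative rec offsets, per shot."""
    num_shots = len(measurements[0])
    out = [0] * num_shots
    for offset in rec_offsets:
        row = measurements[offset]  # rec[-1] is the most recent measurement
        for s in range(num_shots):
            out[s] ^= row[s]
    return out


# Two measurements (rows) over four shots (columns).
measurements = [
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
detector = detector_bits(measurements, [-1, -2])  # DETECTOR rec[-1] rec[-2]
```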
Shot mask instructions#
Shot mask instructions update the shot mask that controls which shots are affected by the
subsequent circuit instructions. For example, any instruction that comes after
SHOT_MASK_SET(0, 5) will only affect shots 0-4.
As described in Overview, cuStabilizer frame simulation applies circuit
instructions to two bit tables of size num_qubits * num_shots bits each. A
circuit instruction like X_ERROR(1) 0 affects num_shots bits in a row
that corresponds to qubit 0 in the X bit table. The masking feature lets you
control exactly which shots participate in the instruction. Note
that not all enabled shots will be changed by an instruction: for example,
only about half of the enabled shots will be changed by an X_ERROR(0.5) gate.
Each shot consists of 2*num_qubits bits that represent the Pauli frame. A
shot mask adds a bit to each frame that indicates whether the frame is “enabled”
or “disabled”. The bitstring of length num_shots consisting of bits for each shot
is called the shot mask. The shot mask gates let you modify the mask by
setting or XORing bits in the mask.
Two masking instructions are supported:
SHOT_MASK_SET(start, end) / MASK_SET(start, end): set the mask to enable shots in the half-open interval [start, end) and disable all other shots.
SHOT_MASK_XOR(start, end) / MASK_XOR(start, end): toggle the mask for shots in the half-open interval [start, end).
The SHOT_MASK_* and MASK_* names are equivalent. The SHOT_MASK_* spelling is intended to
make it explicit that the mask range is expressed in shot indices.
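The set and XOR semantics on half-open intervals can be modeled with integer bitmasks. This is a toy illustration under the assumption that bit s of the mask stands for shot s; interval_bits, mask_set and mask_xor are hypothetical names, not library functions.

```python
def interval_bits(start, end):
    """Integer with bits start..end-1 set (half-open interval [start, end))."""
    return ((1 << end) - 1) & ~((1 << start) - 1)


def mask_set(start, end):
    """SHOT_MASK_SET: enable exactly the shots in [start, end)."""
    return interval_bits(start, end)


def mask_xor(mask, start, end):
    """SHOT_MASK_XOR: toggle the shots in [start, end)."""
    return mask ^ interval_bits(start, end)


mask = mask_set(0, 5)            # shots 0-4 enabled
x_row = 0
x_row ^= mask                    # X_ERROR(1) flips the X bit of enabled shots only
toggled = mask_xor(mask, 3, 8)   # shots 3-4 turn off, shots 5-7 turn on
```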
Example
This example applies an X error to the first half of shots and a Z error to the second half:
import numpy as np
from cuquantum.stabilizer import Circuit, FrameSimulator
num_shots = 64
half = num_shots // 2
circuit = Circuit(
f"SHOT_MASK_SET(0, {half})\n"
"X_ERROR(1) 0\n"
f"SHOT_MASK_SET({half}, {num_shots})\n"
"Z_ERROR(1) 0\n"
)
sim = FrameSimulator(num_qubits=1, num_paulis=num_shots, num_measurements=0,
randomize_measurements=False, package="numpy")
sim.apply(circuit)
x, z = sim.get_pauli_xz_bits(bit_packed=False)
assert (x[0, :half] == 1).all() and (z[0, :half] == 0).all()
assert (x[0, half:] == 0).all() and (z[0, half:] == 1).all()
When applying a measurement instruction after modifying the shot mask, note that measurement counts are not shot-local. This will affect the location of measurement results in the measurement table.
For example, consider a circuit similar to the one above, but with a measurement instruction:
import numpy as np
from cuquantum.stabilizer import Circuit, FrameSimulator
num_shots = 16
half = num_shots // 2
circuit = Circuit(
"X_ERROR(1) 0 1 2 3\n"
f"SHOT_MASK_SET(0, {half})\n"
"M 0 1\n"
f"SHOT_MASK_SET({half}, {num_shots})\n"
"M 2 3\n"
)
sim = FrameSimulator(num_qubits=4, num_paulis=num_shots, num_measurements=4,
randomize_measurements=False, package="numpy")
sim.apply(circuit)
measurements = sim.get_measurement_bits(bit_packed=False)
Note that the number of measurements should equal the total number of measured qubits across all measurement instructions in the circuit (four in this example), regardless of how the shot mask was changed.
The measurement table will be block-diagonal:
>>> print(measurements)
[[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]]
If you want to apply several sub-circuits to disjoint shot ranges and then measure all afterwards, enable all shots and measure all shots in a single measurement instruction.
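The block-diagonal table can be reproduced with a toy model in which every measured qubit claims its own row in instruction order and only the enabled shots are written. This is a sketch of the behavior described here, not of the library internals; the program list is a hand-written stand-in for the circuit.

```python
num_shots = 16
half = num_shots // 2

# X_ERROR(1) 0 1 2 3: every shot carries an X flip on all four qubits.
x_frame = [[1] * num_shots for _ in range(4)]

# (enabled shot interval, measured qubits), in circuit order
program = [((0, half), [0, 1]), ((half, num_shots), [2, 3])]

table = []
for (start, end), qubits in program:
    for q in qubits:
        # Each measured qubit gets a new row; disabled shots stay zero.
        row = [x_frame[q][s] if start <= s < end else 0 for s in range(num_shots)]
        table.append(row)
```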
Complete Example#
Here’s a complete example demonstrating the frame simulator with surface codes:
import stim
from cuquantum.stabilizer import Circuit, FrameSimulator, Options
# Generate a surface code circuit
distance = 5
rounds = 5
circuit = stim.Circuit.generated(
"surface_code:rotated_memory_z",
distance=distance,
rounds=rounds,
after_clifford_depolarization=0.001
)
# Create simulator
num_shots = 10000
sim = FrameSimulator(
num_qubits=circuit.num_qubits,
num_paulis=num_shots,
num_measurements=circuit.num_measurements,
num_detectors=circuit.num_detectors,
randomize_measurements=False,
seed=42,
package="cupy" # Use GPU arrays for output
)
# Convert and apply circuit
cuda_circuit = Circuit(circuit)
sim.apply(cuda_circuit)
# Analyze results
pauli_table = sim.get_pauli_table(bit_packed=False)
measurements = sim.get_measurement_bits(bit_packed=False)
# Process results on GPU
import cupy as cp
x_bits, z_bits = sim.get_pauli_xz_bits(bit_packed=False)
total_errors = cp.sum(x_bits | z_bits)
print(f"Total Pauli errors across all frames: {total_errors}")
# Examine individual frames
print("First 5 Pauli frames:")
for i in range(5):
    print(f"Frame {i}: {pauli_table[i]}")
Performance Tips#
Use CuPy arrays for best performance - keeps data on GPU
Provide bit-packed tables when possible - avoids conversion overhead
Reuse simulators across multiple circuits of the same size
Batch multiple shots - GPU efficiency increases with larger num_shots
Use bit_packed=True when you don’t need individual element access
# Good: Large batch, stays on GPU
sim = FrameSimulator(100, 100000, 50, package="cupy")
# Less efficient: Small batch, transfers to CPU
sim = FrameSimulator(100, 10, 50, package="numpy")
DEM sampling#
import stim
from cuquantum.stabilizer import DEMSampler
dem = stim.Circuit.generated("surface_code:rotated_memory_z",
distance=5, rounds=5, after_clifford_depolarization=0.001,
).detector_error_model(
decompose_errors=True, approximate_disjoint_errors=True,
).flattened()
sampler = DEMSampler(dem, 100_000, package="cupy")
sampler.sample(100_000, seed=42)
outcomes = sampler.get_outcomes(bit_packed=True)
DEMSampler generates detection events from a stim.DetectorErrorModel on the GPU, without
running a full circuit simulation. It is equivalent to
stim.DetectorErrorModel.compile_sampler().sample(). The underlying primitives
also work with any binary error-to-outcome mapping matrix – see BitMatrixSparseSampler.
Overview#
DEMSampler – high-level sampler that accepts a stim.DetectorErrorModel.
BitMatrixSparseSampler – lower-level sampler that accepts any binary matrix and probability array. Use this for custom error-to-outcome mappings.
BitMatrixCSR – CSR container for binary GF(2) matrices with conversions to/from scipy.sparse and cupyx.scipy.sparse.
BitMatrixSampler – alias for BitMatrixSparseSampler.
Note
Output arrays use (shots × features) layout: outcomes[s, d] is detector d
for shot s. This is transposed relative to
FrameSimulator, which uses (features × shots): measurements[m, s] is
measurement m for shot s.
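Converting between the two layouts is a plain transpose. The sketch below uses nested lists to stay library-independent; with NumPy or CuPy arrays the same effect is obtained with the array's .T attribute.

```python
# Sampler layout: shots x features (2 shots, 3 detectors)
outcomes = [
    [1, 0, 0],
    [0, 1, 0],
]

# Frame simulator layout: features x shots
by_feature = [list(column) for column in zip(*outcomes)]

# The same bit is addressed with swapped indices.
same_bit = all(
    by_feature[d][s] == outcomes[s][d]
    for s in range(len(outcomes))
    for d in range(len(outcomes[0]))
)
```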
DEMSampler#
DEMSampler parses a stim.DetectorErrorModel and delegates to
BitMatrixSparseSampler internally.
The sampler pre-allocates GPU memory for up to max_shots shots. Call
sample() repeatedly with different seeds or shot counts (up to
max_shots) without reallocating:
for seed in range(100):
    sampler.sample(100_000, seed=seed)
    outcomes = sampler.get_outcomes(bit_packed=True)
    # ... feed outcomes to decoder ...
Note
DEMSampler defaults to package="cupy" (data stays on GPU).
BitMatrixSparseSampler defaults to package="numpy".
BitMatrixSparseSampler#
BitMatrixSparseSampler accepts any binary matrix and probability array. Use this
when the error-to-outcome mapping is not derived from a DEM, or when using a custom
mapping (e.g. error-to-measurement or error-to-observable).
With a dense numpy matrix:
import numpy as np
from cuquantum.stabilizer import BitMatrixSparseSampler
n_errors = 100
n_outcomes = 50
shots = 10_000
matrix = np.random.randint(0, 2, size=(n_errors, n_outcomes), dtype=np.uint8)
probs = np.random.uniform(0.001, 0.05, size=n_errors)
sampler = BitMatrixSparseSampler(matrix, probs, shots)
sampler.sample(shots, seed=0)
outcomes = sampler.get_outcomes(bit_packed=False)
With a scipy or cupyx sparse CSR matrix:
import numpy as np
import scipy.sparse
from cuquantum.stabilizer import BitMatrixSparseSampler
matrix_csr = scipy.sparse.random(100, 50, density=0.1, format="csr")
probs = np.random.uniform(0.001, 0.05, size=100)
sampler = BitMatrixSparseSampler(matrix_csr, probs, 10_000)
sampler.sample(10_000, seed=0)
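Conceptually, a sampler of this kind draws each error independently with its probability and XORs the matrix rows of the errors that fired. The following CPU reference is a minimal sketch of that idea, assuming a matrix of shape (n_errors, n_outcomes) as in the dense example above. sample_outcomes is a hypothetical function, and its random stream is unrelated to the one the library uses.

```python
import random


def sample_outcomes(matrix, probs, shots, seed):
    """Bernoulli-sample errors, then multiply by the matrix over GF(2)."""
    rng = random.Random(seed)
    n_errors, n_outcomes = len(matrix), len(matrix[0])
    outcomes = []
    for _ in range(shots):
        row = [0] * n_outcomes
        for e in range(n_errors):
            if rng.random() < probs[e]:          # error e fires in this shot
                for d in range(n_outcomes):
                    row[d] ^= matrix[e][d]       # addition modulo 2
        outcomes.append(row)
    return outcomes                              # shots x outcomes layout


matrix = [
    [1, 1, 0],   # error 0 flips outcomes 0 and 1
    [0, 1, 1],   # error 1 flips outcomes 1 and 2
]
both_fire = sample_outcomes(matrix, [1.0, 1.0], shots=3, seed=0)
second_only = sample_outcomes(matrix, [0.0, 1.0], shots=3, seed=0)
```

Probabilities of 0 and 1 are used here only to make the result deterministic; outcome 1 cancels when both errors fire because the two flips XOR to zero.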
Retrieving errors#
get_errors() returns the sampled error vector as a
BitMatrixCSR object which can be converted to scipy or cupyx sparse:
sampler.sample(1000, seed=0)
errors = sampler.get_errors() # BitMatrixCSR
errors_scipy = errors.to_scipy_sparse() # scipy.sparse.csr_array
Detection rates can be computed from unpacked outcomes:
import cupy as cp
outcomes = sampler.get_outcomes(bit_packed=False)
det_rates = cp.mean(outcomes.astype(cp.float32), axis=0)
print(f"Shape: {outcomes.shape}")
print(f"Mean detection rate: {float(det_rates.mean()):.6f}")
Memory Ownership Semantics#
Inputs (matrix, probabilities) are always copied to the GPU.
The return behavior of get_outcomes() and
get_errors() depends on the format and package:
package: cupy, bit-packed: True - Returns views into sampler state of size S
package: cupy, bit-packed: False - Returns copies (on-device unpack of size S*8)
package: numpy, bit-packed: True - Returns copies (device->host transfer of size S)
package: numpy, bit-packed: False - Returns copies (device->host transfer of size S*8)
Note
Views are invalidated by the next sample()
call. Copy the result if you need to keep it across calls.
sampler.sample(1000)
outcomes = sampler.get_outcomes().copy() # safe to keep
sampler.sample(1000) # does not affect outcomes
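Why a view goes stale can be shown with a CPU-only analogy. ToySampler below is a hypothetical stand-in that reuses one buffer across calls, with a memoryview playing the role of a CuPy view; it says nothing about how the real sampler manages memory.

```python
class ToySampler:
    """Hypothetical stand-in that reuses one buffer across sample() calls."""

    def __init__(self, num_bytes):
        self._buffer = bytearray(num_bytes)

    def sample(self, fill_value):
        # Overwrite the shared buffer in place, as a new sample would.
        for i in range(len(self._buffer)):
            self._buffer[i] = fill_value

    def get_outcomes(self):
        return memoryview(self._buffer)  # a view, not a copy


toy = ToySampler(4)
toy.sample(1)
view = toy.get_outcomes()
kept = bytes(view)        # explicit copy, safe to keep
toy.sample(2)             # the view now shows the new data
view_after = bytes(view)
```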
Interoperability with PyTorch#
DEM sampling accepts numpy and cupy arrays. torch.Tensor is not directly
supported. To use results with PyTorch, use package="cupy" and convert on GPU:
import torch
sampler = DEMSampler(dem, max_shots, package="cupy")
sampler.sample(max_shots, seed=0)
outcomes_cp = sampler.get_outcomes(bit_packed=True)
outcomes_torch = torch.as_tensor(outcomes_cp, device="cuda") # zero-copy
Performance Tips#
Use package="cupy" - keeps data on GPU
Use bit_packed=True - avoids 8× data expansion on retrieval
Use large shot counts (100K+) - GPU efficiency increases with more shots
Reuse the sampler across multiple sample() calls
Warm up CUDA - first GPU call pays one-time context initialization cost
# Good: Large batch, stays on GPU
sampler = DEMSampler(dem, 1_000_000, package="cupy")
sampler.sample(1_000_000, seed=0)
outcomes = sampler.get_outcomes(bit_packed=True)
# Less efficient: Small batch, transfers to CPU
sampler = DEMSampler(dem, 100, package="numpy")
sampler.sample(100, seed=0)
outcomes = sampler.get_outcomes(bit_packed=False)
API Reference#
Main Classes#
| Circuit | Represents a quantum circuit for the frame simulator. |
| FrameSimulator | Simulates quantum circuits using the stabilizer frame formalism. |
| PauliTable | Holds Pauli frame table data. |
| PauliFrame | A weight-less Pauli string. |
| DEMSampler | High-level sampler that accepts a stim.DetectorErrorModel. |
| BitMatrixSparseSampler | Sparse Bernoulli sampler with GF(2) matrix multiply. |
| BitMatrixCSR | CSR representation of a binary (GF(2)) matrix. |
| Options | A data class for providing options to cuStabilizer objects. |