cuStabilizer functions
Library Initialization and Management API
-
int custabilizerGetVersion()
Returns the semantic version number of the cuStabilizer library.
- Returns:
Combined version number in format 10000 * major + 100 * minor + patch.
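The combined format above can be split back into its components with integer division. The following is a minimal sketch; decode_version and VersionParts are hypothetical names introduced here for illustration and are not part of cuStabilizer.

```c
#include <assert.h>

/* Hypothetical helper type, not part of cuStabilizer. */
typedef struct {
    int major_ver;
    int minor_ver;
    int patch_ver;
} VersionParts;

/* Splits a value of the form 10000 * major + 100 * minor + patch,
   assuming minor and patch are each below 100. */
static VersionParts decode_version(int combined)
{
    assert(combined >= 0);
    VersionParts v;
    v.major_ver = combined / 10000;
    v.minor_ver = (combined / 100) % 100;
    v.patch_ver = combined % 100;
    return v;
}
```

In an application, the argument would typically be the value returned by custabilizerGetVersion().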
-
const char *custabilizerGetErrorString(custabilizerStatus_t status)
Get the description string for a given cuStabilizer status code.
- Parameters:
status – [in] The status code.
- Returns:
A null-terminated string describing the status code.
-
custabilizerStatus_t custabilizerCreate(custabilizerHandle_t *handle)
Create and initialize the library context.
- Parameters:
handle – [out] Library handle.
- Returns:
custabilizerStatus_t
-
custabilizerStatus_t custabilizerDestroy(custabilizerHandle_t handle)
Destroy the library context.
- Parameters:
handle – [in] Library handle.
- Returns:
custabilizerStatus_t
Circuit API
- group Circuit
Typedefs
-
typedef void *custabilizerCircuit_t
Opaque data structure holding the Circuit.
Functions
-
custabilizerStatus_t custabilizerCircuitSizeFromString(
    const custabilizerHandle_t handle,
    const char *circuitString,
    int64_t *bufferSize)
Returns the size of the device buffer required for a circuit.
- Parameters:
handle – [in] Library handle.
circuitString – [in] String representation of the circuit.
bufferSize – [out] Size of the buffer in bytes.
- Returns:
custabilizerStatus_t
-
custabilizerStatus_t custabilizerCreateCircuitFromString(
    const custabilizerHandle_t handle,
    const char *circuitString,
    void *bufferDevice,
    int64_t bufferSize,
    custabilizerCircuit_t *circuit)
Create a new circuit from a string representation.
The string format is compatible with Stim circuit string.
Example:
    custabilizerHandle_t handle;
    custabilizerCreate(&handle);

    // Circuit in Stim-compatible text form, one instruction per line.
    char circuitString[] =
        "H 0\n"
        "X_ERROR(0.5) 1\n"
        "CNOT 0 1\n";

    // Query the required device buffer size, then allocate it.
    int64_t bufferSize;
    custabilizerCircuit_t circuit;
    custabilizerCircuitSizeFromString(handle, circuitString, &bufferSize);
    void *buffer;
    cudaMalloc(&buffer, bufferSize);

    // Build the circuit inside the device buffer.
    custabilizerCreateCircuitFromString(handle, circuitString, buffer, bufferSize, &circuit);
Use custabilizerFrameSimulatorApplyCircuit to run the circuit.
- Parameters:
handle – [in] Library handle.
circuitString – [in] String representation of the circuit.
bufferDevice – [inout] Device buffer to store the circuit.
bufferSize – [in] Size of the device buffer in bytes.
circuit – [out] Pointer to the created circuit.
- Returns:
custabilizerStatus_t
-
custabilizerStatus_t custabilizerDestroyCircuit(
    custabilizerCircuit_t circuit)
Destroy a circuit.
- Parameters:
circuit – [in] Circuit to destroy.
- Returns:
custabilizerStatus_t
Simulator API
- group FrameSimulator
Typedefs
-
typedef void *custabilizerFrameSimulator_t
Opaque data structure holding the simulator state.
Functions
-
custabilizerStatus_t custabilizerCreateFrameSimulator(
    const custabilizerHandle_t handle,
    int64_t numQubits,
    int64_t numShots,
    int64_t numMeasurements,
    int64_t tableStrideMajor,
    custabilizerFrameSimulator_t *frameSimulator)
Create a FrameSimulator.
The stride is specified by the tableStrideMajor parameter, which is usually (numShots + 7)/8 padded to the next multiple of 4.
The data is updated by calling custabilizerFrameSimulatorApplyCircuit.
- Parameters:
handle – [in] Library handle.
numQubits – [in] Number of qubits in the Pauli frame.
numShots – [in] Number of samples to simulate.
numMeasurements – [in] Number of measurements in the measurement table.
tableStrideMajor – [in] Stride over the major axis for all input bit tables. Specified in bytes and must be a multiple of 4.
frameSimulator – [out] Pointer to the created frame simulator.
- Returns:
custabilizerStatus_t
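The stride rule described above (one bit per shot, rounded up to whole bytes, then padded to a multiple of 4 bytes) can be sketched in plain C as follows. The helper names table_stride_major and bit_table_bytes are hypothetical and introduced only for this illustration; the library does not provide them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of cuStabilizer: bytes needed for one
   bit-table row of numShots bits, rounded up to a multiple of 4. */
static int64_t table_stride_major(int64_t numShots)
{
    assert(numShots > 0);
    int64_t packedBytes = (numShots + 7) / 8; /* one bit per shot */
    return (packedBytes + 3) / 4 * 4;         /* pad to a 4-byte multiple */
}

/* Minimum size in bytes of a table with numRows rows (numQubits for the
   X/Z tables, numMeasurements for the measurement table). */
static int64_t bit_table_bytes(int64_t numRows, int64_t strideBytes)
{
    assert(numRows >= 0 && strideBytes % 4 == 0);
    return numRows * strideBytes;
}
```

For example, 33 shots need 5 packed bytes, which pad to a stride of 8 bytes; the result would be passed as tableStrideMajor and used to size the device buffers.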
-
custabilizerStatus_t custabilizerDestroyFrameSimulator(
    custabilizerFrameSimulator_t frameSimulator)
Destroy the FrameSimulator.
- Parameters:
frameSimulator – [in] Frame simulator to destroy.
- Returns:
custabilizerStatus_t
-
custabilizerStatus_t custabilizerFrameSimulatorApplyCircuit(
    const custabilizerHandle_t handle,
    custabilizerFrameSimulator_t frameSimulator,
    const custabilizerCircuit_t circuit,
    int randomizeFrameAfterMeasurement,
    uint64_t seed,
    custabilizerBitInt_t *xTableDevice,
    custabilizerBitInt_t *zTableDevice,
    custabilizerBitInt_t *mTableDevice,
    cudaStream_t stream)
Run Pauli frame simulation using the circuit.
Use custabilizerCreateFrameSimulator to create a frame simulator with appropriate parameters for this call. The function accepts an initial state in the form of bit tables. All bit tables assume LSB ordering. That is, the bit for the first shot is stored at mask 0x1. If the buffers are smaller than the required minimum size, the behavior is undefined.
The xTableDevice and zTableDevice specify the initial Pauli frame in a qubit-major format. The operator on Pauli string I and qubit J is encoded by bit I of row J in x_table and z_table.

(x_table[J][I], z_table[J][I])   Pauli operator
0, 0                             I
0, 1                             Z
1, 0                             X
1, 1                             Y
Here is an illustrative example of bit tables with 4 Paulis on 3 qubits: XYZ, IIZ, XII, IIY.

    int64_t numQubits = 3;
    int64_t numShots = 32;
    int64_t numMeasurements = 2;
    int64_t stride = (numShots + 7) / 8;  // 4 bytes, already a multiple of 4
    int64_t bit_table_bytes = numQubits * stride;
    int64_t m_table_bytes = numMeasurements * stride;

    // One row per qubit; bit I of a row belongs to Pauli string I (LSB first).
    custabilizerBitInt_t x_table[] = {
        // pauli 4321
        0x00000005,  // Qubit 0: IXIX, x bits 0101
        0x00000001,  // Qubit 1: IIIY, x bits 0001
        0x00000008   // Qubit 2: YIZZ, x bits 1000
    };
    custabilizerBitInt_t z_table[] = {
        // pauli 4321
        0x00000000,  // Qubit 0: z bits 0000
        0x00000001,  // Qubit 1: z bits 0001
        0x0000000B   // Qubit 2: z bits 1011
    };
    custabilizerBitInt_t m_table[] = { 0x00000000, 0x00000000 };

    // Copy the initial tables to the device.
    custabilizerBitInt_t *xTableDevice, *zTableDevice, *mTableDevice;
    cudaMalloc(&xTableDevice, bit_table_bytes);
    cudaMalloc(&zTableDevice, bit_table_bytes);
    cudaMalloc(&mTableDevice, m_table_bytes);
    cudaMemcpy(xTableDevice, x_table, bit_table_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(zTableDevice, z_table, bit_table_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(mTableDevice, m_table, m_table_bytes, cudaMemcpyHostToDevice);

    custabilizerHandle_t handle;
    custabilizerCreate(&handle);
    custabilizerFrameSimulator_t frameSimulator;
    custabilizerStatus_t status = custabilizerCreateFrameSimulator(
        handle, numQubits, numShots, numMeasurements, stride, &frameSimulator);

    uint64_t seed = 5;
    cudaStream_t stream = 0;
    int rnd_frame = 0;
    // Assuming `circuit` is defined earlier
    status = custabilizerFrameSimulatorApplyCircuit(
        handle, frameSimulator, circuit, rnd_frame, seed,
        xTableDevice, zTableDevice, mTableDevice, stream);

    // Release resources.
    custabilizerDestroyFrameSimulator(frameSimulator);
    cudaFree(xTableDevice);
    cudaFree(zTableDevice);
    cudaFree(mTableDevice);
    custabilizerDestroy(handle);
- Parameters:
handle – [in] Library handle.
frameSimulator – [in] An instance of FrameSimulator with parameters consistent with the bit tables.
circuit – [in] A circuit that acts on at most numQubits qubits and contains at most numMeasurements measurements.
randomizeFrameAfterMeasurement – [in] Whether to randomize the frame after a measurement. Disabling the randomization is helpful in some cases to focus on the error frame propagation.
seed – [in] Random seed.
xTableDevice – [inout] Device buffer of the X bit table in qubit-major order. Must be of size at least numQubits * tableStrideMajor bytes.
zTableDevice – [inout] Device buffer of the Z bit table in qubit-major order. Must be of size at least numQubits * tableStrideMajor bytes.
mTableDevice – [inout] Device buffer of the measurement bit table in measurement-major order. Must be of size at least numMeasurements * tableStrideMajor bytes.
stream – [in] CUDA stream.
- Returns:
custabilizerStatus_t
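The qubit-major encoding described above can be illustrated with a small host-side packing routine. This is a sketch under stated assumptions: custabilizerBitInt_t is taken to be a 32-bit word (uint32_t is used as a stand-in), the row stride is given in words rather than bytes, and pack_pauli_frame is a hypothetical helper that is not part of the library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for custabilizerBitInt_t, taken to be 32 bits. */
typedef uint32_t bit_word_t;

/* Hypothetical helper. paulis[i][q] is the operator of Pauli string i on
   qubit q ('I', 'X', 'Y' or 'Z'). Row q of each table gets bit i set
   according to the (x, z) encoding: I=(0,0), Z=(0,1), X=(1,0), Y=(1,1). */
static void pack_pauli_frame(const char *const *paulis, int64_t numPaulis,
                             int64_t numQubits, int64_t strideWords,
                             bit_word_t *xTable, bit_word_t *zTable)
{
    assert(numPaulis <= strideWords * 32);
    size_t tableBytes = (size_t)(numQubits * strideWords) * sizeof(bit_word_t);
    memset(xTable, 0, tableBytes);
    memset(zTable, 0, tableBytes);

    for (int64_t i = 0; i < numPaulis; ++i) {
        assert((int64_t)strlen(paulis[i]) == numQubits);
        for (int64_t q = 0; q < numQubits; ++q) {
            char p = paulis[i][q];
            int x = (p == 'X' || p == 'Y');
            int z = (p == 'Z' || p == 'Y');
            assert(x || z || p == 'I');
            int64_t word = q * strideWords + i / 32;
            bit_word_t mask = (bit_word_t)1 << (i % 32); /* LSB = string 0 */
            if (x) xTable[word] |= mask;
            if (z) zTable[word] |= mask;
        }
    }
}
```

Packing the four strings XYZ, IIZ, XII, IIY with a stride of one word yields X rows 0x5, 0x1, 0x8 and Z rows 0x0, 0x1, 0xB. The resulting host arrays would then be copied to the device buffers before the call.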
Rng API
- group Rng
Functions
-
custabilizerStatus_t custabilizerSampleProbArray(
    custabilizerHandle_t handle,
    int64_t numSamples,
    int64_t numProbs,
    const double *probs,
    uint64_t seed,
    custabilizerBitInt_t *samples,
    cudaStream_t stream)
Sample Bernoulli random bits from a probability array.
Generates a dense bit-packed matrix of shape (numProbs, numSamples) where row i contains independent Bernoulli(probs[i]) samples.
The output is bit-packed using custabilizerBitInt_t. Row i occupies words [i * numSamples/32, (i+1) * numSamples/32). Within each word, bit b corresponds to sample (word_index * 32 + b).
- Parameters:
handle – [in] Library handle.
numSamples – [in] Number of samples (minor dimension). Must be a multiple of 32.
numProbs – [in] Number of probabilities (major dimension).
probs – [in] Probability array of length numProbs (device-accessible pointer). Values should be in [0, 1]; out-of-range values are clamped, and NaN is treated as 0.
seed – [in] Random seed.
samples – [out] Output buffer for bit-packed samples (device-accessible pointer). Must have at least (numProbs * numSamples / 32) words.
stream – [in] CUDA stream.
- Returns:
custabilizerStatus_t
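Once the packed output has been copied back to the host, an individual sample can be read with the word and bit arithmetic described above. The sketch below assumes that custabilizerBitInt_t is a 32-bit word (uint32_t as a stand-in) and that word_index counts words from the start of the row; get_sample_bit and packed_word_count are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for custabilizerBitInt_t, taken to be 32 bits. */
typedef uint32_t bit_word_t;

/* Number of words in the packed (numProbs, numSamples) matrix. */
static int64_t packed_word_count(int64_t numProbs, int64_t numSamples)
{
    assert(numSamples % 32 == 0);
    return numProbs * (numSamples / 32);
}

/* Returns the bit for probability row `row` and sample index `sample`
   from a host copy of the packed output. */
static int get_sample_bit(const bit_word_t *samples, int64_t numSamples,
                          int64_t row, int64_t sample)
{
    assert(numSamples % 32 == 0);
    assert(sample >= 0 && sample < numSamples);
    int64_t wordsPerRow = numSamples / 32;
    bit_word_t word = samples[row * wordsPerRow + sample / 32];
    return (int)((word >> (sample % 32)) & 1u);
}
```

With numSamples = 64, sample 33 of row 1 lives in the second word of that row at bit 1.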
-
custabilizerStatus_t custabilizerSampleProbArraySparsePrepare(
    custabilizerHandle_t handle,
    int64_t numSamples,
    int64_t numProbs,
    size_t *workspaceSize)
Query the device workspace size required for sparse Bernoulli sampling.
The returned size depends on numProbs and numSamples. A single workspace allocation can be reused across calls with different seeds, probability values, or output buffers, as long as numProbs and numSamples do not exceed those used in the Prepare call.
- Parameters:
handle – [in] Library handle.
numSamples – [in] Number of samples.
numProbs – [in] Number of probabilities.
workspaceSize – [out] Required device workspace size in bytes.
- Returns:
custabilizerStatus_t
-
custabilizerStatus_t custabilizerSampleProbArraySparseCompute(
    custabilizerHandle_t handle,
    int64_t numSamples,
    int64_t numProbs,
    const double *probs,
    uint64_t seed,
    uint64_t *nnz,
    uint64_t *columnIndices,
    uint64_t *rowOffsets,
    void *workspace,
    size_t workspaceSize,
    cudaStream_t stream)
Sample Bernoulli random bits from a probability array and return a CSR matrix.
Generates a CSR matrix of shape (numSamples, numProbs), where each row s contains column indices e such that an independent Bernoulli(probs[e]) trial succeeded.
The CSR data is returned via rowOffsets (length numSamples+1) and columnIndices (length *nnz on input).
Capacity handling:
- On input, *nnz is the capacity of columnIndices (number of entries).
- On successful return, *nnz is the number of non-zeros written.
- If capacity is insufficient, returns CUSTABILIZER_STATUS_INSUFFICIENT_SPARSE_STORAGE and sets *nnz to the required number of non-zeros.
Column indices within each row are NOT sorted; they follow probability-sort order from the tiled sampling kernel.
- Parameters:
handle – [in] Library handle.
numSamples – [in] Number of samples (rows / shots).
numProbs – [in] Number of probabilities (columns).
probs – [in] Probability array of length numProbs (device-accessible pointer). Values should be in [0, 1]; out-of-range values are clamped, and NaN is treated as 0.
seed – [in] Random seed.
nnz – [inout] On input, capacity of columnIndices. On output, number of non-zeros used (or required).
columnIndices – [out] Output CSR column indices (device-accessible pointer), length at least *nnz on input.
rowOffsets – [out] Output CSR row offsets (device-accessible pointer), length numSamples+1.
workspace – [in] Device-accessible workspace of at least the size returned by Prepare.
workspaceSize – [in] Size of workspace in bytes.
stream – [in] CUDA stream.
- Returns:
custabilizerStatus_t
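To make the CSR layout concrete, the sketch below walks a host copy of rowOffsets and columnIndices and expands it into the bit-packed (numProbs, numSamples) layout used by the dense sampler. It assumes a 32-bit word (uint32_t as a stand-in for custabilizerBitInt_t); csr_to_packed is a hypothetical helper and not a library function. Because it only sets bits, the unsorted order of column indices within a row does not matter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for custabilizerBitInt_t, taken to be 32 bits. */
typedef uint32_t bit_word_t;

/* Expands a CSR matrix of shape (numSamples, numProbs) into a bit-packed
   matrix of shape (numProbs, numSamples): bit s of row e is set when
   column index e appears in CSR row s. */
static void csr_to_packed(const uint64_t *rowOffsets,
                          const uint64_t *columnIndices,
                          int64_t numSamples, int64_t numProbs,
                          bit_word_t *dense)
{
    assert(numSamples % 32 == 0);
    int64_t wordsPerRow = numSamples / 32;
    memset(dense, 0, (size_t)(numProbs * wordsPerRow) * sizeof(bit_word_t));

    for (int64_t s = 0; s < numSamples; ++s) {
        /* Entries of CSR row s live in [rowOffsets[s], rowOffsets[s+1]). */
        for (uint64_t j = rowOffsets[s]; j < rowOffsets[s + 1]; ++j) {
            uint64_t e = columnIndices[j];
            assert(e < (uint64_t)numProbs);
            dense[(int64_t)e * wordsPerRow + s / 32] |=
                (bit_word_t)1 << (s % 32);
        }
    }
}
```

The last entry, rowOffsets[numSamples], equals the total number of non-zeros, which matches the value reported through *nnz on success.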
Linear Algebra API
- group Linalg
Functions
-
custabilizerStatus_t custabilizerGF2SparseDenseMatrixMultiply(
    custabilizerHandle_t handle,
    uint64_t m,
    uint64_t n,
    uint64_t k,
    uint64_t nnz,
    const uint64_t *columnIndices,
    const uint64_t *rowOffsets,
    const custabilizerBitInt_t *B,
    int32_t beta,
    custabilizerBitInt_t *C,
    cudaStream_t stream)
Compute GF(2) sparse-dense matrix multiplication C = A @ B.
A is sparse (CSR), B is bit-packed dense.
Shapes:
- A is CSR with shape (m, k)
- B is bit-packed dense with shape (k, n)
- C is bit-packed dense with shape (m, n)
Packing requirement:
- n must be a multiple of 32 so that rows are word-aligned in the bit-packed storage.
The multiplication is over GF(2): \( C = A \otimes B + \beta C \)
- beta == 0: \( C = A \otimes B \) (assign, C is not read)
- beta == 1: \( C \leftarrow C \oplus (A \otimes B) \) (XOR-accumulate)
- Parameters:
handle – [in] Library handle.
m – [in] Number of rows of A and C.
n – [in] Number of columns of B and C (must be a multiple of 32).
k – [in] Number of columns of A and rows of B.
nnz – [in] Number of non-zeros in A and length of columnIndices.
columnIndices – [in] CSR column indices of A (device-accessible pointer), length nnz.
rowOffsets – [in] CSR row offsets of A (device-accessible pointer), length m+1.
B – [in] Bit-packed dense input matrix B (device-accessible pointer).
beta – [in] 0 for assign (C not read), 1 for XOR-accumulate.
C – [inout] Bit-packed dense output matrix C (device-accessible pointer).
stream – [in] CUDA stream.
- Returns:
custabilizerStatus_t
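As a way to check results on small inputs, the semantics described above can be written as a plain host-side loop: each row of C is the XOR of the rows of B selected by the non-zeros in the matching row of A. This is an illustrative sketch and says nothing about how the library computes the product on the device. It assumes a 32-bit word (uint32_t as a stand-in for custabilizerBitInt_t); gf2_spmm_reference is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for custabilizerBitInt_t, taken to be 32 bits. */
typedef uint32_t bit_word_t;

/* Host reference for C = A @ B over GF(2), with A in CSR form (m x k)
   and B, C bit-packed with n/32 words per row. beta == 0 overwrites C,
   beta == 1 XOR-accumulates into it. */
static void gf2_spmm_reference(uint64_t m, uint64_t n, uint64_t k,
                               const uint64_t *columnIndices,
                               const uint64_t *rowOffsets,
                               const bit_word_t *B, int beta, bit_word_t *C)
{
    assert(n % 32 == 0);
    assert(beta == 0 || beta == 1);
    uint64_t wordsPerRow = n / 32;

    for (uint64_t i = 0; i < m; ++i) {
        bit_word_t *cRow = C + i * wordsPerRow;
        if (beta == 0)
            memset(cRow, 0, (size_t)wordsPerRow * sizeof(bit_word_t));
        for (uint64_t j = rowOffsets[i]; j < rowOffsets[i + 1]; ++j) {
            assert(columnIndices[j] < k);
            const bit_word_t *bRow = B + columnIndices[j] * wordsPerRow;
            for (uint64_t w = 0; w < wordsPerRow; ++w)
                cRow[w] ^= bRow[w]; /* addition over GF(2) is XOR */
        }
    }
}
```

Calling the routine twice on the same inputs, first with beta = 0 and then with beta = 1, leaves C all zero, since every row is XORed with itself.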
-
custabilizerStatus_t custabilizerGF2SparseSparseMatrixMultiply(
    custabilizerHandle_t handle,
    uint64_t m,
    uint64_t n,
    uint64_t k,
    const uint64_t *aColumnIndices,
    const uint64_t *aRowOffsets,
    uint64_t bNNZ,
    const uint64_t *bColumnIndices,
    const uint64_t *bRowOffsets,
    int32_t beta,
    custabilizerBitInt_t *C,
    cudaStream_t stream)
Compute GF(2) sparse-sparse matrix multiplication C = A @ B.
Both A and B are in CSR format with element-level sparsity. B stores individual column indices (bit positions), not bit-packed words.
Shapes:
- A is CSR with shape (m, k)
- B is CSR with shape (k, n), column indices are bit positions in [0, n)
- C is bit-packed dense with shape (m, n)
Packing requirement:
- n must be a multiple of 32 so that rows are word-aligned in the bit-packed storage.
The value of beta selects the update mode:
- beta == 0: \( C = A \otimes B \) (assign, C is not read)
- beta == 1: \( C \leftarrow C \oplus (A \otimes B) \) (XOR-accumulate)
- Parameters:
handle – [in] Library handle.
m – [in] Number of rows of A and C.
n – [in] Number of columns of B and C (must be a multiple of 32).
k – [in] Number of columns of A and rows of B.
aColumnIndices – [in] CSR column indices of A (device-accessible pointer).
aRowOffsets – [in] CSR row offsets of A (device-accessible pointer), length m+1.
bNNZ – [in] Number of non-zeros in B.
bColumnIndices – [in] CSR column indices of B (device-accessible pointer), length bNNZ. Column indices within each row must be sorted in ascending order.
bRowOffsets – [in] CSR row offsets of B (device-accessible pointer), length k+1.
beta – [in] 0 for assign (C not read), 1 for XOR-accumulate.
C – [inout] Bit-packed dense output matrix C (device-accessible pointer).
stream – [in] CUDA stream.
- Returns:
custabilizerStatus_t
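The sparse-sparse variant can be checked against a similar host-side loop: for every non-zero A[i][r], each column index c stored in row r of B toggles bit c of row i in C. This is a sketch for validating small cases, not a description of the device implementation. It assumes a 32-bit word (uint32_t as a stand-in for custabilizerBitInt_t); gf2_spgemm_reference is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for custabilizerBitInt_t, taken to be 32 bits. */
typedef uint32_t bit_word_t;

/* Host reference for C = A @ B over GF(2) with A (m x k) and B (k x n)
   both in CSR form and C bit-packed with n/32 words per row. */
static void gf2_spgemm_reference(uint64_t m, uint64_t n, uint64_t k,
                                 const uint64_t *aColumnIndices,
                                 const uint64_t *aRowOffsets,
                                 const uint64_t *bColumnIndices,
                                 const uint64_t *bRowOffsets,
                                 int beta, bit_word_t *C)
{
    assert(n % 32 == 0);
    assert(beta == 0 || beta == 1);
    uint64_t wordsPerRow = n / 32;

    for (uint64_t i = 0; i < m; ++i) {
        bit_word_t *cRow = C + i * wordsPerRow;
        if (beta == 0)
            memset(cRow, 0, (size_t)wordsPerRow * sizeof(bit_word_t));
        for (uint64_t ja = aRowOffsets[i]; ja < aRowOffsets[i + 1]; ++ja) {
            uint64_t r = aColumnIndices[ja]; /* row of B to add */
            assert(r < k);
            for (uint64_t jb = bRowOffsets[r]; jb < bRowOffsets[r + 1]; ++jb) {
                uint64_t c = bColumnIndices[jb]; /* bit position in [0, n) */
                assert(c < n);
                cRow[c / 32] ^= (bit_word_t)1 << (c % 32);
            }
        }
    }
}
```

Toggling rather than setting bits is what makes the sum a GF(2) sum: a column reached an even number of times cancels out.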