cuStateVec Functions

Library Management

Handle Management API

custatevecCreate

custatevecStatus_t custatevecCreate(custatevecHandle_t *handle)

This function initializes the cuStateVec library and creates a handle on the cuStateVec context. It must be called prior to any other cuStateVec API functions.

Parameters

handle[in] the pointer to the handle to the cuStateVec context


custatevecDestroy

custatevecStatus_t custatevecDestroy(custatevecHandle_t handle)

This function releases resources used by the cuStateVec library.

Parameters

handle[in] the handle to the cuStateVec context


custatevecGetDefaultWorkspaceSize

custatevecStatus_t custatevecGetDefaultWorkspaceSize(custatevecHandle_t handle, size_t *workspaceSizeInBytes)

This function returns the default workspace size defined by the cuStateVec library.

This function returns the default size used for the workspace.

Parameters
  • handle[in] the handle to the cuStateVec context

  • workspaceSizeInBytes[out] default workspace size


custatevecSetWorkspace

custatevecStatus_t custatevecSetWorkspace(custatevecHandle_t handle, void *workspace, size_t workspaceSizeInBytes)

This function sets the workspace used by the cuStateVec library.

This function sets the workspace attached to the handle. The required size of the workspace is obtained by custatevecGetDefaultWorkspaceSize().

By setting a larger workspace, users may be able to execute some functions without allocating extra workspace.

If a device memory handler is set, the workspace can be set to null and the workspace is allocated using the user-defined memory pool.

Parameters
  • handle[in] the handle to the cuStateVec context

  • workspace[in] device pointer to workspace

  • workspaceSizeInBytes[in] workspace size

CUDA Stream Management API

custatevecSetStream

custatevecStatus_t custatevecSetStream(custatevecHandle_t handle, cudaStream_t streamId)

This function sets the stream to be used by the cuStateVec library to execute its routines.

Parameters
  • handle[in] the handle to the cuStateVec context

  • streamId[in] the stream to be used by the library


custatevecGetStream

custatevecStatus_t custatevecGetStream(custatevecHandle_t handle, cudaStream_t *streamId)

This function gets the cuStateVec library stream used to execute all calls from the cuStateVec library functions.

Parameters
  • handle[in] the handle to the cuStateVec context

  • streamId[out] the stream to be used by the library

Error Management API

custatevecGetErrorName

const char *custatevecGetErrorName(custatevecStatus_t status)

This function returns the name string for the input error code. If the error code is not recognized, “unrecognized error code” is returned.

Parameters

status[in] Error code to convert to string


custatevecGetErrorString

const char *custatevecGetErrorString(custatevecStatus_t status)

This function returns the description string for an error code. If the error code is not recognized, “unrecognized error code” is returned.

Parameters

status[in] Error code to convert to string

Logger API

custatevecLoggerSetCallback

custatevecStatus_t custatevecLoggerSetCallback(custatevecLoggerCallback_t callback)

Experimental: This function sets the logging callback function.

Parameters

callback[in] Pointer to a callback function. See custatevecLoggerCallback_t.


custatevecLoggerSetCallbackData

custatevecStatus_t custatevecLoggerSetCallbackData(custatevecLoggerCallbackData_t callback, void *userData)

Experimental: This function sets the logging callback function with user data.

Parameters
  • callback[in] Pointer to a callback function. See custatevecLoggerCallbackData_t.

  • userData[in] Pointer to user-provided data.

custatevecLoggerSetFile

custatevecStatus_t custatevecLoggerSetFile(FILE *file)

Experimental: This function sets the logging output file.

Note

Once registered using this function call, the provided file handle must not be closed unless the function is called again to switch to a different file handle.

Parameters

file[in] Pointer to an open file. File should have write permission.


custatevecLoggerOpenFile

custatevecStatus_t custatevecLoggerOpenFile(const char *logFile)

Experimental: This function opens a logging output file in the given path.

Parameters

logFile[in] Path of the logging output file.


custatevecLoggerSetLevel

custatevecStatus_t custatevecLoggerSetLevel(int32_t level)

Experimental: This function sets the value of the logging level.

Levels are defined as follows:

Level   Summary             Long Description
"0"     Off                 logging is disabled (default)
"1"     Errors              only errors will be logged
"2"     Performance Trace   API calls that launch CUDA kernels will log their parameters and important information
"3"     Performance Hints   hints that can potentially improve the application's performance
"4"     Heuristics Trace    provides general information about the library execution, may contain details about heuristic status
"5"     API Trace           API calls will log their parameters and important information

Parameters

level[in] Value of the logging level.


custatevecLoggerSetMask

custatevecStatus_t custatevecLoggerSetMask(int32_t mask)

Experimental: This function sets the value of the logging mask. Masks are defined as a combination of the following masks:

Mask    Description
"0"     Off
"1"     Errors
"2"     Performance Trace
"4"     Performance Hints
"8"     Heuristics Trace
"16"    API Trace

Refer to custatevecLoggerCallback_t for the details.

Parameters

mask[in] Value of the logging mask.


custatevecLoggerForceDisable

custatevecStatus_t custatevecLoggerForceDisable()

Experimental: This function disables logging for the entire run.

Versioning API

custatevecGetProperty

custatevecStatus_t custatevecGetProperty(libraryPropertyType type, int32_t *value)

This function returns the version information of the cuStateVec library.

Parameters
  • type[in] requested property (MAJOR_VERSION, MINOR_VERSION, or PATCH_LEVEL).

  • value[out] value of the requested property.


custatevecGetVersion

size_t custatevecGetVersion()

This function returns the version information of the cuStateVec library.

Memory Management API

A stream-ordered memory allocator (or mempool for short) allocates/deallocates memory asynchronously from/to a mempool in a stream-ordered fashion, meaning that memory operations and computations enqueued on the streams have well-defined inter- and intra-stream dependencies. Several well-implemented stream-ordered mempools are available, such as cudaMemPool_t, which has been built in at the CUDA driver level since CUDA 11.2 (so that all CUDA applications in the same process can easily share the same pool), and the RAPIDS Memory Manager (RMM). For a detailed introduction, see the NVIDIA Developer Blog.

The new device memory handler APIs allow users to bind a stream-ordered mempool to the library handle, such that cuStateVec can take care of most of the memory management for users. Below is an illustration of what can be done:

MyMemPool pool = MyMemPool();  // kept alive for the entire process in real apps

int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) {
  return reinterpret_cast<MyMemPool*>(ctx)->alloc(ptr, size, stream);
}

int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) {
  return reinterpret_cast<MyMemPool*>(ctx)->dealloc(ptr, size, stream);
}

// create a mem handler and fill in the required members for the library to use
custatevecDeviceMemHandler_t handler;
handler.ctx = reinterpret_cast<void*>(&pool);
handler.device_alloc = my_alloc;
handler.device_free = my_dealloc;
strncpy(handler.name, "my pool", CUSTATEVEC_ALLOCATOR_NAME_LEN);

// bind the handler to the library handle
custatevecSetDeviceMemHandler(handle, &handler);

/* ... use gate application as usual ... */

// User doesn't compute the required sizes

// User doesn't query the workspace size (but one can if desired)

// User doesn't allocate memory!

// User sets a null pointer to indicate that the library should draw memory from the user's pool
void* extraWorkspace = nullptr;
size_t extraWorkspaceSizeInBytes = 0;
custatevecApplyMatrix(
    handle, sv, svDataType, nIndexBits, matrix, matrixDataType, layout,
    adjoint, targets, nTargets, controls, controlBitValues, nControls,
    computeType, extraWorkspace, extraWorkspaceSizeInBytes);

// User doesn't deallocate memory!

As shown above, several calls to the workspace-related APIs can be skipped. Moreover, letting the library share your memory pool can not only alleviate potential memory conflicts but also enable possible optimizations.

In the current release, only a device mempool can be bound.

custatevecSetDeviceMemHandler

custatevecStatus_t custatevecSetDeviceMemHandler(custatevecHandle_t handle, const custatevecDeviceMemHandler_t *handler)

Set the current device memory handler.

Once set, when cuStateVec needs device memory in various API calls it will allocate from the user-provided memory pool and deallocate at completion. See custatevecDeviceMemHandler_t and APIs that require extra workspace for further detail.

The internal stream order is established using the user-provided stream set via custatevecSetStream().

If handler argument is set to nullptr, the library handle will detach its existing memory handler.

Warning

It is undefined behavior for the following scenarios:

  • the library handle is bound to a memory handler and subsequently to another handler

  • the library handle outlives the attached memory pool

  • the memory pool is not stream-ordered

Parameters
  • handle[in] Opaque handle holding cuStateVec’s library context.

  • handler[in] the device memory handler that encapsulates the user’s mempool. The struct content is copied internally.


custatevecGetDeviceMemHandler

custatevecStatus_t custatevecGetDeviceMemHandler(custatevecHandle_t handle, custatevecDeviceMemHandler_t *handler)

Get the current device memory handler.

Parameters
  • handle[in] Opaque handle holding cuStateVec’s library context.

  • handler[out] If previously set, the struct pointed to by handler is filled in, otherwise CUSTATEVEC_STATUS_NO_DEVICE_ALLOCATOR is returned.

Initialization

cuStateVec API custatevecInitializeStateVector() can be used to initialize a state vector to any of a set of prescribed states. Please refer to custatevecStateVectorType_t for details.

Use case

// initialize state vector
custatevecInitializeStateVector(handle, sv, svDataType, nIndexBits, svType);

API reference

custatevecInitializeStateVector

custatevecStatus_t custatevecInitializeStateVector(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecStateVectorType_t svType)

Initialize the state vector to a certain form.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • svType[in] the target quantum state

Gate Application

General Matrices

cuStateVec API custatevecApplyMatrix() can apply a matrix representing a gate to a state vector. The API may require external workspace for large matrices, and custatevecApplyMatrixGetWorkspaceSize() provides the size of workspace. If a device memory handler is set, custatevecApplyMatrixGetWorkspaceSize() can be skipped.

custatevecApplyMatrixBatchedGetWorkspaceSize() and custatevecApplyMatrixBatched() can apply matrices to batched state vectors. Please refer to batched state vectors for the overview of batched state vector simulations.

Use case

// check the size of external workspace
size_t extraWorkspaceSizeInBytes = 0;
custatevecApplyMatrixGetWorkspaceSize(
    handle, svDataType, nIndexBits, matrix, matrixDataType, layout, adjoint, nTargets,
    nControls, computeType, &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// apply gate
custatevecApplyMatrix(
    handle, sv, svDataType, nIndexBits, matrix, matrixDataType, layout,
    adjoint, targets, nTargets, controls, controlBitValues, nControls,
    computeType, extraWorkspace, extraWorkspaceSizeInBytes);

API reference

custatevecApplyMatrixGetWorkspaceSize
custatevecStatus_t custatevecApplyMatrixGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const uint32_t nTargets, const uint32_t nControls, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)

This function gets the required workspace size for custatevecApplyMatrix().

This function returns the required extra workspace size to execute custatevecApplyMatrix(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required for a given set of arguments.

Parameters
  • handle[in] the handle to the cuStateVec context

  • svDataType[in] Data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • matrix[in] host or device pointer to a matrix

  • matrixDataType[in] data type of matrix

  • layout[in] enumerator specifying the memory layout of matrix

  • adjoint[in] apply adjoint of matrix

  • nTargets[in] the number of target bits

  • nControls[in] the number of control bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspaceSizeInBytes[out] workspace size


custatevecApplyMatrix
custatevecStatus_t custatevecApplyMatrix(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Apply gate matrix.

Apply gate matrix to a state vector. The state vector size is \(2^\text{nIndexBits}\).

The matrix argument is a host or device pointer of a 2-dimensional array for a square matrix. The size of matrix is ( \(2^\text{nTargets} \times 2^\text{nTargets}\) ) and the value type is specified by the matrixDataType argument. The layout argument specifies the matrix layout which can be in either row-major or column-major order. The targets and controls arguments specify target and control bit positions in the state vector index.

The controlBitValues argument specifies bit values of control bits. The ordering of controlBitValues is specified by the controls argument. If a null pointer is specified to this argument, all control bit values are set to 1.

By definition, bit positions in targets and controls arguments should not overlap.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of required extra workspace is obtained by calling custatevecApplyMatrixGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • matrix[in] host or device pointer to a square matrix

  • matrixDataType[in] data type of matrix

  • layout[in] enumerator specifying the memory layout of matrix

  • adjoint[in] apply adjoint of matrix

  • targets[in] pointer to a host array of target bits

  • nTargets[in] the number of target bits

  • controls[in] pointer to a host array of control bits

  • controlBitValues[in] pointer to a host array of control bit values

  • nControls[in] the number of control bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] extra workspace size


custatevecApplyMatrixBatchedGetWorkspaceSize
custatevecStatus_t custatevecApplyMatrixBatchedGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, custatevecMatrixMapType_t mapType, const int32_t *matrixIndices, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const uint32_t nMatrices, const uint32_t nTargets, const uint32_t nControls, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)

This function gets the required workspace size for custatevecApplyMatrixBatched().

This function returns the required extra workspace size to execute custatevecApplyMatrixBatched(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required for a given set of arguments.

Parameters
  • handle[in] the handle to the cuStateVec context

  • svDataType[in] Data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • nSVs[in] the number of state vectors

  • svStride[in] distance of two consecutive state vectors

  • mapType[in] enumerator specifying the way to assign matrices

  • matrixIndices[in] pointer to a host or device array of matrix indices

  • matrices[in] pointer to allocated matrices in one contiguous memory chunk on host or device

  • matrixDataType[in] data type of matrix

  • layout[in] enumerator specifying the memory layout of matrix

  • adjoint[in] apply adjoint of matrix

  • nMatrices[in] the number of matrices

  • nTargets[in] the number of target bits

  • nControls[in] the number of control bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspaceSizeInBytes[out] workspace size


custatevecApplyMatrixBatched
custatevecStatus_t custatevecApplyMatrixBatched(custatevecHandle_t handle, void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, custatevecIndex_t svStride, custatevecMatrixMapType_t mapType, const int32_t *matrixIndices, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const uint32_t nMatrices, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

This function applies one gate matrix to each of the batched state vectors.

This function applies one gate matrix for each of the batched state vectors given by the batchedSv argument. Batched state vectors are allocated in a single device memory chunk with the stride specified by the svStride argument. Each state vector size is \(2^\text{nIndexBits}\) and the number of state vectors is specified by the nSVs argument.

The mapType argument specifies the way to assign matrices to the state vectors, and the matrixIndices argument specifies the matrix indices for the state vectors. When mapType is CUSTATEVEC_MATRIX_MAP_TYPE_MATRIX_INDEXED, the \(\text{matrixIndices[}i\text{]}\)-th matrix will be assigned to the \(i\)-th state vector. matrixIndices should contain nSVs integers when mapType is CUSTATEVEC_MATRIX_MAP_TYPE_MATRIX_INDEXED and it can be a null pointer when mapType is CUSTATEVEC_MATRIX_MAP_TYPE_BROADCAST.

The matrices argument is a host or device pointer of a 2-dimensional array for a square matrix. The size of matrices is ( \(\text{nMatrices} \times 2^\text{nTargets} \times 2^\text{nTargets}\) ) and the value type is specified by the matrixDataType argument. The layout argument specifies the matrix layout which can be in either row-major or column-major order. The targets and controls arguments specify target and control bit positions in the state vector index. In this API, these bit positions are uniform for all the batched state vectors.

The controlBitValues argument specifies bit values of control bits. The ordering of controlBitValues is specified by the controls argument. If a null pointer is specified to this argument, all control bit values are set to 1.

By definition, bit positions in targets and controls arguments should not overlap.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of required extra workspace is obtained by calling custatevecApplyMatrixBatchedGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Note

In this version, this API does not return any errors even if the matrixIndices argument contains invalid matrix indices. However, when applicable, an error message is printed to stdout.

Parameters
  • handle[in] the handle to the cuStateVec library

  • batchedSv[inout] batched state vectors allocated in one contiguous memory chunk on device

  • svDataType[in] data type of the state vectors

  • nIndexBits[in] the number of index bits of the state vectors

  • nSVs[in] the number of state vectors

  • svStride[in] distance of two consecutive state vectors

  • mapType[in] enumerator specifying the way to assign matrices

  • matrixIndices[in] pointer to a host or device array of matrix indices

  • matrices[in] pointer to allocated matrices in one contiguous memory chunk on host or device

  • matrixDataType[in] data type of matrices

  • layout[in] enumerator specifying the memory layout of matrix

  • adjoint[in] apply adjoint of matrix

  • nMatrices[in] the number of matrices

  • targets[in] pointer to a host array of target bits

  • nTargets[in] the number of target bits

  • controls[in] pointer to a host array of control bits

  • controlBitValues[in] pointer to a host array of control bit values

  • nControls[in] the number of control bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] extra workspace size

Pauli Matrices

Exponential of a tensor product of Pauli matrices can be expressed as follows:

\[e^{i \theta \left( P_{target[0]} \otimes P_{target[1]} \otimes \cdots \otimes P_{target[nTargets-1]} \right)}.\]

Matrix \(P_{target[i]}\) can be any of the Pauli matrices \(I\), \(X\), \(Y\), and \(Z\), which correspond to the custatevecPauli_t enums CUSTATEVEC_PAULI_I, CUSTATEVEC_PAULI_X, CUSTATEVEC_PAULI_Y, and CUSTATEVEC_PAULI_Z, respectively. Also refer to custatevecPauli_t for details.

Use case

// apply exponential
custatevecApplyPauliRotation(
    handle, sv, svDataType, nIndexBits, theta, paulis, targets, nTargets,
    controls, controlBitValues, nControls);

API reference

custatevecApplyPauliRotation
custatevecStatus_t custatevecApplyPauliRotation(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double theta, const custatevecPauli_t *paulis, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls)

Apply the exponential of a multi-qubit Pauli operator.

Apply the exponential of a tensor product of Pauli bases specified by the paulis argument, \( e^{i \theta P} \), where \(P\) is the product of Pauli bases. The paulis, targets, and nTargets arguments specify the Pauli bases and their bit positions in the state vector index.

At least one target and a corresponding Pauli basis should be specified.

The controls and nControls arguments specify the control bit positions in the state vector index.

The controlBitValues argument specifies bit values of control bits. The ordering of controlBitValues is specified by the controls argument. If a null pointer is specified to this argument, all control bit values are set to 1.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of bits in the state vector index

  • theta[in] theta

  • paulis[in] host pointer to custatevecPauli_t array

  • targets[in] pointer to a host array of target bits

  • nTargets[in] the number of target bits

  • controls[in] pointer to a host array of control bits

  • controlBitValues[in] pointer to a host array of control bit values

  • nControls[in] the number of control bits

Generalized Permutation Matrices

A generalized permutation matrix can be expressed as the multiplication of a permutation matrix \(P\) and a diagonal matrix \(D\). For instance, we can decompose a 4 \(\times\) 4 generalized permutation matrix \(A\) as follows:

\[\begin{split}A = \left[ \begin{array}{cccc} 0 & 0 & a_0 & 0 \\ a_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & a_2 \\ 0 & a_3 & 0 & 0 \end{array}\right] = DP\end{split}\]

where

\[\begin{split}D = \left[ \begin{array}{cccc} a_0 & 0 & 0 & 0 \\ 0 & a_1 & 0 & 0 \\ 0 & 0 & a_2 & 0 \\ 0 & 0 & 0 & a_3 \end{array}\right], P = \left[ \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{array}\right].\end{split}\]

When \(P\) is the identity matrix, the generalized permutation matrix is diagonal. Similarly, when \(D\) is the identity matrix, the generalized permutation matrix reduces to a permutation matrix.

The cuStateVec API custatevecApplyGeneralizedPermutationMatrix() applies a generalized permutation matrix like \(A\) to a state vector. The API may require extra workspace for large matrices, whose size can be queried using custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize(). If a device memory handler is set, custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize() can be skipped.

Use case

// check the size of external workspace
size_t extraWorkspaceSizeInBytes = 0;
custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize(
    handle, CUDA_C_64F, nIndexBits, permutation, diagonals, CUDA_C_64F, targets,
    nTargets, nControls, &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// apply a generalized permutation matrix
custatevecApplyGeneralizedPermutationMatrix(
    handle, d_sv, CUDA_C_64F, nIndexBits, permutation, diagonals, CUDA_C_64F,
    adjoint, targets, nTargets, controls, controlBitValues, nControls,
    extraWorkspace, extraWorkspaceSizeInBytes);

The operation is equivalent to the following:

// sv, sv_temp: the state vector and temporary buffer.

int64_t sv_size = int64_t{1} << nIndexBits;
for (int64_t sv_idx = 0; sv_idx < sv_size; sv_idx++) {
    // The basis of sv_idx is converted to permutation basis to obtain perm_idx
    auto perm_idx = convertToPermutationBasis(sv_idx);
    // apply generalized permutation matrix
    if (adjoint == 0)
        sv_temp[sv_idx] = sv[permutation[perm_idx]] * diagonals[perm_idx];
    else
        sv_temp[permutation[perm_idx]] = sv[sv_idx] * conj(diagonals[perm_idx]);
}

for (int64_t sv_idx = 0; sv_idx < sv_size; sv_idx++)
    sv[sv_idx] = sv_temp[sv_idx];

API reference

custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize
custatevecStatus_t custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const custatevecIndex_t *permutation, const void *diagonals, cudaDataType_t diagonalsDataType, const int32_t *targets, const uint32_t nTargets, const uint32_t nControls, size_t *extraWorkspaceSizeInBytes)

Get the extra workspace size required by custatevecApplyGeneralizedPermutationMatrix().

This function gets the size of extra workspace size required to execute custatevecApplyGeneralizedPermutationMatrix(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required for a given set of arguments.

Parameters
  • handle[in] the handle to the cuStateVec library

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • permutation[in] host or device pointer to a permutation table

  • diagonals[in] host or device pointer to diagonal elements

  • diagonalsDataType[in] data type of diagonals

  • targets[in] pointer to a host array of target bits

  • nTargets[in] the number of target bits

  • nControls[in] the number of control bits

  • extraWorkspaceSizeInBytes[out] extra workspace size


custatevecApplyGeneralizedPermutationMatrix
custatevecStatus_t custatevecApplyGeneralizedPermutationMatrix(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecIndex_t *permutation, const void *diagonals, cudaDataType_t diagonalsDataType, const int32_t adjoint, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Apply generalized permutation matrix.

This function applies the generalized permutation matrix.

The generalized permutation matrix, \(A\), is expressed as \(A = DP\), where \(D\) and \(P\) are diagonal and permutation matrices, respectively.

The permutation matrix, \(P\), is specified as a permutation table which is an array of custatevecIndex_t and passed to the permutation argument.

The diagonal matrix, \(D\), is specified as an array of diagonal elements. The length of both arrays is \( 2^{{\text nTargets}} \). The diagonalsDataType argument specifies the type of diagonal elements.

Below is the table of combinations of svDataType and diagonalsDataType arguments available in this version.

svDataType    diagonalsDataType
CUDA_C_64F    CUDA_C_64F
CUDA_C_32F    CUDA_C_64F
CUDA_C_32F    CUDA_C_32F

This function can also be used to only apply either the diagonal or the permutation matrix. By passing a null pointer to the permutation argument, \(P\) is treated as an identity matrix, thus, only the diagonal matrix \(D\) is applied. Likewise, if a null pointer is passed to the diagonals argument, \(D\) is treated as an identity matrix, and only the permutation matrix \(P\) is applied.

The permutation argument must hold integers in [0, \( 2^{nTargets} \)). Each integer must appear exactly once; otherwise, the behavior of this function is undefined.

The permutation and diagonals arguments must not both be null; in that case, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets or nIndexBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of required extra workspace is obtained by calling custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize().

A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Note

In this version, custatevecApplyGeneralizedPermutationMatrix() does not return an error if an invalid permutation argument is specified.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • permutation[in] host or device pointer to a permutation table

  • diagonals[in] host or device pointer to diagonal elements

  • diagonalsDataType[in] data type of diagonals

  • adjoint[in] apply adjoint of generalized permutation matrix

  • targets[in] pointer to a host array of target bits

  • nTargets[in] the number of target bits

  • controls[in] pointer to a host array of control bits

  • controlBitValues[in] pointer to a host array of control bit values

  • nControls[in] the number of control bits

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] extra workspace size

Measurement

Measurement on Z-bases

Let us consider the measurement of an \(nIndexBits\)-qubit state vector \(sv\) on an \(nBasisBits\)-bit Z product basis \(basisBits\).

The sums of squared absolute values of state vector elements on the Z product basis, \(abs2sum0\) and \(abs2sum1\), are obtained as follows:

\[\begin{split}abs2sum0 &= \Bra{sv} \left\{ \dfrac{1}{2} \left( 1 + Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv}, \\ abs2sum1 &= \Bra{sv} \left\{ \dfrac{1}{2} \left( 1 - Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv}.\end{split}\]

Therefore, the probabilities of obtaining parity 0 and 1 are:

\[\begin{split}Pr(parity = 0) &= \dfrac{abs2sum0}{abs2sum0 + abs2sum1}, \\ Pr(parity = 1) &= \dfrac{abs2sum1}{abs2sum0 + abs2sum1}.\end{split}\]

Depending on the measurement result, the state vector is collapsed. If parity is equal to 0, we obtain the following vector:

\[\ket{sv} = \dfrac{1}{\sqrt{norm}} \left\{ \dfrac{1}{2} \left( 1 + Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv},\]

and if parity is equal to 1, we obtain the following vector:

\[\ket{sv} = \dfrac{1}{\sqrt{norm}} \left\{ \dfrac{1}{2} \left( 1 - Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv},\]

where \(norm\) is the normalization factor.
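The quantities above can be reproduced on a host copy of a small state vector. The following plain C++ sketch (the helper names are ours, not library calls) computes \(abs2sum0\)/\(abs2sum1\) and performs the collapse:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// Parity of index i on the Z product basis: XOR of the bits at basisBits positions.
int zParity(std::size_t i, const std::vector<int>& basisBits) {
    int parity = 0;
    for (int b : basisBits) parity ^= static_cast<int>((i >> b) & 1);
    return parity;
}

// Sums of squared absolute values of elements with parity 0 and parity 1.
void abs2SumOnZBasisHost(const std::vector<cpx>& sv, const std::vector<int>& basisBits,
                         double& abs2sum0, double& abs2sum1) {
    abs2sum0 = abs2sum1 = 0.0;
    for (std::size_t i = 0; i < sv.size(); ++i)
        (zParity(i, basisBits) == 0 ? abs2sum0 : abs2sum1) += std::norm(sv[i]);
}

// Collapse: zero the elements whose parity differs, scale the rest by 1/sqrt(norm).
void collapseOnZBasisHost(std::vector<cpx>& sv, int parity,
                          const std::vector<int>& basisBits, double norm) {
    const double scale = 1.0 / std::sqrt(norm);
    for (std::size_t i = 0; i < sv.size(); ++i)
        sv[i] = (zParity(i, basisBits) == parity) ? scale * sv[i] : cpx(0.0, 0.0);
}
```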

Use case

We can measure by custatevecMeasureOnZBasis() as follows:

// measure on a Z basis
custatevecMeasureOnZBasis(
    handle, sv, svDataType, nIndexBits, &parity, basisBits, nBasisBits,
    randnum, collapse);

The operation is equivalent to the following:

// compute the sums of squared absolute values of state vector elements
// on a Z product basis
double abs2sum0, abs2sum1;
custatevecAbs2SumOnZBasis(
    handle, sv, svDataType, nIndexBits, &abs2sum0, &abs2sum1, basisBits,
    nBasisBits);

// [User] compute parity and norm
double abs2sum = abs2sum0 + abs2sum1;
int parity = (randnum * abs2sum < abs2sum0) ? 0 : 1;
double norm = (parity == 0) ? abs2sum0 : abs2sum1;

// collapse if necessary
switch (collapse) {
case CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO:
    custatevecCollapseOnZBasis(
        handle, sv, svDataType, nIndexBits, parity, basisBits, nBasisBits,
        norm);
    break;  /* collapse */
case CUSTATEVEC_COLLAPSE_NONE:
    break;  /* Do nothing */
}

API reference

custatevecAbs2SumOnZBasis
custatevecStatus_t custatevecAbs2SumOnZBasis(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double *abs2sum0, double *abs2sum1, const int32_t *basisBits, const uint32_t nBasisBits)

Calculates the sums of squared absolute values on a given Z product basis.

This function calculates the sums of squared absolute values on a given Z product basis. If a null pointer is specified for abs2sum0 or abs2sum1, the corresponding sum is not calculated. Since (abs2sum0 + abs2sum1) is identical to the norm of the state vector, the probability of parity == 0 can be calculated as (abs2sum0 / (abs2sum0 + abs2sum1)).

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • abs2sum0[out] pointer to a host or device variable to store the sum of squared absolute values for parity == 0

  • abs2sum1[out] pointer to a host or device variable to store the sum of squared absolute values for parity == 1

  • basisBits[in] pointer to a host array of Z-basis index bits

  • nBasisBits[in] the number of basisBits


custatevecCollapseOnZBasis
custatevecStatus_t custatevecCollapseOnZBasis(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const int32_t parity, const int32_t *basisBits, const uint32_t nBasisBits, double norm)

Collapse state vector on a given Z product basis.

This function collapses the state vector on a given Z product basis. The state vector elements that match the parity argument are normalized by the factor specified in the norm argument; all other elements are set to zero.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • parity[in] parity, 0 or 1

  • basisBits[in] pointer to a host array of Z-basis index bits

  • nBasisBits[in] the number of Z basis bits

  • norm[in] normalization factor


custatevecMeasureOnZBasis
custatevecStatus_t custatevecMeasureOnZBasis(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, int32_t *parity, const int32_t *basisBits, const uint32_t nBasisBits, const double randnum, enum custatevecCollapseOp_t collapse)

Measurement on a given Z-product basis.

This function performs a measurement on a given Z product basis. The measurement result is the parity of the specified Z product basis. At least one basis bit must be specified; otherwise this function fails.

If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measurement result without collapsing the state vector. If CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vector as custatevecCollapseOnZBasis() does.

If a random number is not in [0, 1), this function returns CUSTATEVEC_STATUS_INVALID_VALUE. At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • parity[out] parity, 0 or 1

  • basisBits[in] pointer to a host array of Z basis bits

  • nBasisBits[in] the number of Z basis bits

  • randnum[in] random number, [0, 1).

  • collapse[in] Collapse operation

Qubit Measurement

Assume that we measure an \(nIndexBits\)-qubit state vector \(sv\) with a \(bitOrderingLen\)-bit bit string \(bitOrdering\).

The sums of squared absolute values of state vector elements are obtained by the following:

\[abs2sum[idx] = \braket{sv|i}\braket{i|sv},\]

where \(idx = b_{bitOrderingLen-1}\cdots b_1 b_0\), \(i = b_{bitOrdering[bitOrderingLen-1]} \cdots b_{bitOrdering[1]} b_{bitOrdering[0]}\), and \(b_p \in \{0, 1\}\).

Therefore, the probability of obtaining the \(idx\)-th bit pattern is:

\[Pr(idx) = \dfrac{abs2sum[idx]}{\sum_{k}abs2sum[k]}.\]

Depending on the measurement result, the state vector is collapsed.

Let \(mask\) denote the bit mask with 1s at the index bit positions listed in \(bitOrdering\), and let \(bitString\) denote the measured bit values placed at those positions. If \(idx\) satisfies \((idx \ \& \ mask) = bitString\), we obtain \(sv[idx] = \dfrac{1}{\sqrt{norm}} sv[idx]\). Otherwise, \(sv[idx] = 0\), where \(norm\) is the normalization factor.
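The abs2sum array above can be reproduced on a host copy of a small state vector. The following plain C++ sketch (the helper name is ours, not a library call) builds \(idx\) from the bits selected by \(bitOrdering\) and folds the remaining bits:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// abs2sum array on the host: entry idx accumulates |sv[i]|^2 over every index i
// whose bits at the positions listed in bitOrdering (LSB of idx first) spell idx.
// Index bits not listed in bitOrdering are folded (summed over).
std::vector<double> abs2SumArrayHost(const std::vector<cpx>& sv,
                                     const std::vector<int>& bitOrdering) {
    std::vector<double> abs2sum(std::size_t{1} << bitOrdering.size(), 0.0);
    for (std::size_t i = 0; i < sv.size(); ++i) {
        std::size_t idx = 0;
        for (std::size_t k = 0; k < bitOrdering.size(); ++k)
            idx |= ((i >> bitOrdering[k]) & 1) << k;
        abs2sum[idx] += std::norm(sv[i]);
    }
    return abs2sum;
}
```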

Use case

We can measure by custatevecBatchMeasure() as follows:

// measure with a bit string
custatevecBatchMeasure(
    handle, sv, svDataType, nIndexBits, bitString, bitOrdering, bitStringLen,
    randnum, collapse);

The operation is equivalent to the following:

// compute the sums of squared absolute values of state vector elements
int maskLen = 0;
int* maskBitString = nullptr;
int* maskOrdering = nullptr;

custatevecAbs2SumArray(
    handle, sv, svDataType, nIndexBits, abs2Sum, bitOrdering, bitOrderingLen,
    maskBitString, maskOrdering, maskLen);

// [User] compute a cumulative sum and choose bitString by a random number

// collapse if necessary
switch (collapse) {
case CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO:
    custatevecCollapseByBitString(
        handle, sv, svDataType, nIndexBits, bitString, bitOrdering,
        bitStringLen, norm);
    break;  /* collapse */
case CUSTATEVEC_COLLAPSE_NONE:
    break;  /* Do nothing */
}

For batched state vectors, custatevecAbs2SumArrayBatched(), custatevecCollapseByBitStringBatched(), and custatevecMeasureBatched() are available. Please refer to batched state vectors for the overview of batched state vector simulations.

For multi-GPU computations, custatevecBatchMeasureWithOffset() is available. This function works on one device, and users are required to compute the cumulative array of squared absolute values of state vector elements beforehand.

// The state vector is divided into nSubSvs sub state vectors.
// Each sub state vector has its own ordinal and nLocalBits index bits.
// The ordinals of sub state vectors correspond to the extended index bits.
// In this example, all the local qubits are measured and collapsed.

// get abs2sum for each sub state vector
double abs2SumArray[nSubSvs];
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecAbs2SumArray(
        handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &abs2SumArray[iSv], nullptr,
        0, nullptr, nullptr, 0);
}

for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    cudaDeviceSynchronize();
}

// get cumulative array
double cumulativeArray[nSubSvs + 1];
cumulativeArray[0] = 0.0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cumulativeArray[iSv + 1] = cumulativeArray[iSv] + abs2SumArray[iSv];
}

// measurement
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    // detect which sub state vector will be used for measurement.
    if (cumulativeArray[iSv] <= randnum && randnum < cumulativeArray[iSv + 1]) {
        double norm = cumulativeArray[nSubSvs];
        double offset = cumulativeArray[iSv];
        cudaSetDevice(devices[iSv]);
        // measure local qubits. Here the state vector will not be collapsed.
        // Only local qubits can be included in bitOrdering and bitString arguments.
        // That is, bitOrdering = {0, 1, 2, ..., nLocalBits - 1} and
        // bitString will store values of local qubits as an array of integers.
        custatevecBatchMeasureWithOffset(
            handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, bitString, bitOrdering,
            bitStringLen, randnum, CUSTATEVEC_COLLAPSE_NONE, offset, norm);
    }
}

for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    cudaDeviceSynchronize();
}

// get abs2Sum after collapse
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecAbs2SumArray(
        handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &abs2SumArray[iSv], nullptr,
        0, bitString, bitOrdering, bitStringLen);
}

for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    cudaDeviceSynchronize();
}

// get norm after collapse
double norm = 0.0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    norm += abs2SumArray[iSv];
}

// collapse sub state vectors
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecCollapseByBitString(
        handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, bitString, bitOrdering,
        bitStringLen, norm);
}

// destroy handle
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecDestroy(handle[iSv]);
}

Please refer to NVIDIA/cuQuantum repository for further detail.

API reference

custatevecAbs2SumArray
custatevecStatus_t custatevecAbs2SumArray(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double *abs2sum, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen)

Calculate abs2sum array for a given set of index bits.

Calculates an array of sums of squared absolute values of state vector elements. The abs2sum array can be on host or device. The index bit ordering of the abs2sum array is specified by the bitOrdering and bitOrderingLen arguments. Unspecified bits are folded (summed up).

The maskBitString, maskOrdering and maskLen arguments specify a bit mask on the state vector index. The abs2sum array is calculated by using the state vector elements whose indices match the mask bit string. If the maskLen argument is 0, null pointers can be specified for the maskBitString and maskOrdering arguments, and all state vector elements are used for the calculation.

By definition, bit positions in bitOrdering and maskOrdering arguments should not overlap.

An empty bitOrdering can be specified to calculate the norm of the state vector. In this case, 0 is passed to the bitOrderingLen argument and the bitOrdering argument can be a null pointer.
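The mask semantics can be expressed as a host-side predicate. In this plain C++ sketch (the helper name is ours, not a library call), a state vector index contributes to the abs2sum calculation only when it matches the mask:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when the bits of state vector index i at the positions in maskOrdering
// match the corresponding values in maskBitString. Elements whose indices fail
// this test are excluded from the abs2sum calculation.
bool matchesMask(std::size_t i, const std::vector<int>& maskBitString,
                 const std::vector<int>& maskOrdering) {
    for (std::size_t k = 0; k < maskOrdering.size(); ++k)
        if (static_cast<int>((i >> maskOrdering[k]) & 1) != maskBitString[k])
            return false;
    return true;
}
```

With maskLen == 0 the loop is empty, so every index matches, which corresponds to passing null pointers for maskBitString and maskOrdering.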

Note

Since the size of abs2sum array is proportional to \( 2^{bitOrderingLen} \) , the max length of bitOrdering depends on the amount of available memory and maskLen.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • abs2sum[out] pointer to a host or device array of sums of squared absolute values

  • bitOrdering[in] pointer to a host array of index bit ordering

  • bitOrderingLen[in] the length of bitOrdering

  • maskBitString[in] pointer to a host array for a bit string to specify mask

  • maskOrdering[in] pointer to a host array for the mask ordering

  • maskLen[in] the length of mask


custatevecCollapseByBitString
custatevecStatus_t custatevecCollapseByBitString(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const int32_t *bitString, const int32_t *bitOrdering, const uint32_t bitStringLen, double norm)

Collapse state vector to the state specified by a given bit string.

This function collapses state vector to the state specified by a given bit string. The state vector elements specified by the bitString, bitOrdering and bitStringLen arguments are normalized by the norm argument. Other elements are set to zero.

At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • bitString[in] pointer to a host array of bit string

  • bitOrdering[in] pointer to a host array of bit string ordering

  • bitStringLen[in] length of bit string

  • norm[in] normalization constant


custatevecBatchMeasure
custatevecStatus_t custatevecBatchMeasure(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, int32_t *bitString, const int32_t *bitOrdering, const uint32_t bitStringLen, const double randnum, enum custatevecCollapseOp_t collapse)

Batched single qubit measurement.

This function performs a batched single qubit measurement and returns a bit string. The bitOrdering argument specifies the index bits to be measured. The measurement result is stored in bitString in the ordering specified by the bitOrdering argument.

If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measured bit string without collapsing the state vector. When CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vector as custatevecCollapseByBitString() does.

If a random number is not in [0, 1), this function returns CUSTATEVEC_STATUS_INVALID_VALUE. At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Note

This API is for measuring a single state vector. For measuring batched state vectors, please use custatevecMeasureBatched(), whose arguments are passed in a different convention.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits

  • bitString[out] pointer to a host array of measured bit string

  • bitOrdering[in] pointer to a host array of bit string ordering

  • bitStringLen[in] length of bitString

  • randnum[in] random number, [0, 1).

  • collapse[in] Collapse operation


custatevecAbs2SumArrayBatched
custatevecStatus_t custatevecAbs2SumArrayBatched(custatevecHandle_t handle, const void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, double *abs2sumArrays, const custatevecIndex_t abs2sumArrayStride, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const custatevecIndex_t *maskBitStrings, const int32_t *maskOrdering, const uint32_t maskLen)

Calculate batched abs2sum array for a given set of index bits.

This function is the batched version of custatevecAbs2SumArray(). It calculates a batch of arrays holding the sums of squared absolute values of batched state vectors.

State vectors are placed on a single contiguous device memory chunk. The svStride argument specifies the distance between two adjacent state vectors. Thus, svStride should be equal to or larger than the state vector size.

The computed sums of squared absolute values are output to abs2sumArrays, which is a contiguous memory chunk. The abs2sumArrayStride argument specifies the distance between two adjacent abs2sum arrays. The batched abs2sum arrays can be on host or device. The index bit ordering of the abs2sum arrays in the batch is specified by the bitOrdering and bitOrderingLen arguments. Unspecified bits are folded (summed up).

The maskBitStrings, maskOrdering and maskLen arguments specify a bit mask for the index bits of the batched state vectors. Each abs2sum array is calculated by using the state vector elements whose indices match the specified mask bit string. The maskBitStrings argument specifies an array of mask values as integer bit masks applied to the state vector index.

If the maskLen argument is 0, null pointers can be specified to the maskBitStrings and maskOrdering arguments. In this case, all state vector elements are used without masks to compute the squared sum of absolute values.

By definition, bit positions in bitOrdering and maskOrdering arguments should not overlap.

An empty bitOrdering can be specified to calculate the norm of each state vector. In this case, 0 is passed to the bitOrderingLen argument and the bitOrdering argument can be a null pointer.

Note

In this version, this API does not return any errors even if the maskBitStrings argument contains invalid bit strings. However, when applicable, an error message is printed to stdout.

Parameters
  • handle[in] the handle to the cuStateVec library

  • batchedSv[in] batch of state vectors

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits

  • nSVs[in] the number of state vectors in a batch

  • svStride[in] the stride of state vector

  • abs2sumArrays[out] pointer to a host or device array of sums of squared absolute values

  • abs2sumArrayStride[in] the distance between two consecutive abs2sum arrays

  • bitOrdering[in] pointer to a host array of index bit ordering

  • bitOrderingLen[in] the length of bitOrdering

  • maskBitStrings[in] pointer to a host or device array of mask bit strings

  • maskOrdering[in] pointer to a host array for the mask ordering

  • maskLen[in] the length of mask


custatevecCollapseByBitStringBatchedGetWorkspaceSize
custatevecStatus_t custatevecCollapseByBitStringBatchedGetWorkspaceSize(custatevecHandle_t handle, const uint32_t nSVs, const custatevecIndex_t *bitStrings, const double *norms, size_t *extraWorkspaceSizeInBytes)

This function gets the required workspace size for custatevecCollapseByBitStringBatched().

This function returns the required extra workspace size to execute custatevecCollapseByBitStringBatched(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.

Note

The bitStrings and norms arrays are of the same size nSVs and can reside on either the host or the device, but their locations must remain the same when invoking custatevecCollapseByBitStringBatched(), or the computed workspace size may become invalid and lead to undefined behavior.

Parameters
  • handle[in] the handle to the cuStateVec context

  • nSVs[in] the number of batched state vectors

  • bitStrings[in] pointer to an array of bit strings, on either host or device

  • norms[in] pointer to an array of normalization constants, on either host or device

  • extraWorkspaceSizeInBytes[out] workspace size


custatevecCollapseByBitStringBatched
custatevecStatus_t custatevecCollapseByBitStringBatched(custatevecHandle_t handle, void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, const custatevecIndex_t *bitStrings, const int32_t *bitOrdering, const uint32_t bitStringLen, const double *norms, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Collapse the batched state vectors to the state specified by a given bit string.

This function collapses all of the state vectors in a batch to the state specified by a given bit string. Batched state vectors are allocated in a single device memory chunk with the stride specified by the svStride argument. Each state vector size is \(2^\text{nIndexBits}\) and the number of state vectors is specified by the nSVs argument.

The i-th state vector’s elements, as specified by the i-th bitStrings element and the bitOrdering and bitStringLen arguments, are normalized by the i-th norms element. Other state vector elements are set to zero.

At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Note that bitOrdering and bitStringLen are applicable to all state vectors in the batch, while the bitStrings and norms arrays are of the same size nSVs and can reside on either the host or the device.

The bitStrings argument should hold integers in [0, \( 2^\text{bitStringLen} \)).

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nSVs and/or nIndexBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of required extra workspace is obtained by calling custatevecCollapseByBitStringBatchedGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Note

In this version, custatevecCollapseByBitStringBatched() does not return an error if an invalid bitStrings or norms argument is specified. However, when applicable, an error message is printed to stdout.

Note

Unlike the non-batched version (custatevecCollapseByBitString()), in this batched version bitStrings are stored as an array with element type custatevecIndex_t; that is, each element is an integer representing a bit string in the binary form. This usage is in line with the custatevecSamplerSample() API. See the Bit Ordering section for further detail.

The bitStrings and norms arrays are of the same size nSVs and can reside on either the host or the device, but their locations must remain the same when invoking custatevecCollapseByBitStringBatchedGetWorkspaceSize(), or the computed workspace size may become invalid and lead to undefined behavior.

Parameters
  • handle[in] the handle to the cuStateVec library

  • batchedSv[inout] batched state vector allocated in one continuous memory chunk on device

  • svDataType[in] data type of the state vectors

  • nIndexBits[in] the number of index bits of the state vectors

  • nSVs[in] the number of batched state vectors

  • svStride[in] distance of two consecutive state vectors

  • bitStrings[in] pointer to an array of bit strings, on either host or device

  • bitOrdering[in] pointer to a host array of bit string ordering

  • bitStringLen[in] length of bit string

  • norms[in] pointer to an array of normalization constants on either host or device

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] size of the extra workspace


custatevecMeasureBatched
custatevecStatus_t custatevecMeasureBatched(custatevecHandle_t handle, void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, custatevecIndex_t *bitStrings, const int32_t *bitOrdering, const uint32_t bitStringLen, const double *randnums, enum custatevecCollapseOp_t collapse)

Single qubit measurements for batched state vectors.

This function measures bit strings of batched state vectors. The bitOrdering and bitStringLen arguments specify an integer array of index bit positions to be measured. The measurement results are returned to bitStrings, an array of 64-bit integer bit masks.

For example, when bitOrdering = {3, 1} is specified, this function measures two index bits. The 0-th bit of each bitStrings element represents the measurement outcome of index bit 3, and the 1st bit represents that of index bit 1.
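This packing can be sketched on the host as follows (plain C++, not a library call; the helper name is ours):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the bit packing used for bitStrings elements: bit k of the packed
// result holds the measured value of index bit bitOrdering[k].
std::int64_t packBitString(std::int64_t svIndex, const std::vector<int>& bitOrdering) {
    std::int64_t bits = 0;
    for (std::size_t k = 0; k < bitOrdering.size(); ++k)
        bits |= ((svIndex >> bitOrdering[k]) & 1) << k;
    return bits;
}
```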

Batched state vectors are given in a single contiguous memory chunk where state vectors are placed at the distance specified by svStride. The svStride is expressed in the number of elements.

The randnums argument stores the random numbers used for the measurements. The number of random numbers is identical to nSVs, and the values should be in [0, 1). Any random number outside this range is clipped to [0, 1).

If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measured bit strings without collapsing the state vectors. When CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vectors; after the collapse, the norm of each state vector is 1.

Note

This API is for measuring batched state vectors. For measuring a single state vector, custatevecBatchMeasure() is also available, whose arguments are passed in a different convention.

Parameters
  • handle[in] the handle to the cuStateVec library

  • batchedSv[inout] batched state vectors

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits

  • nSVs[in] the number of state vectors in the batched state vector

  • svStride[in] the distance between state vectors in the batch

  • bitStrings[out] pointer to a host or device array of measured bit strings

  • bitOrdering[in] pointer to a host array of bit string ordering

  • bitStringLen[in] length of bitString

  • randnums[in] pointer to a host or device array of random numbers.

  • collapse[in] Collapse operation


custatevecBatchMeasureWithOffset
custatevecStatus_t custatevecBatchMeasureWithOffset(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, int32_t *bitString, const int32_t *bitOrdering, const uint32_t bitStringLen, const double randnum, enum custatevecCollapseOp_t collapse, const double offset, const double abs2sum)

Batched single qubit measurement for partial vector.

This function performs a batched single qubit measurement and returns a bit string. The bitOrdering argument specifies the index bits to be measured. The measurement result is stored in bitString in the ordering specified by the bitOrdering argument.

If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measured bit string without collapsing the state vector. When CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vector as custatevecCollapseByBitString() does.

This function assumes that sv is a partial state vector from which some of the most significant index bits have been dropped. The prefix sum of squared absolute values for the lower-index partial vectors and the sum for the entire state vector must be provided as the offset and abs2sum arguments, respectively. When offset == abs2sum == 0, this function behaves in the same way as custatevecBatchMeasure().

If a random number is not in [0, 1), this function returns CUSTATEVEC_STATUS_INVALID_VALUE. At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] partial state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits

  • bitString[out] pointer to a host array of measured bit string

  • bitOrdering[in] pointer to a host array of bit string ordering

  • bitStringLen[in] length of bitString

  • randnum[in] random number, [0, 1).

  • collapse[in] Collapse operation

  • offset[in] partial sum of squared absolute values

  • abs2sum[in] sum of squared absolute values for the entire state vector

Expectation

Expectation via a Matrix

Expectation performs the following operation:

\[\langle A \rangle = \bra{\phi}A\ket{\phi},\]

where \(\ket{\phi}\) is a state vector and \(A\) is a matrix representing an observable. The expectation API, custatevecComputeExpectation(), may require external workspace for large matrices, and custatevecComputeExpectationGetWorkspaceSize() provides the required workspace size. If a device memory handler is set, custatevecComputeExpectationGetWorkspaceSize() can be skipped.

custatevecComputeExpectationBatchedGetWorkspaceSize() and custatevecComputeExpectationBatched() can compute expectation values for batched state vectors. Please refer to batched state vectors for the overview of batched state vector simulations.
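For reference, the expectation value can be cross-checked on a host copy of a small state vector. This plain C++ sketch (the helper name is ours, not a library call) handles the case where a dense row-major matrix acts on the whole vector:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// <A> = <sv|A|sv> computed on the host for a dense row-major matrix A acting
// on the whole vector (i.e. the case nBasisBits == nIndexBits).
cpx expectationHost(const std::vector<cpx>& sv, const std::vector<cpx>& A) {
    const std::size_t n = sv.size();
    cpx e(0.0, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            e += std::conj(sv[i]) * A[i * n + j] * sv[j];
    return e;
}
```

For a Hermitian \(A\) the imaginary part of the result is zero up to rounding, which mirrors the note on expectationDataType below.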

Use case

// check the size of external workspace
custatevecComputeExpectationGetWorkspaceSize(
    handle, svDataType, nIndexBits, matrix, matrixDataType, layout, nBasisBits, computeType,
    &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// perform expectation
custatevecComputeExpectation(
    handle, sv, svDataType, nIndexBits, expect, expectDataType, residualNorm,
    matrix, matrixDataType, layout, basisBits, nBasisBits, computeType,
    extraWorkspace, extraWorkspaceSizeInBytes);

API reference

custatevecComputeExpectationGetWorkspaceSize
custatevecStatus_t custatevecComputeExpectationGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nBasisBits, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)

This function gets the required workspace size for custatevecComputeExpectation().

This function returns the size of the extra workspace required to execute custatevecComputeExpectation(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.

Parameters
  • handle[in] the handle to the cuStateVec context

  • svDataType[in] Data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • matrix[in] host or device pointer to a matrix

  • matrixDataType[in] data type of matrix

  • layout[in] enumerator specifying the memory layout of matrix

  • nBasisBits[in] the number of target bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspaceSizeInBytes[out] size of the extra workspace


custatevecComputeExpectation
custatevecStatus_t custatevecComputeExpectation(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, void *expectationValue, cudaDataType_t expectationDataType, double *residualNorm, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t *basisBits, const uint32_t nBasisBits, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Compute expectation of matrix observable.

This function calculates expectation for a given matrix observable. The acceptable values for the expectationDataType argument are CUDA_R_64F and CUDA_C_64F.

The basisBits and nBasisBits arguments specify the basis to calculate expectation. For the computeType argument, the same combinations for custatevecApplyMatrix() are available.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nBasisBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of required extra workspace is obtained by calling custatevecComputeExpectationGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Note

The residualNorm argument is not available in this version. If the matrix given by the matrix argument is not guaranteed to be Hermitian, specify CUDA_C_64F for the expectationDataType argument and check the imaginary part of the calculated expectation value.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • expectationValue[out] host pointer to a variable to store an expectation value

  • expectationDataType[in] data type of expectationValue

  • residualNorm[out] result of matrix type test

  • matrix[in] observable as matrix

  • matrixDataType[in] data type of matrix

  • layout[in] matrix memory layout

  • basisBits[in] pointer to a host array of basis index bits

  • nBasisBits[in] the number of basis bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspace[in] pointer to an extra workspace

  • extraWorkspaceSizeInBytes[in] the size of extra workspace

Note

This function might be asynchronous with respect to host depending on the arguments. Please use cudaStreamSynchronize (for synchronization) or cudaStreamWaitEvent (for establishing the stream order) with the stream set to the handle of the current device before using the results stored in expectationValue.
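For reference, when the matrix acts on all index bits (nBasisBits equal to nIndexBits), the computed quantity reduces to \(\langle sv | A | sv \rangle\). The following host-side sketch shows this reduction for a row-major matrix; the helper name is illustrative and not part of the cuStateVec API:

```cpp
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Host-side reference: <sv| A |sv> for a dim x dim row-major matrix acting
// on the full register. Illustrative only; the cuStateVec API computes this
// on device for arbitrary basisBits subsets.
cplx expectationDense(const std::vector<cplx>& sv,
                      const std::vector<cplx>& matrix) {
    const size_t dim = sv.size();
    cplx ev = 0.0;
    for (size_t i = 0; i < dim; ++i) {
        cplx row = 0.0;
        for (size_t j = 0; j < dim; ++j)
            row += matrix[i * dim + j] * sv[j];  // (A |sv>)_i
        ev += std::conj(sv[i]) * row;            // accumulate <sv| A |sv>
    }
    return ev;
}
```

For a Hermitian observable the imaginary part of this quantity is approximately zero, which is why requesting CUDA_C_64F and inspecting the imaginary part serves as a Hermiticity check.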


custatevecComputeExpectationBatchedGetWorkspaceSize
custatevecStatus_t custatevecComputeExpectationBatchedGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nMatrices, const uint32_t nBasisBits, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)

This function gets the required workspace size for custatevecComputeExpectationBatched().

This function returns the size of the extra workspace required to execute custatevecComputeExpectationBatched(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.

Parameters
  • handle[in] the handle to the cuStateVec context

  • svDataType[in] Data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • nSVs[in] the number of state vectors

  • svStride[in] distance of two consecutive state vectors

  • matrices[in] pointer to allocated matrices in one contiguous memory chunk on host or device

  • matrixDataType[in] data type of matrices

  • layout[in] enumerator specifying the memory layout of matrix

  • nMatrices[in] the number of matrices

  • nBasisBits[in] the number of basis bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspaceSizeInBytes[out] size of the extra workspace


custatevecComputeExpectationBatched
custatevecStatus_t custatevecComputeExpectationBatched(custatevecHandle_t handle, const void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, custatevecIndex_t svStride, double2 *expectationValues, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nMatrices, const int32_t *basisBits, const uint32_t nBasisBits, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Compute the expectation values of matrix observables for each of the batched state vectors.

This function computes the expectation values of given matrix observables for each of the batched state vectors specified by the batchedSv argument. The batched state vectors are allocated in a single device memory chunk with the stride specified by the svStride argument. Each state vector size is \(2^\text{nIndexBits}\) and the number of state vectors is specified by the nSVs argument.

The expectationValues argument points to a single memory chunk used to output the expectation values. This API returns values in double precision (complex128) regardless of the input data types. The output array size is ( \(\text{nMatrices} \times \text{nSVs}\) ) and its leading dimension is nMatrices.
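A sketch of this output layout, assuming the hypothetical helper name below (not part of the API): since the leading dimension is nMatrices, the entries for one state vector are contiguous.

```cpp
#include <cstddef>
#include <cstdint>

// Flattened index into the expectationValues output: the leading dimension
// is nMatrices, so the value for state vector iSv and matrix iMatrix lives
// at iSv * nMatrices + iMatrix. Hypothetical helper for illustration.
size_t expectationIndex(uint32_t iSv, uint32_t iMatrix, uint32_t nMatrices) {
    return static_cast<size_t>(iSv) * nMatrices + iMatrix;
}
```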

The matrices argument is a host or device pointer of a 2-dimensional array for a square matrix. The size of matrices is ( \(\text{nMatrices} \times 2^\text{nBasisBits} \times 2^\text{nBasisBits}\) ) and the value type is specified by the matrixDataType argument. The layout argument specifies the matrix layout which can be in either row-major or column-major order.

The basisBits and nBasisBits arguments specify the basis on which the expectation values are calculated. For the computeType argument, the same combinations as for custatevecComputeExpectation() are available.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nBasisBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of required extra workspace is obtained by calling custatevecComputeExpectationBatchedGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Parameters
  • handle[in] the handle to the cuStateVec library

  • batchedSv[in] batched state vectors allocated in one contiguous memory chunk on device

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • nSVs[in] the number of state vectors

  • svStride[in] distance of two consecutive state vectors

  • expectationValues[out] pointer to a host array to store expectation values

  • matrices[in] pointer to allocated matrices in one contiguous memory chunk on host or device

  • matrixDataType[in] data type of matrices

  • layout[in] matrix memory layout

  • nMatrices[in] the number of matrices

  • basisBits[in] pointer to a host array of basis index bits

  • nBasisBits[in] the number of basis bits

  • computeType[in] computeType of matrix multiplication

  • extraWorkspace[in] pointer to an extra workspace

  • extraWorkspaceSizeInBytes[in] the size of extra workspace

Expectation on Pauli Basis

cuStateVec API custatevecComputeExpectationsOnPauliBasis() computes expectation values for a batch of Pauli strings. Each observable can be expressed as follows:

\[P_{\text{basisBits}[0]} \otimes P_{\text{basisBits}[1]} \otimes \cdots \otimes P_{\text{basisBits}[\text{nBasisBits}-1]}.\]

Each matrix \(P_{\text{basisBits}[i]}\) can be one of the Pauli matrices \(I\), \(X\), \(Y\), and \(Z\), corresponding to the custatevecPauli_t enums CUSTATEVEC_PAULI_I, CUSTATEVEC_PAULI_X, CUSTATEVEC_PAULI_Y, and CUSTATEVEC_PAULI_Z, respectively. Also refer to custatevecPauli_t for details.
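A Pauli-string expectation value is a probability-weighted sum of \(\pm 1\) eigenvalues. As a host-side reference for the single-operator \(Z\) case (illustrative code, not the cuStateVec implementation):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Host-side reference for <Z(q)> on a state vector in the default bit
// ordering: each basis state contributes |amplitude|^2 with sign +1 or -1
// depending on bit q of its index. Illustrative only.
double expectationZ(const std::vector<std::complex<double>>& sv, int qubit) {
    double ev = 0.0;
    for (size_t idx = 0; idx < sv.size(); ++idx) {
        const double p = std::norm(sv[idx]);  // |amplitude|^2
        ev += ((idx >> qubit) & 1) ? -p : p;  // Z eigenvalue at bit q
    }
    return ev;
}
```

An empty Pauli string leaves every sign at +1, so the sum reduces to the total of the probabilities, i.e. the norm of the state vector; this matches the empty pauliOperators0 sequence in the use case.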

Use case

// calculate the norm and the expectations for Z(q1) and X(q0)Y(q2)

const uint32_t nPauliOperatorArrays = 3;
custatevecPauli_t pauliOperators0[] = {};                                       // III
int32_t           basisBits0[]      = {};
custatevecPauli_t pauliOperators1[] = {CUSTATEVEC_PAULI_Z};                     // IZI
int32_t           basisBits1[]      = {1};
custatevecPauli_t pauliOperators2[] = {CUSTATEVEC_PAULI_X, CUSTATEVEC_PAULI_Y}; // XIY
int32_t           basisBits2[]      = {0, 2};
const uint32_t nBasisBitsArray[] = {0, 1, 2};

const custatevecPauli_t*
  pauliOperatorsArray[] = {pauliOperators0, pauliOperators1, pauliOperators2};
const int32_t *basisBitsArray[] = { basisBits0, basisBits1, basisBits2};

uint32_t nIndexBits = 3;
double expectationValues[nPauliOperatorArrays];

custatevecComputeExpectationsOnPauliBasis(
    handle, sv, svDataType, nIndexBits, expectationValues,
    pauliOperatorsArray, nPauliOperatorArrays,
    basisBitsArray, nBasisBitsArray);

API reference

custatevecComputeExpectationsOnPauliBasis
custatevecStatus_t custatevecComputeExpectationsOnPauliBasis(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double *expectationValues, const custatevecPauli_t **pauliOperatorsArray, const uint32_t nPauliOperatorArrays, const int32_t **basisBitsArray, const uint32_t *nBasisBitsArray)

Calculate expectation values for a batch of (multi-qubit) Pauli operators.

This function calculates multiple expectation values for given sequences of Pauli operators with a single call.

A single Pauli operator sequence, pauliOperators, is represented by using an array of custatevecPauli_t. The basis bits on which these Pauli operators are acting are represented by an array of index bit positions. If no Pauli operator is specified for an index bit, the identity operator (CUSTATEVEC_PAULI_I) is implicitly assumed.

The lengths of pauliOperators and basisBits are the same and are specified by nBasisBits.

The number of Pauli operator sequences is specified by the nPauliOperatorArrays argument.

Multiple sequences of Pauli operators are represented in the form of arrays of arrays in the following manners:

  • The pauliOperatorsArray argument is an array for arrays of custatevecPauli_t.

  • The basisBitsArray is an array of the arrays of basis bit positions.

  • The nBasisBitsArray argument holds an array of the lengths of the Pauli operator sequences and basis bit arrays.

Calculated expectation values are stored in a host buffer specified by the expectationValues argument, whose length is nPauliOperatorArrays.

This function returns CUSTATEVEC_STATUS_INVALID_VALUE if the basis bits specified for a Pauli operator sequence contain duplicates and/or fall outside the range [0, nIndexBits).

This function accepts an empty Pauli operator sequence to obtain the norm of the state vector.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] state vector

  • svDataType[in] data type of the state vector

  • nIndexBits[in] the number of index bits of the state vector

  • expectationValues[out] pointer to a host array to store expectation values

  • pauliOperatorsArray[in] pointer to a host array of Pauli operator arrays

  • nPauliOperatorArrays[in] the number of Pauli operator arrays

  • basisBitsArray[in] host array of basis bit arrays

  • nBasisBitsArray[in] host array of the number of basis bits

Matrix property testing

The API custatevecTestMatrixType() is available to check the properties of matrices.

If a matrix \(A\) is unitary, \(AA^{\dagger} = A^{\dagger}A = I\), where \(A^{\dagger}\) is the conjugate transpose of \(A\) and \(I\) is the identity matrix.

When CUSTATEVEC_MATRIX_TYPE_UNITARY is given for its argument, this API computes the 1-norm \(||R||_1 = \sum_{i,j}{|r_{ij}|}\), where \(R = AA^{\dagger} - I\). This value will be approximately zero if \(A\) is unitary.

If a matrix \(A\) is Hermitian, \(A^{\dagger} = A\).

When CUSTATEVEC_MATRIX_TYPE_HERMITIAN is given for its argument, this API computes the 2-norm \(||R||_2 = \sum_{i,j}{|r_{ij}|^2}\), where \(R = (A - A^{\dagger}) / 2\). This value will be approximately zero if \(A\) is Hermitian.

The API may require external workspace for large matrices, and custatevecTestMatrixTypeGetWorkspaceSize() provides the required workspace size. If a device memory handler is set, users need not provide an explicit workspace.
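For small matrices, the unitary residual norm can be reproduced on the host. A minimal sketch (illustrative helper, not a cuStateVec function):

```cpp
#include <array>
#include <complex>

using cplx = std::complex<double>;
using Mat2 = std::array<std::array<cplx, 2>, 2>;

// Host-side reference for the CUSTATEVEC_MATRIX_TYPE_UNITARY test:
// the 1-norm of R = A A^dagger - I, i.e. the sum of |r_ij|.
double unitaryResidual1Norm(const Mat2& a) {
    double norm = 0.0;
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            cplx r = 0.0;
            for (int k = 0; k < 2; ++k)
                r += a[i][k] * std::conj(a[j][k]);  // (A A^dagger)_{ij}
            if (i == j) r -= 1.0;                   // subtract the identity
            norm += std::abs(r);
        }
    }
    return norm;
}
```

The Hermitian test is analogous, with \(R = (A - A^{\dagger}) / 2\) and the sum of squared absolute values.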

Use case

double residualNorm;

void* extraWorkspace = nullptr;
size_t extraWorkspaceSizeInBytes = 0;

// check the size of external workspace
custatevecTestMatrixTypeGetWorkspaceSize(
    handle, matrixType, matrix, matrixDataType, layout,
    nTargets, adjoint, computeType, &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// execute testing
custatevecTestMatrixType(
    handle, &residualNorm, matrixType, matrix, matrixDataType, layout,
    nTargets, adjoint, computeType, extraWorkspace, extraWorkspaceSizeInBytes);

API reference

custatevecTestMatrixTypeGetWorkspaceSize

custatevecStatus_t custatevecTestMatrixTypeGetWorkspaceSize(custatevecHandle_t handle, custatevecMatrixType_t matrixType, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nTargets, const int32_t adjoint, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)

Get extra workspace size for custatevecTestMatrixType()

This function gets the size of an extra workspace required to execute custatevecTestMatrixType(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.

Parameters
  • handle[in] the handle to cuStateVec library

  • matrixType[in] matrix type

  • matrix[in] host or device pointer to a matrix

  • matrixDataType[in] data type of matrix

  • layout[in] enumerator specifying the memory layout of matrix

  • nTargets[in] the number of target bits, up to 15

  • adjoint[in] flag to control whether the adjoint of matrix is tested

  • computeType[in] compute type

  • extraWorkspaceSizeInBytes[out] workspace size


custatevecTestMatrixType

custatevecStatus_t custatevecTestMatrixType(custatevecHandle_t handle, double *residualNorm, custatevecMatrixType_t matrixType, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nTargets, const int32_t adjoint, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Test the deviation of a given matrix from a Hermitian (or unitary) matrix.

This function tests if the type of a given matrix matches the type given by the matrixType argument.

For the unitary type test, \( R = AA^{\dagger} - I \) is calculated, where \( A \) is the given matrix. The sum of the absolute values of the elements of \( R \) is returned.

For the Hermitian type test, \( R = (A - A^{\dagger}) / 2 \) is calculated. The sum of the squared absolute values of the elements of \( R \) is returned.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The required size of an extra workspace is obtained by calling custatevecTestMatrixTypeGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Note

The nTargets argument must be no more than 15 in this version. For larger nTargets, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Parameters
  • handle[in] the handle to cuStateVec library

  • residualNorm[out] host pointer, to store the deviation from certain matrix type

  • matrixType[in] matrix type

  • matrix[in] host or device pointer to a matrix

  • matrixDataType[in] data type of matrix

  • layout[in] enumerator specifying the memory layout of matrix

  • nTargets[in] the number of target bits, up to 15

  • adjoint[in] flag to control whether the adjoint of matrix is tested

  • computeType[in] compute type

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] extra workspace size

Sampling

Sampling obtains multiple measurement results using probabilities calculated from the quantum state.
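Conceptually, the sampler preprocesses the state vector into a cumulative abs2sum array once, then maps each random number in [0, 1) to a basis-state index by binary search. A host-side sketch of this idea (illustrative only, not the library implementation):

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side sketch of sampling: build the cumulative |amplitude|^2 array,
// then perform inverse-transform lookups for each random number.
std::vector<int64_t> sampleIndices(const std::vector<std::complex<double>>& sv,
                                   const std::vector<double>& randnums) {
    std::vector<double> cumulative(sv.size() + 1, 0.0);
    for (size_t i = 0; i < sv.size(); ++i)
        cumulative[i + 1] = cumulative[i] + std::norm(sv[i]);  // abs2sum
    std::vector<int64_t> bitStrings;
    for (double r : randnums) {
        // first position whose cumulative sum exceeds r * norm
        auto pos = std::upper_bound(cumulative.begin() + 1, cumulative.end(),
                                    r * cumulative.back());
        bitStrings.push_back(pos - (cumulative.begin() + 1));
    }
    return bitStrings;
}
```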

Use case

// create sampler and check the size of external workspace
custatevecSamplerCreate(
    handle, sv, svDataType, nIndexBits, &sampler, nMaxShots,
    &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// calculate cumulative abs2sum
custatevecSamplerPreprocess(
    handle, sampler, extraWorkspace, extraWorkspaceSizeInBytes);

// [User] generate randnums, array of random numbers [0, 1) for sampling
...

// sample bit strings
custatevecSamplerSample(
    handle, sampler, bitStrings, bitOrdering, bitStringLen, randnums, nShots,
    output);

// deallocate the sampler
custatevecSamplerDestroy(sampler);

For multi-GPU computations, cuStateVec provides custatevecSamplerGetSquaredNorm() and custatevecSamplerApplySubSVOffset(). Users are required to calculate the cumulative abs2sum array from the squared norm of each sub state vector, obtained via custatevecSamplerGetSquaredNorm(), and provide its values to the sampler descriptor via custatevecSamplerApplySubSVOffset().

// The state vector is divided into nSubSvs sub state vectors.
// Each sub state vector has its own ordinal and nLocalBits index bits.
// The ordinals of sub state vectors correspond to the extended index bits.

// create sampler and check the size of external workspace
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecSamplerCreate(
        handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &sampler[iSv], nMaxShots,
        &extraWorkspaceSizeInBytes[iSv]);
}

// allocate external workspace if necessary
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    if (extraWorkspaceSizeInBytes[iSv] > 0) {
        cudaSetDevice(devices[iSv]);
        cudaMalloc(&extraWorkspace[iSv], extraWorkspaceSizeInBytes[iSv]);
    }
}

// sample preprocess
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecSamplerPreprocess(
        handle[iSv], sampler[iSv], extraWorkspace[iSv],
        extraWorkspaceSizeInBytes[iSv]);
}

// get norm of the sub state vectors
double subNorms[nSubSvs];
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecSamplerGetSquaredNorm(
        handle[iSv], sampler[iSv], &subNorms[iSv]);
}

for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    cudaDeviceSynchronize();
}

// get cumulative array
double cumulativeArray[nSubSvs + 1];
cumulativeArray[0] = 0.0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cumulativeArray[iSv + 1] = cumulativeArray[iSv] + subNorms[iSv];
}
double norm = cumulativeArray[nSubSvs];

// apply offset and norm
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecSamplerApplySubSVOffset(
        handle[iSv], sampler[iSv], iSv, nSubSvs, cumulativeArray[iSv], norm);
}

// divide the randnums array. randnums must be sorted in ascending order.
int shotOffsets[nSubSvs + 1];
shotOffsets[0] = 0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    double* pos = std::lower_bound(randnums, randnums + nShots,
                                    cumulativeArray[iSv + 1] / norm);
    if (iSv == nSubSvs - 1) {
        pos = randnums + nShots;
    }
    shotOffsets[iSv + 1] = pos - randnums;
}

// sample bit strings
for (int iSv = 0; iSv < nSubSvs; iSv++) {
    int shotOffset = shotOffsets[iSv];
    int nSubShots = shotOffsets[iSv + 1] - shotOffsets[iSv];
    if (nSubShots > 0) {
        cudaSetDevice(devices[iSv]);
        custatevecSamplerSample(
            handle[iSv], sampler[iSv], &bitStrings[shotOffset], bitOrdering,
            bitStringLen, &randnums[shotOffset], nSubShots,
            CUSTATEVEC_SAMPLER_OUTPUT_RANDNUM_ORDER);
    }
}

for (int iSv = 0; iSv < nSubSvs; iSv++) {
    cudaSetDevice(devices[iSv]);
    custatevecSamplerDestroy(sampler[iSv]);
}

Please refer to the NVIDIA/cuQuantum repository for further details.

API reference

custatevecSamplerCreate

custatevecStatus_t custatevecSamplerCreate(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecSamplerDescriptor_t *sampler, uint32_t nMaxShots, size_t *extraWorkspaceSizeInBytes)

Create sampler descriptor.

This function creates a sampler descriptor. If an extra workspace is required, its size is set to extraWorkspaceSizeInBytes.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] pointer to state vector

  • svDataType[in] data type of state vector

  • nIndexBits[in] the number of index bits of the state vector

  • sampler[out] pointer to a new sampler descriptor

  • nMaxShots[in] the max number of shots used for this sampler context

  • extraWorkspaceSizeInBytes[out] workspace size

Note

The max value of nMaxShots is \(2^{31} - 1\). If the value exceeds the limit, custatevecSamplerCreate() returns CUSTATEVEC_STATUS_INVALID_VALUE.


custatevecSamplerPreprocess

custatevecStatus_t custatevecSamplerPreprocess(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, void *extraWorkspace, const size_t extraWorkspaceSizeInBytes)

Preprocess the state vector for preparation of sampling.

This function prepares the internal state of the sampler descriptor. If a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0. Otherwise, the pointer passed to the extraWorkspace argument is associated with the sampler handle and must be kept alive during the sampler's lifetime. The size of extraWorkspace is obtained when custatevecSamplerCreate() is called.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sampler[inout] the sampler descriptor

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] size of the extra workspace


custatevecSamplerGetSquaredNorm

custatevecStatus_t custatevecSamplerGetSquaredNorm(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, double *norm)

Get the squared norm of the state vector.

This function returns the squared norm of the state vector. An intended use case is sampling with multiple devices. This API should be called after custatevecSamplerPreprocess(). Otherwise, the behavior of this function is undefined.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sampler[in] the sampler descriptor

  • norm[out] the norm of the state vector


custatevecSamplerApplySubSVOffset

custatevecStatus_t custatevecSamplerApplySubSVOffset(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, int32_t subSVOrd, uint32_t nSubSVs, double offset, double norm)

Apply the partial norm and the norm of the state vector to the sampler descriptor.

This function applies offsets assuming the given state vector is a sub state vector. An intended use case is sampling with distributed state vectors. The nSubSVs argument should be a power of 2 and subSVOrd should be less than nSubSVs. Otherwise, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sampler[in] the sampler descriptor

  • subSVOrd[in] sub state vector ordinal

  • nSubSVs[in] the number of sub state vectors

  • offset[in] cumulative sum offset for the sub state vector

  • norm[in] norm for all sub vectors


custatevecSamplerSample

custatevecStatus_t custatevecSamplerSample(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, custatevecIndex_t *bitStrings, const int32_t *bitOrdering, const uint32_t bitStringLen, const double *randnums, const uint32_t nShots, enum custatevecSamplerOutput_t output)

Sample bit strings from the state vector.

This function performs sampling. The bitOrdering and bitStringLen arguments specify the bits to be sampled. Sampled bit strings are represented as an array of custatevecIndex_t and are stored in the host memory buffer that the bitStrings argument points to.

The randnums argument is an array of user-generated random numbers of length nShots. The random numbers should lie in [0, 1); any value outside this range is clipped to [0, 1).

The output argument specifies the order of the sampled bit strings. If a particular order is not required, choose CUSTATEVEC_SAMPLER_OUTPUT_RANDNUM_ORDER, which may offer slightly better performance.

This API should be called after custatevecSamplerPreprocess(). Otherwise, the behavior of this function is undefined. By calling custatevecSamplerApplySubSVOffset() prior to this function, it is possible to sample bits corresponding to the ordinal of the sub state vector.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sampler[in] the sampler descriptor

  • bitStrings[out] pointer to a host array to store sampled bit strings

  • bitOrdering[in] pointer to a host array of bit ordering for sampling

  • bitStringLen[in] the number of bits in bitOrdering

  • randnums[in] pointer to an array of random numbers

  • nShots[in] the number of shots

  • output[in] the order of sampled bit strings


custatevecSamplerDestroy

custatevecStatus_t custatevecSamplerDestroy(custatevecSamplerDescriptor_t sampler)

This function releases resources used by the sampler.

Parameters

sampler[in] the sampler descriptor

Accessor

An accessor extracts or updates state vector segments.

The APIs custatevecAccessorCreate() and custatevecAccessorCreateView() initialize an accessor and also return the size of an extra workspace (if needed by the APIs custatevecAccessorGet() and custatevecAccessorSet() to perform the copy). The workspace must be bound to an accessor by custatevecAccessorSetExtraWorkspace(), and the lifetime of the workspace must be as long as the accessor’s to cover the entire duration of the copy operation. If a device memory handler is set, it is not necessary to provide explicit workspace by users.

The begin and end arguments in the Get/Set APIs correspond to the state vector elements’ indices such that elements within the specified range are copied.

Use case

Extraction

// create accessor and check the size of external workspace
custatevecAccessorCreateView(
    handle, d_sv, CUDA_C_64F, nIndexBits, &accessor, bitOrdering, bitOrderingLen,
    maskBitString, maskOrdering, maskLen, &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// set external workspace
custatevecAccessorSetExtraWorkspace(
    handle, &accessor, extraWorkspace, extraWorkspaceSizeInBytes);

// get state vector elements
custatevecAccessorGet(
    handle, &accessor, buffer, accessBegin, accessEnd);

// deallocate the accessor
custatevecAccessorDestroy(accessor);

Update

// create accessor and check the size of external workspace
custatevecAccessorCreate(
    handle, d_sv, CUDA_C_64F, nIndexBits, &accessor, bitOrdering, bitOrderingLen,
    maskBitString, maskOrdering, maskLen, &extraWorkspaceSizeInBytes);

// allocate external workspace if necessary
if (extraWorkspaceSizeInBytes > 0)
    cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

// set external workspace
custatevecAccessorSetExtraWorkspace(
    handle, &accessor, extraWorkspace, extraWorkspaceSizeInBytes);

// set state vector elements
custatevecAccessorSet(
    handle, &accessor, buffer, 0, nSvSize);

// deallocate the accessor
custatevecAccessorDestroy(accessor);

API reference

custatevecAccessorCreate

custatevecStatus_t custatevecAccessorCreate(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecAccessorDescriptor_t *accessor, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, size_t *extraWorkspaceSizeInBytes)

Create accessor to copy elements between the state vector and an external buffer.

The accessor copies state vector elements between the state vector and external buffers. During the copy, the ordering of state vector elements is rearranged according to the bit ordering specified by the bitOrdering argument.

The state vector is assumed to have the default ordering: the LSB is the 0th index bit, and the (N-1)th index bit is the MSB for an N index bit system. The bit ordering of the external buffer is specified by the bitOrdering argument. When 3 is given for the nIndexBits argument and [1, 2, 0] for the bitOrdering argument, the state vector index bits are permuted to the specified bit positions. Thus, the state vector indices are rearranged and mapped to external buffer indices as [0, 4, 1, 5, 2, 6, 3, 7].
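This index mapping can be reproduced with a short host-side helper (illustrative, not part of the cuStateVec API): external buffer bit i takes the value of state vector bit bitOrdering[i].

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compute, for every state vector index, the corresponding external buffer
// index under a given bitOrdering (buffer bit i = state vector bit
// bitOrdering[i]). Illustrative helper, not a cuStateVec function.
std::vector<int64_t> permutedIndices(uint32_t nIndexBits,
                                     const std::vector<int32_t>& bitOrdering) {
    std::vector<int64_t> mapping(size_t(1) << nIndexBits);
    for (int64_t svIdx = 0; svIdx < int64_t(1) << nIndexBits; ++svIdx) {
        int64_t bufIdx = 0;
        for (size_t i = 0; i < bitOrdering.size(); ++i)
            bufIdx |= ((svIdx >> bitOrdering[i]) & 1) << i;
        mapping[svIdx] = bufIdx;
    }
    return mapping;
}
```

With nIndexBits = 3 and bitOrdering = {1, 2, 0}, this reproduces the mapping [0, 4, 1, 5, 2, 6, 3, 7].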

The maskBitString, maskOrdering and maskLen arguments specify the bit mask for the state vector index being accessed. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.

All bit positions in [0, nIndexBits) should appear exactly once, either in the bitOrdering or the maskOrdering argument. If a bit position does not appear in these arguments and/or there are overlaps of bit positions, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

The extra workspace improves performance if the accessor is called multiple times with small external buffers placed on device. A null pointer can be passed to the extraWorkspaceSizeInBytes argument if the extra workspace is not necessary.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] state vector

  • svDataType[in] Data type of state vector

  • nIndexBits[in] the number of index bits of state vector

  • accessor[in] pointer to an accessor descriptor

  • bitOrdering[in] pointer to a host array to specify the basis bits of the external buffer

  • bitOrderingLen[in] the length of bitOrdering

  • maskBitString[in] pointer to a host array to specify the mask values to limit access

  • maskOrdering[in] pointer to a host array for the mask ordering

  • maskLen[in] the length of mask

  • extraWorkspaceSizeInBytes[out] the required size of extra workspace


custatevecAccessorCreateView

custatevecStatus_t custatevecAccessorCreateView(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecAccessorDescriptor_t *accessor, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, size_t *extraWorkspaceSizeInBytes)

Create accessor for the constant state vector.

This function is the same as custatevecAccessorCreate(), but only accepts the constant state vector.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[in] state vector

  • svDataType[in] Data type of state vector

  • nIndexBits[in] the number of index bits of state vector

  • accessor[in] pointer to an accessor descriptor

  • bitOrdering[in] pointer to a host array to specify the basis bits of the external buffer

  • bitOrderingLen[in] the length of bitOrdering

  • maskBitString[in] pointer to a host array to specify the mask values to limit access

  • maskOrdering[in] pointer to a host array for the mask ordering

  • maskLen[in] the length of mask

  • extraWorkspaceSizeInBytes[out] the required size of extra workspace


custatevecAccessorDestroy

custatevecStatus_t custatevecAccessorDestroy(custatevecAccessorDescriptor_t accessor)

This function releases resources used by the accessor.

Parameters

accessor[in] the accessor descriptor


custatevecAccessorSetExtraWorkspace

custatevecStatus_t custatevecAccessorSetExtraWorkspace(custatevecHandle_t handle, custatevecAccessorDescriptor_t accessor, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Set the external workspace to the accessor.

This function sets the extra workspace for the accessor. The required size of the extra workspace can be obtained from custatevecAccessorCreate() or custatevecAccessorCreateView(). If a device memory handler is set, the extraWorkspace can be set to null, and the extraWorkspaceSizeInBytes can be set to 0.

Parameters
  • handle[in] the handle to the cuStateVec library

  • accessor[in] the accessor descriptor

  • extraWorkspace[in] extra workspace

  • extraWorkspaceSizeInBytes[in] extra workspace size


custatevecAccessorGet

custatevecStatus_t custatevecAccessorGet(custatevecHandle_t handle, custatevecAccessorDescriptor_t accessor, void *externalBuffer, const custatevecIndex_t begin, const custatevecIndex_t end)

Copy state vector elements to an external buffer.

This function copies state vector elements to an external buffer specified by the externalBuffer argument. During the copy, the index bit is permuted as specified by the bitOrdering argument in custatevecAccessorCreate() or custatevecAccessorCreateView().

The begin and end arguments specify the range of state vector elements being copied. Both arguments have the bit ordering specified by the bitOrdering argument.

Parameters
  • handle[in] the handle to the cuStateVec library

  • accessor[in] the accessor descriptor

  • externalBuffer[out] pointer to a host or device buffer to receive copied elements

  • begin[in] index in the permuted bit ordering for the first element being copied to the external buffer

  • end[in] index in the permuted bit ordering for the last element being copied to the external buffer (non-inclusive)


custatevecAccessorSet

custatevecStatus_t custatevecAccessorSet(custatevecHandle_t handle, custatevecAccessorDescriptor_t accessor, const void *externalBuffer, const custatevecIndex_t begin, const custatevecIndex_t end)

Set state vector elements from an external buffer.

This function sets complex numbers to the state vector by using an external buffer specified by the externalBuffer argument. During the copy, the index bit is permuted as specified by the bitOrdering argument in custatevecAccessorCreate().

The begin and end arguments specify the range of state vector elements being set to the state vector. Both arguments have the bit ordering specified by the bitOrdering argument.

If a read-only accessor created by calling custatevecAccessorCreateView() is provided, this function returns CUSTATEVEC_STATUS_NOT_SUPPORTED.

Parameters
  • handle[in] the handle to the cuStateVec library

  • accessor[in] the accessor descriptor

  • externalBuffer[in] pointer to a host or device buffer of complex values being copied to the state vector

  • begin[in] index in the permuted bit ordering for the first element being copied to the state vector

  • end[in] index in the permuted bit ordering for the last element being copied to the state vector (non-inclusive)

Single-process qubit reordering

For single-process computations, cuStateVec provides the custatevecSwapIndexBits() API for a single device and the custatevecMultiDeviceSwapIndexBits() API for multiple devices to reorder state vector elements.

Use case

single device

// This example uses 3 qubits.
const int nIndexBits = 3;

// swap 0th and 2nd qubits
const int nBitSwaps  = 1;
const int2 bitSwaps[] = {{0, 2}}; // specify the qubit pairs

// swap the state vector elements only if 1st qubit is 1
const int maskLen = 1;
int maskBitString[] = {1}; // specify the values of mask qubits
int maskOrdering[] = {1};  // specify the mask qubits

// Swap index bit pairs.
// {|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>} will be permuted to
// {|000>, |001>, |010>, |110>, |100>, |101>, |011>, |111>}.
custatevecSwapIndexBits(handle, sv, svDataType, nIndexBits, bitSwaps, nBitSwaps,
    maskBitString, maskOrdering, maskLen);

multiple devices

// This example uses 2 GPUs and each GPU stores 2-qubit sub state vector.
const int nGlobalIndexBits = 1;
const int nLocalIndexBits = 2;
const int nHandles = 1 << nGlobalIndexBits;

// Users are required to enable direct access on a peer device prior to the swap API call.
for (int i0 = 0; i0 < nHandles; i0++) {
  cudaSetDevice(i0);
  for (int i1 = 0; i1 < nHandles; i1++) {
    if (i0 == i1)
      continue;
    cudaDeviceEnablePeerAccess(i1, 0);
  }
}
cudaSetDevice(0);

// specify the type of device network topology to optimize the data transfer sequence.
// Here, devices are assumed to be connected via NVLink with an NVSwitch or
// PCIe device network with a single PCIe switch.
const custatevecDeviceNetworkType_t deviceNetworkType = CUSTATEVEC_DEVICE_NETWORK_TYPE_SWITCH;

// swap 0th and 2nd qubits
const int nIndexBitSwaps  = 1;
const int2 indexBitSwaps[] = {{0, 2}}; // specify the qubit pairs

// swap the state vector elements only if 1st qubit is 1
const int maskLen = 1;
int maskBitString[] = {1}; // specify the values of mask qubits
int maskOrdering[] = {1};  // specify the mask qubits

// Swap index bit pairs.
// {|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>} will be permuted to
// {|000>, |001>, |010>, |110>, |100>, |101>, |011>, |111>}.
custatevecMultiDeviceSwapIndexBits(handles, nHandles, subSVs, svDataType,
    nGlobalIndexBits, nLocalIndexBits, indexBitSwaps, nIndexBitSwaps,
    maskBitString, maskOrdering, maskLen, deviceNetworkType);

API reference

custatevecSwapIndexBits

custatevecStatus_t custatevecSwapIndexBits(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const int2 *bitSwaps, const uint32_t nBitSwaps, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen)

Swap index bits and reorder state vector elements in one device.

This function updates the bit ordering of the state vector by swapping the pairs of bit positions.

The state vector is assumed to have the default ordering: the LSB is the 0th index bit and the (N-1)th index bit is the MSB for an N index bit system. The bitSwaps argument specifies the swapped bit index pairs, whose values must be in the range [0, nIndexBits).

The maskBitString, maskOrdering and maskLen arguments specify the bit mask for the state vector index being permuted. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.

A bit position can be included in both bitSwaps and maskOrdering. When a masked bit is swapped, state vector elements whose original indices match the mask bit string are written to the permuted indices while other elements are not copied.

Parameters
  • handle[in] the handle to the cuStateVec library

  • sv[inout] state vector

  • svDataType[in] Data type of state vector

  • nIndexBits[in] the number of index bits of state vector

  • bitSwaps[in] pointer to a host array of swapping bit index pairs

  • nBitSwaps[in] the number of bit swaps

  • maskBitString[in] pointer to a host array to mask output

  • maskOrdering[in] pointer to a host array to specify the ordering of maskBitString

  • maskLen[in] the length of mask


custatevecMultiDeviceSwapIndexBits

custatevecStatus_t custatevecMultiDeviceSwapIndexBits(custatevecHandle_t *handles, const uint32_t nHandles, void **subSVs, const cudaDataType_t svDataType, const uint32_t nGlobalIndexBits, const uint32_t nLocalIndexBits, const int2 *indexBitSwaps, const uint32_t nIndexBitSwaps, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, const custatevecDeviceNetworkType_t deviceNetworkType)

Swap index bits and reorder state vector elements for multiple sub state vectors distributed to multiple devices.

This function updates the bit ordering of the state vector distributed in multiple devices by swapping the pairs of bit positions.

This function assumes the state vector is split into multiple sub state vectors and distributed to multiple devices to represent a (nGlobalIndexBits + nLocalIndexBits) qubit system.

The handles argument should receive cuStateVec handles created for all devices where sub state vectors are allocated. If two or more cuStateVec handles created for the same device are given, this function will return an error, CUSTATEVEC_STATUS_INVALID_VALUE. The handles argument should contain a handle created on the current device, as all operations in this function will be ordered on the stream of the current device’s handle. Otherwise, this function returns an error, CUSTATEVEC_STATUS_INVALID_VALUE.

Sub state vectors are specified by the subSVs argument as an array of device pointers. All sub state vectors are assumed to hold the same number of index bits, specified by nLocalIndexBits. Thus, each sub state vector holds (1 << nLocalIndexBits) state vector elements. The value of the global index bits is identical to the index of the sub state vector in the subSVs array. The number of sub state vectors is given as (1 << nGlobalIndexBits). The maximum value of nGlobalIndexBits is 5, which corresponds to 32 sub state vectors.

The index bits of the distributed state vector have the default ordering: the index bits of the sub state vector are mapped from the 0th index bit to the (nLocalIndexBits-1)-th index bit, and the global index bits are mapped from the (nLocalIndexBits)-th bit to the (nGlobalIndexBits + nLocalIndexBits - 1)-th bit.

The indexBitSwaps argument specifies the index bit pairs being swapped. Each index bit pair can be a pair of two global index bits or a pair of a global and a local index bit. A pair of two local index bits is not accepted. Please use custatevecSwapIndexBits() for swapping local index bits.

The maskBitString, maskOrdering and maskLen arguments specify the bit string mask that limits the state vector elements swapped during the call. Bits in maskOrdering can overlap index bits specified in the indexBitSwaps argument. In such cases, the mask bit string is applied for the bit positions before index bit swaps. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.

The deviceNetworkType argument specifies the device network topology to optimize the data transfer sequence. The following two network topologies are assumed:

  • Switch network: devices connected via NVLink with an NVSwitch (ex. DGX A100 and DGX-2) or PCIe device network with a single PCIe switch

  • Full mesh network: all devices are connected by full mesh connections (ex. DGX Station V100/A100)

Note

This function assumes that bidirectional GPUDirect P2P is supported and enabled by cudaDeviceEnablePeerAccess() between all devices where sub state vectors are allocated. If GPUDirect P2P is not enabled, a call to custatevecMultiDeviceSwapIndexBits() that accesses device memory allocated on another GPU results in a segmentation fault.

For the best performance, use a power-of-two number of devices and allocate one sub state vector on each device. To cover various hardware configurations, this function also allows a non-power-of-two number of devices, two or more sub state vectors allocated on one device, or even all sub state vectors allocated on a single device. However, performance is always best when a single sub state vector is allocated on each of \(2^n\) devices.

The copy on each participating device is enqueued on the CUDA stream bound to the corresponding handle via custatevecSetStream(). All CUDA calls issued before this function are correctly ordered if they were issued on the streams set to the handles. This function executes asynchronously. Use cudaStreamSynchronize() (for synchronization) or cudaStreamWaitEvent() (for establishing stream order) with the stream set to the handle of the current device.

Parameters
  • handles[in] pointer to a host array of custatevecHandle_t

  • nHandles[in] the number of handles specified in the handles argument

  • subSVs[inout] pointer to an array of sub state vectors

  • svDataType[in] the data type of the state vector specified by the subSVs argument

  • nGlobalIndexBits[in] the number of global index bits of distributed state vector

  • nLocalIndexBits[in] the number of local index bits in sub state vector

  • indexBitSwaps[in] pointer to a host array of index bit pairs being swapped

  • nIndexBitSwaps[in] the number of index bit swaps

  • maskBitString[in] pointer to a host array to mask output

  • maskOrdering[in] pointer to a host array to specify the ordering of maskBitString

  • maskLen[in] the length of mask

  • deviceNetworkType[in] the device network topology

Multi-process qubit reordering

For multi-process computations, cuStateVec provides APIs to schedule and reorder distributed state vector elements. In addition, cuStateVec provides custatevecCommunicator_t, which wraps MPI libraries for inter-process communication. Please refer to the Distributed Index Bit Swap API section for an overview and detailed usage.

API reference

custatevecCommunicatorCreate

custatevecStatus_t custatevecCommunicatorCreate(custatevecHandle_t handle, custatevecCommunicatorDescriptor_t *communicator, custatevecCommunicatorType_t communicatorType, const char *soname)

Create communicator.

This function creates a communicator instance.

The type of the communicator is specified by the communicatorType argument. By specifying CUSTATEVEC_COMMUNICATOR_TYPE_OPENMPI or CUSTATEVEC_COMMUNICATOR_TYPE_MPICH this function creates a communicator instance that internally uses Open MPI or MPICH, respectively. By specifying CUSTATEVEC_COMMUNICATOR_TYPE_EXTERNAL, this function loads a custom plugin that wraps an MPI library. The source code for the custom plugin is downloadable from NVIDIA/cuQuantum.

The soname argument specifies the name of the shared library that will be used by the communicator instance.

This function uses dlopen() to load the specified shared library. If the Open MPI or MPICH library is directly linked to the application and CUSTATEVEC_COMMUNICATOR_TYPE_OPENMPI or CUSTATEVEC_COMMUNICATOR_TYPE_MPICH is specified as the communicatorType argument, the soname argument should be set to NULL. In that case, function symbols are resolved by searching the functions loaded into the application at startup.

Parameters
  • handle[in] the handle to cuStateVec library

  • communicator[out] a pointer to the communicator

  • communicatorType[in] the communicator type

  • soname[in] the shared object name


custatevecCommunicatorDestroy

custatevecStatus_t custatevecCommunicatorDestroy(custatevecHandle_t handle, custatevecCommunicatorDescriptor_t communicator)

This function releases the communicator.

Parameters
  • handle[in] the handle to cuStateVec library

  • communicator[in] the communicator descriptor


custatevecDistIndexBitSwapSchedulerCreate

custatevecStatus_t custatevecDistIndexBitSwapSchedulerCreate(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t *scheduler, const uint32_t nGlobalIndexBits, const uint32_t nLocalIndexBits)

Create distributed index bit swap scheduler.

This function creates a distributed index bit swap scheduler descriptor.

The local index bits are from the 0th index bit to the (nLocalIndexBits-1)-th index bit. The global index bits are mapped from the (nLocalIndexBits)-th bit to the (nGlobalIndexBits + nLocalIndexBits - 1)-th bit.

Parameters
  • handle[in] the handle to cuStateVec library

  • scheduler[out] a pointer to a batch swap scheduler

  • nGlobalIndexBits[in] the number of global index bits

  • nLocalIndexBits[in] the number of local index bits


custatevecDistIndexBitSwapSchedulerDestroy

custatevecStatus_t custatevecDistIndexBitSwapSchedulerDestroy(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t scheduler)

This function releases the distributed index bit swap scheduler.

Parameters
  • handle[in] the handle to cuStateVec library

  • scheduler[in] a pointer to the batch swap scheduler to destroy


custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps

custatevecStatus_t custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t scheduler, const int2 *indexBitSwaps, const uint32_t nIndexBitSwaps, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, uint32_t *nSwapBatches)

Set index bit swaps to distributed index bit swap scheduler.

This function sets index bit swaps to the distributed index bit swap scheduler and computes the number of necessary batched data transfers for the given index bit swaps.

The index bits of the distributed state vector have the default ordering: the index bits of the sub state vector are mapped from the 0th index bit to the (nLocalIndexBits-1)-th index bit, and the global index bits are mapped from the (nLocalIndexBits)-th bit to the (nGlobalIndexBits + nLocalIndexBits - 1)-th bit.

The indexBitSwaps argument specifies the index bit pairs being swapped. Each index bit pair can be a pair of two global index bits or a pair of a global and a local index bit. A pair of two local index bits is not accepted. Please use custatevecSwapIndexBits() for swapping local index bits.

The maskBitString, maskOrdering and maskLen arguments specify the bit string mask that limits the state vector elements swapped during the call. Bits in maskOrdering can overlap index bits specified in the indexBitSwaps argument. In such cases, the mask bit string is applied for the bit positions before index bit swaps. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.

The value returned by the nSwapBatches argument represents the number of loops required to complete the index bit swaps and is used in later stages.

Parameters
  • handle[in] the handle to cuStateVec library

  • scheduler[in] a pointer to batch swap scheduler descriptor

  • indexBitSwaps[in] pointer to a host array of index bit pairs being swapped

  • nIndexBitSwaps[in] the number of index bit swaps

  • maskBitString[in] pointer to a host array to mask output

  • maskOrdering[in] pointer to a host array to specify the ordering of maskBitString

  • maskLen[in] the length of mask

  • nSwapBatches[out] the number of batched data transfers


custatevecDistIndexBitSwapSchedulerGetParameters

custatevecStatus_t custatevecDistIndexBitSwapSchedulerGetParameters(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t scheduler, const int32_t swapBatchIndex, const int32_t orgSubSVIndex, custatevecSVSwapParameters_t *parameters)

Get parameters to be set to the state vector swap worker.

This function computes the parameters used for data transfers between sub state vectors. The value of the swapBatchIndex argument should be in the range [0, nSwapBatches), where nSwapBatches is the number of loops obtained by the call to custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps().

The parameters argument returns the computed data transfer parameters, which are passed to a custatevecSVSwapWorker instance by calling custatevecSVSwapWorkerSetParameters().

Parameters
  • handle[in] the handle to cuStateVec library

  • scheduler[in] a pointer to batch swap scheduler descriptor

  • swapBatchIndex[in] swap batch index for state vector swap parameters

  • orgSubSVIndex[in] the index of the origin sub state vector to swap state vector segments

  • parameters[out] a pointer to data transfer parameters


custatevecSVSwapWorkerCreate

custatevecStatus_t custatevecSVSwapWorkerCreate(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t *svSwapWorker, custatevecCommunicatorDescriptor_t communicator, void *orgSubSV, int32_t orgSubSVIndex, cudaEvent_t orgEvent, cudaDataType_t svDataType, cudaStream_t stream, size_t *extraWorkspaceSizeInBytes, size_t *minTransferWorkspaceSizeInBytes)

Create state vector swap worker.

This function creates a custatevecSVSwapWorkerDescriptor_t that swaps, sends, and receives state vector elements between multiple sub state vectors. The communicator specified by the communicator argument is used for inter-process communication, allowing state vector elements to be transferred between sub state vectors distributed across multiple processes and nodes.

The created descriptor works on the device where the handle is created. The origin sub state vector specified by the orgSubSV argument should be allocated on the same device. The same applies to the event and the stream specified by the orgEvent and stream arguments respectively.

There are two workspaces: the extra workspace and the data transfer workspace. The extra workspace has a constant size and holds the internal state of the descriptor. The data transfer workspace stages the state vector elements being transferred. Its minimum size is given by the minTransferWorkspaceSizeInBytes argument. Depending on the system, a larger data transfer workspace can improve performance.

If all the destination sub state vectors are specified by using custatevecSVSwapWorkerSetSubSVsP2P(), the communicator argument can be null. In this case, the internal CUDA calls are not serialized on the stream specified by the stream argument. It is the user's responsibility to call cudaStreamSynchronize() and then a global barrier such as MPI_Barrier(), in this order, to complete all internal CUDA calls. This limitation will be fixed in a future version.

If sub state vectors are distributed to multiple processes, the event should be created with the cudaEventInterprocess flag. Please refer to the CUDA Toolkit documentation for the details.

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[out] state vector swap worker

  • communicator[in] a pointer to the MPI communicator

  • orgSubSV[in] a pointer to a sub state vector

  • orgSubSVIndex[in] the index of the sub state vector specified by the orgSubSV argument

  • orgEvent[in] the event for synchronization with the peer worker

  • svDataType[in] data type used by the state vector representation

  • stream[in] a stream that is used to locally execute kernels during data transfers

  • extraWorkspaceSizeInBytes[out] the size of the extra workspace needed

  • minTransferWorkspaceSizeInBytes[out] the minimum-required size of the transfer workspace


custatevecSVSwapWorkerDestroy

custatevecStatus_t custatevecSVSwapWorkerDestroy(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker)

This function releases the state vector swap worker.

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[in] state vector swap worker


custatevecSVSwapWorkerSetExtraWorkspace

custatevecStatus_t custatevecSVSwapWorkerSetExtraWorkspace(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)

Set extra workspace.

This function sets the extra workspace to the state vector swap worker. The required size for extra workspace can be obtained by custatevecSVSwapWorkerCreate().

The extra workspace should be set before calling custatevecSVSwapWorkerSetParameters().

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[in] state vector swap worker

  • extraWorkspace[in] pointer to the user-owned workspace

  • extraWorkspaceSizeInBytes[in] size of the user-provided workspace


custatevecSVSwapWorkerSetTransferWorkspace

custatevecStatus_t custatevecSVSwapWorkerSetTransferWorkspace(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, void *transferWorkspace, size_t transferWorkspaceSizeInBytes)

Set transfer workspace.

This function sets the transfer workspace to the state vector swap worker instance. The minimum size for transfer workspace can be obtained by custatevecSVSwapWorkerCreate().

Depending on the system hardware configuration, a larger transfer workspace can improve performance. The size specified by transferWorkspaceSizeInBytes should be a power of two and should be equal to or larger than the minTransferWorkspaceSizeInBytes value returned by the call to custatevecSVSwapWorkerCreate().

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[in] state vector swap worker

  • transferWorkspace[in] pointer to the user-owned workspace

  • transferWorkspaceSizeInBytes[in] size of the user-provided workspace


custatevecSVSwapWorkerSetSubSVsP2P

custatevecStatus_t custatevecSVSwapWorkerSetSubSVsP2P(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, void **dstSubSVsP2P, const int32_t *dstSubSVIndicesP2P, cudaEvent_t *dstEvents, const uint32_t nDstSubSVsP2P)

Set sub state vector pointers accessible via GPUDirect P2P.

This function sets sub state vector pointers that are accessible via GPUDirect P2P from the device where the state vector swap worker works. The sub state vector pointers should be specified together with the sub state vector indices and the events that were passed to custatevecSVSwapWorkerCreate() when creating the peer SV swap worker instances.

If sub state vectors are allocated in different processes, the sub state vector pointers and the events should be retrieved by using CUDA IPC.

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[in] state vector swap worker

  • dstSubSVsP2P[in] an array of pointers to sub state vectors that are accessed by GPUDirect P2P

  • dstSubSVIndicesP2P[in] the sub state vector indices of sub state vector pointers specified by the dstSubSVsP2P argument

  • dstEvents[in] events used to create peer workers

  • nDstSubSVsP2P[in] the number of sub state vector pointers specified by the dstSubSVsP2P argument


custatevecSVSwapWorkerSetParameters

custatevecStatus_t custatevecSVSwapWorkerSetParameters(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, const custatevecSVSwapParameters_t *parameters, int peer)

Set state vector swap parameters.

This function sets the parameters to swap state vector elements. The value of the parameters argument is retrieved by calling custatevecDistIndexBitSwapSchedulerGetParameters().

The peer argument specifies the rank of the peer process that holds the destination sub state vector. The sub state vector index of the destination sub state vector is obtained from the dstSubSVIndex member defined in custatevecSVSwapParameters_t.

If all the sub state vectors are accessible by GPUDirect P2P and a null pointer is passed to the communicator argument when calling custatevecSVSwapWorkerCreate(), the peer argument is ignored.

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[in] state vector swap worker

  • parameters[in] data transfer parameters

  • peer[in] the peer process identifier of the data transfer


custatevecSVSwapWorkerExecute

custatevecStatus_t custatevecSVSwapWorkerExecute(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, custatevecIndex_t begin, custatevecIndex_t end)

Execute the data transfer.

This function executes the transfer of state vector elements. The number of elements being transferred is obtained from the transferSize member in custatevecSVSwapParameters_t. The begin and end arguments specify the range, [begin, end), for elements being transferred.

Parameters
  • handle[in] the handle to cuStateVec library

  • svSwapWorker[in] state vector swap worker

  • begin[in] the index to start transfer

  • end[in] the index to end transfer
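Putting the scheduler and worker APIs above together, a typical per-process control loop might look like the following sketch. This is a hedged outline, not a complete program: error checking is omitted, and handle, scheduler, svSwapWorker, orgSubSVIndex, stream, and the index bit swap arguments are assumed to be set up as described earlier; rankOf() is a hypothetical helper mapping a destination sub state vector index to a process rank.

```c
// Hedged sketch of one distributed index bit swap driven from this process.
uint32_t nSwapBatches;
custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps(
    handle, scheduler, indexBitSwaps, nIndexBitSwaps,
    maskBitString, maskOrdering, maskLen, &nSwapBatches);

for (int32_t batch = 0; batch < (int32_t)nSwapBatches; batch++) {
    custatevecSVSwapParameters_t parameters;
    // Compute the transfer parameters for this batch and this sub SV.
    custatevecDistIndexBitSwapSchedulerGetParameters(
        handle, scheduler, batch, orgSubSVIndex, &parameters);
    // rankOf() is hypothetical: map the destination sub SV to its process.
    int peer = rankOf(parameters.dstSubSVIndex);
    custatevecSVSwapWorkerSetParameters(handle, svSwapWorker, &parameters, peer);
    // Transfer the whole element range [0, transferSize) in one call.
    custatevecSVSwapWorkerExecute(handle, svSwapWorker, 0,
                                  parameters.transferSize);
}
cudaStreamSynchronize(stream);
```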

Sub state vector migration

For distributed state vector simulations on host and device memories, cuStateVec provides APIs to migrate distributed state vector elements. Please refer to the Sub State Vector Migration API section for an overview and detailed usage.

API reference

custatevecSubSVMigratorCreate

custatevecStatus_t custatevecSubSVMigratorCreate(custatevecHandle_t handle, custatevecSubSVMigratorDescriptor_t *migrator, void *deviceSlots, cudaDataType_t svDataType, int nDeviceSlots, int nLocalIndexBits)

Create sub state vector migrator descriptor.

This function creates a sub state vector migrator descriptor. The sub state vectors specified by the deviceSlots argument are allocated in one contiguous memory array, whose size should be at least ( \(\text{nDeviceSlots} \times 2^\text{nLocalIndexBits}\)) elements.

Parameters
  • handle[in] the handle to the cuStateVec library

  • migrator[out] pointer to a new migrator descriptor

  • deviceSlots[in] pointer to sub state vectors on device

  • svDataType[in] data type of state vector

  • nDeviceSlots[in] the number of sub state vectors in deviceSlots

  • nLocalIndexBits[in] the number of index bits of sub state vectors


custatevecSubSVMigratorDestroy

custatevecStatus_t custatevecSubSVMigratorDestroy(custatevecHandle_t handle, custatevecSubSVMigratorDescriptor_t migrator)

Destroy sub state vector migrator descriptor.

This function releases a sub state vector migrator descriptor.

Parameters
  • handle[in] the handle to the cuStateVec library

  • migrator[inout] the migrator descriptor


custatevecSubSVMigratorMigrate

custatevecStatus_t custatevecSubSVMigratorMigrate(custatevecHandle_t handle, custatevecSubSVMigratorDescriptor_t migrator, int deviceSlotIndex, const void *srcSubSV, void *dstSubSV, custatevecIndex_t begin, custatevecIndex_t end)

Sub state vector migration.

This function performs a sub state vector migration. The deviceSlotIndex argument specifies the index of the device slot involved in the transfer, and the srcSubSV and dstSubSV arguments specify the sub state vectors transferred to and from that device slot, respectively. In the current version, srcSubSV and dstSubSV must be arrays allocated in host memory that are accessible from the device. If either srcSubSV or dstSubSV is a null pointer, the corresponding data transfer is skipped. The begin and end arguments specify the range, [begin, end), of elements being transferred.

Parameters
  • handle[in] the handle to the cuStateVec library

  • migrator[in] the migrator descriptor

  • deviceSlotIndex[in] the index to specify sub state vector to migrate

  • srcSubSV[in] a pointer to a sub state vector that is migrated to deviceSlots

  • dstSubSV[out] a pointer to a sub state vector that is migrated from deviceSlots

  • begin[in] the index to start migration

  • end[in] the index to end migration