cuStateVec Functions¶
Library Management¶
Handle Management API¶
custatevecCreate¶
custatevecStatus_t custatevecCreate(custatevecHandle_t *handle)¶
This function initializes the cuStateVec library and creates a handle on the cuStateVec context. It must be called prior to any other cuStateVec API functions.
- Parameters
handle – [in] the pointer to the handle to the cuStateVec context
custatevecDestroy¶
custatevecStatus_t custatevecDestroy(custatevecHandle_t handle)¶
This function releases resources used by the cuStateVec library.
- Parameters
handle – [in] the handle to the cuStateVec context
custatevecGetDefaultWorkspaceSize¶
custatevecStatus_t custatevecGetDefaultWorkspaceSize(custatevecHandle_t handle, size_t *workspaceSizeInBytes)¶
This function returns the default workspace size used by the cuStateVec library.
- Parameters
handle – [in] the handle to the cuStateVec context
workspaceSizeInBytes – [out] default workspace size
custatevecSetWorkspace¶
custatevecStatus_t custatevecSetWorkspace(custatevecHandle_t handle, void *workspace, size_t workspaceSizeInBytes)¶
This function sets the workspace used by the cuStateVec library.
This function sets the workspace attached to the handle. The required size of the workspace is obtained by custatevecGetDefaultWorkspaceSize().
By supplying a workspace larger than the default, some functions can execute without allocating additional workspace. If a device memory handler is set, the workspace can be set to null, in which case workspace memory is allocated from the user-defined memory pool.
- Parameters
handle – [in] the handle to the cuStateVec context
workspace – [in] device pointer to workspace
workspaceSizeInBytes – [in] workspace size
CUDA Stream Management API¶
custatevecSetStream¶
custatevecStatus_t custatevecSetStream(custatevecHandle_t handle, cudaStream_t streamId)¶
This function sets the stream to be used by the cuStateVec library to execute its routines.
- Parameters
handle – [in] the handle to the cuStateVec context
streamId – [in] the stream to be used by the library
custatevecGetStream¶
custatevecStatus_t custatevecGetStream(custatevecHandle_t handle, cudaStream_t *streamId)¶
This function gets the stream used by the cuStateVec library to execute its routines.
- Parameters
handle – [in] the handle to the cuStateVec context
streamId – [out] the stream to be used by the library
Error Management API¶
custatevecGetErrorName¶
const char *custatevecGetErrorName(custatevecStatus_t status)¶
This function returns the name string for the input error code. If the error code is not recognized, “unrecognized error code” is returned.
- Parameters
status – [in] Error code to convert to string
custatevecGetErrorString¶
const char *custatevecGetErrorString(custatevecStatus_t status)¶
This function returns the description string for an error code. If the error code is not recognized, “unrecognized error code” is returned.
- Parameters
status – [in] Error code to convert to string
Logger API¶
custatevecLoggerSetCallback¶
custatevecStatus_t custatevecLoggerSetCallback(custatevecLoggerCallback_t callback)¶
Experimental: This function sets the logging callback function.
- Parameters
callback – [in] Pointer to a callback function. See custatevecLoggerCallback_t.
custatevecLoggerSetCallbackData¶
custatevecStatus_t custatevecLoggerSetCallbackData(custatevecLoggerCallbackData_t callback, void *userData)¶
Experimental: This function sets the logging callback function with user data.
- Parameters
callback – [in] Pointer to a callback function. See custatevecLoggerCallbackData_t.
userData – [in] Pointer to user-provided data.
custatevecLoggerSetFile¶
custatevecStatus_t custatevecLoggerSetFile(FILE *file)¶
Experimental: This function sets the logging output file.
Note
Once registered using this function call, the provided file handle must not be closed unless the function is called again to switch to a different file handle.
- Parameters
file – [in] Pointer to an open file. File should have write permission.
custatevecLoggerOpenFile¶
custatevecStatus_t custatevecLoggerOpenFile(const char *logFile)¶
Experimental: This function opens a logging output file at the given path.
- Parameters
logFile – [in] Path of the logging output file.
custatevecLoggerSetLevel¶
custatevecStatus_t custatevecLoggerSetLevel(int32_t level)¶
Experimental: This function sets the value of the logging level.
Levels are defined as follows:

Level  Summary            Description
0      Off                logging is disabled (default)
1      Errors             only errors will be logged
2      Performance Trace  API calls that launch CUDA kernels will log their parameters and important information
3      Performance Hints  hints that can potentially improve the application's performance
4      Heuristics Trace   provides general information about the library execution; may contain details about heuristic status
5      API Trace          API calls will log their parameters and important information
- Parameters
level – [in] Value of the logging level.
custatevecLoggerSetMask¶
custatevecStatus_t custatevecLoggerSetMask(int32_t mask)¶
Experimental: This function sets the value of the logging mask. Masks are defined as a combination of the following values:

Mask  Description
0     Off
1     Errors
2     Performance Trace
4     Performance Hints
8     Heuristics Trace
16    API Trace
- Parameters
mask – [in] Value of the logging mask.
custatevecLoggerForceDisable¶
custatevecStatus_t custatevecLoggerForceDisable()¶
Experimental: This function disables logging for the entire run.
Versioning API¶
custatevecGetProperty¶
custatevecStatus_t custatevecGetProperty(libraryPropertyType type, int32_t *value)¶
This function returns the version information of the cuStateVec library.
- Parameters
type – [in] requested property (MAJOR_VERSION, MINOR_VERSION, or PATCH_LEVEL).
value – [out] value of the requested property.
Memory Management API¶
A stream-ordered memory allocator (or mempool for short) allocates and deallocates memory asynchronously from a mempool in a stream-ordered fashion, meaning memory operations and computations enqueued on streams have well-defined inter- and intra-stream dependencies. Several mature stream-ordered mempools are available, such as cudaMemPool_t, which is built in at the CUDA driver level since CUDA 11.2 (so that all CUDA applications in the same process can easily share the same pool), and the RAPIDS Memory Manager (RMM). For a detailed introduction, see the NVIDIA Developer Blog.
The new device memory handler APIs allow users to bind a stream-ordered mempool to the library handle, such that cuStateVec can take care of most of the memory management for users. Below is an illustration of what can be done:
MyMemPool pool = MyMemPool(); // kept alive for the entire process in real apps

int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) {
    return reinterpret_cast<MyMemPool*>(ctx)->alloc(ptr, size, stream);
}

int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) {
    return reinterpret_cast<MyMemPool*>(ctx)->dealloc(ptr, size, stream);
}

// create a mem handler and fill in the required members for the library to use
custatevecDeviceMemHandler_t handler;
handler.ctx = reinterpret_cast<void*>(&pool);
handler.device_alloc = my_alloc;
handler.device_free = my_dealloc;
strncpy(handler.name, "my pool", CUSTATEVEC_ALLOCATOR_NAME_LEN);

// bind the handler to the library handle
custatevecSetDeviceMemHandler(handle, &handler);

/* ... use gate application as usual ... */

// User doesn't compute the required sizes
// User doesn't query the workspace size (but one can if desired)
// User doesn't allocate memory!
// User sets a null pointer to indicate the library should draw memory from the user's pool
void* extraWorkspace = nullptr;
size_t extraWorkspaceSizeInBytes = 0;
custatevecApplyMatrix(
    handle, sv, svDataType, nIndexBits, matrix, matrixDataType, layout,
    adjoint, targets, nTargets, controls, controlBitValues, nControls,
    computeType, extraWorkspace, extraWorkspaceSizeInBytes);

// User doesn't deallocate memory!
As shown above, several calls to the workspace-related APIs can be skipped. Moreover, allowing the library to share your memory pool not only alleviates potential memory conflicts but also enables possible optimizations.

In the current release, only a device mempool can be bound.
custatevecSetDeviceMemHandler¶
custatevecStatus_t custatevecSetDeviceMemHandler(custatevecHandle_t handle, const custatevecDeviceMemHandler_t *handler)¶
Set the current device memory handler.
Once set, when cuStateVec needs device memory in various API calls it will allocate from the user-provided memory pool and deallocate at completion. See custatevecDeviceMemHandler_t and APIs that require extra workspace for further detail.
The internal stream order is established using the user-provided stream set via custatevecSetStream().
If the handler argument is set to nullptr, the library handle will detach its existing memory handler.

Warning

The behavior is undefined in the following scenarios:

the library handle is bound to a memory handler and subsequently to another handler

the library handle outlives the attached memory pool

the memory pool is not stream-ordered
- Parameters
handle – [in] Opaque handle holding cuStateVec’s library context.
handler – [in] the device memory handler that encapsulates the user’s mempool. The struct content is copied internally.
custatevecGetDeviceMemHandler¶
custatevecStatus_t custatevecGetDeviceMemHandler(custatevecHandle_t handle, custatevecDeviceMemHandler_t *handler)¶
Get the current device memory handler.
- Parameters
handle – [in] Opaque handle holding cuStateVec’s library context.
handler – [out] If previously set, the struct pointed to by handler is filled in; otherwise, CUSTATEVEC_STATUS_NO_DEVICE_ALLOCATOR is returned.
Initialization¶
cuStateVec API custatevecInitializeStateVector()
can be used to initialize a state vector to any of a set of prescribed states.
Please refer to custatevecStateVectorType_t for details.
Use case¶
// initialize state vector
custatevecInitializeStateVector(handle, sv, svDataType, nIndexBits, svType);
API reference¶
custatevecInitializeStateVector¶
custatevecStatus_t custatevecInitializeStateVector(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecStateVectorType_t svType)¶
Initialize the state vector to a certain form.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
svType – [in] the target quantum state
Gate Application¶
General Matrices¶
cuStateVec API custatevecApplyMatrix()
can apply a matrix representing a gate to a state vector.
The API may require external workspace for large matrices,
and custatevecApplyMatrixGetWorkspaceSize()
provides the size of workspace.
If a device memory handler is set, custatevecApplyMatrixGetWorkspaceSize()
can be skipped.
custatevecApplyMatrixBatchedGetWorkspaceSize()
and custatevecApplyMatrixBatched()
can apply matrices to batched state vectors.
Please refer to batched state vectors for the overview of batched state vector simulations.
Use case¶
// check the size of external workspace
size_t extraWorkspaceSizeInBytes = 0;
custatevecApplyMatrixGetWorkspaceSize(
    handle, svDataType, nIndexBits, matrix, matrixDataType, layout, adjoint, nTargets,
    nControls, computeType, &extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// apply gate
custatevecApplyMatrix(
handle, sv, svDataType, nIndexBits, matrix, matrixDataType, layout,
adjoint, targets, nTargets, controls, controlBitValues, nControls,
computeType, extraWorkspace, extraWorkspaceSizeInBytes);
API reference¶
custatevecApplyMatrixGetWorkspaceSize¶
custatevecStatus_t custatevecApplyMatrixGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const uint32_t nTargets, const uint32_t nControls, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)¶
This function gets the required workspace size for custatevecApplyMatrix().
This function returns the required extra workspace size to execute custatevecApplyMatrix(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required for a given set of arguments.
- Parameters
handle – [in] the handle to the cuStateVec context
svDataType – [in] Data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
matrix – [in] host or device pointer to a matrix
matrixDataType – [in] data type of matrix
layout – [in] enumerator specifying the memory layout of matrix
adjoint – [in] apply adjoint of matrix
nTargets – [in] the number of target bits
nControls – [in] the number of control bits
computeType – [in] computeType of matrix multiplication
extraWorkspaceSizeInBytes – [out] workspace size
custatevecApplyMatrix¶
custatevecStatus_t custatevecApplyMatrix(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Apply gate matrix.
Apply gate matrix to a state vector. The state vector size is \(2^\text{nIndexBits}\).
The matrix argument is a host or device pointer to a 2-dimensional array storing a square matrix. The size of the matrix is ( \(2^\text{nTargets} \times 2^\text{nTargets}\) ) and its value type is specified by the matrixDataType argument. The layout argument specifies the matrix memory layout, which can be either row-major or column-major order. The targets and controls arguments specify target and control bit positions in the state vector index.

The controlBitValues argument specifies the bit values of the control bits. The ordering of controlBitValues follows the controls argument. If a null pointer is passed to this argument, all control bit values are set to 1.

By definition, bit positions in the targets and controls arguments should not overlap.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The required extra workspace size is obtained by calling custatevecApplyMatrixGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
matrix – [in] host or device pointer to a square matrix
matrixDataType – [in] data type of matrix
layout – [in] enumerator specifying the memory layout of matrix
adjoint – [in] apply adjoint of matrix
targets – [in] pointer to a host array of target bits
nTargets – [in] the number of target bits
controls – [in] pointer to a host array of control bits
controlBitValues – [in] pointer to a host array of control bit values
nControls – [in] the number of control bits
computeType – [in] computeType of matrix multiplication
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] extra workspace size
custatevecApplyMatrixBatchedGetWorkspaceSize¶
custatevecStatus_t custatevecApplyMatrixBatchedGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, custatevecMatrixMapType_t mapType, const int32_t *matrixIndices, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const uint32_t nMatrices, const uint32_t nTargets, const uint32_t nControls, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)¶
This function gets the required workspace size for custatevecApplyMatrixBatched().
This function returns the required extra workspace size to execute custatevecApplyMatrixBatched(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required for a given set of arguments.
- Parameters
handle – [in] the handle to the cuStateVec context
svDataType – [in] Data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
nSVs – [in] the number of state vectors
svStride – [in] distance of two consecutive state vectors
mapType – [in] enumerator specifying the way to assign matrices
matrixIndices – [in] pointer to a host or device array of matrix indices
matrices – [in] pointer to allocated matrices in one contiguous memory chunk on host or device
matrixDataType – [in] data type of matrix
layout – [in] enumerator specifying the memory layout of matrix
adjoint – [in] apply adjoint of matrix
nMatrices – [in] the number of matrices
nTargets – [in] the number of target bits
nControls – [in] the number of control bits
computeType – [in] computeType of matrix multiplication
extraWorkspaceSizeInBytes – [out] workspace size
custatevecApplyMatrixBatched¶
custatevecStatus_t custatevecApplyMatrixBatched(custatevecHandle_t handle, void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, custatevecIndex_t svStride, custatevecMatrixMapType_t mapType, const int32_t *matrixIndices, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t adjoint, const uint32_t nMatrices, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
This function applies one gate matrix to each one of a set of batched state vectors.
This function applies one gate matrix to each of the batched state vectors given by the batchedSv argument. Batched state vectors are allocated in a single device memory chunk with the stride specified by the svStride argument. Each state vector size is \(2^\text{nIndexBits}\) and the number of state vectors is specified by the nSVs argument.

The mapType argument specifies how matrices are assigned to the state vectors, and the matrixIndices argument specifies the matrix indices for the state vectors. When mapType is CUSTATEVEC_MATRIX_MAP_TYPE_MATRIX_INDEXED, the \(\text{matrixIndices[}i\text{]}\)-th matrix will be assigned to the \(i\)-th state vector. matrixIndices should contain nSVs integers when mapType is CUSTATEVEC_MATRIX_MAP_TYPE_MATRIX_INDEXED, and it can be a null pointer when mapType is CUSTATEVEC_MATRIX_MAP_TYPE_BROADCAST.

The matrices argument is a host or device pointer to a 2-dimensional array storing square matrices. The size of matrices is ( \(\text{nMatrices} \times 2^\text{nTargets} \times 2^\text{nTargets}\) ) and the value type is specified by the matrixDataType argument. The layout argument specifies the matrix memory layout, which can be either row-major or column-major order. The targets and controls arguments specify target and control bit positions in the state vector index. In this API, these bit positions are uniform for all the batched state vectors.

The controlBitValues argument specifies the bit values of the control bits. The ordering of controlBitValues follows the controls argument. If a null pointer is passed to this argument, all control bit values are set to 1.

By definition, bit positions in the targets and controls arguments should not overlap.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The required extra workspace size is obtained by calling custatevecApplyMatrixBatchedGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.

Note

In this version, this API does not return any errors even if the matrixIndices argument contains invalid matrix indices. However, when applicable, an error message would be printed to stdout.
- Parameters
handle – [in] the handle to the cuStateVec library
batchedSv – [inout] batched state vector allocated in one continuous memory chunk on device
svDataType – [in] data type of the state vectors
nIndexBits – [in] the number of index bits of the state vectors
nSVs – [in] the number of state vectors
svStride – [in] distance of two consecutive state vectors
mapType – [in] enumerator specifying the way to assign matrices
matrixIndices – [in] pointer to a host or device array of matrix indices
matrices – [in] pointer to allocated matrices in one contiguous memory chunk on host or device
matrixDataType – [in] data type of matrices
layout – [in] enumerator specifying the memory layout of matrix
adjoint – [in] apply adjoint of matrix
nMatrices – [in] the number of matrices
targets – [in] pointer to a host array of target bits
nTargets – [in] the number of target bits
controls – [in] pointer to a host array of control bits
controlBitValues – [in] pointer to a host array of control bit values
nControls – [in] the number of control bits
computeType – [in] computeType of matrix multiplication
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] extra workspace size
Pauli Matrices¶
The exponential of a tensor product of Pauli matrices can be expressed as follows:

\( e^{i \theta \left( P_{target[0]} \otimes P_{target[1]} \otimes \cdots \otimes P_{target[nTargets-1]} \right)} \)

The matrix \(P_{target[i]}\) can be any of the Pauli matrices \(I\), \(X\), \(Y\), and \(Z\), corresponding to the custatevecPauli_t enums CUSTATEVEC_PAULI_I, CUSTATEVEC_PAULI_X, CUSTATEVEC_PAULI_Y, and CUSTATEVEC_PAULI_Z, respectively.

Also refer to custatevecPauli_t for details.
Use case¶
// apply exponential
custatevecApplyPauliRotation(
handle, sv, svDataType, nIndexBits, theta, paulis, targets, nTargets,
controls, controlBitValues, nControls);
API reference¶
custatevecApplyPauliRotation¶
custatevecStatus_t custatevecApplyPauliRotation(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double theta, const custatevecPauli_t *paulis, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls)¶
Apply the exponential of a multi-qubit Pauli operator.
Apply the exponential of a tensor product of Pauli bases, \( e^{i \theta P} \), where \(P\) is the product of Pauli bases. The paulis, targets, and nTargets arguments specify the Pauli bases and their bit positions in the state vector index.

At least one target and a corresponding Pauli basis should be specified.

The controls and nControls arguments specify the control bit positions in the state vector index.

The controlBitValues argument specifies the bit values of the control bits. The ordering of controlBitValues follows the controls argument. If a null pointer is passed to this argument, all control bit values are set to 1.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of bits in the state vector index
theta – [in] theta
paulis – [in] host pointer to custatevecPauli_t array
targets – [in] pointer to a host array of target bits
nTargets – [in] the number of target bits
controls – [in] pointer to a host array of control bits
controlBitValues – [in] pointer to a host array of control bit values
nControls – [in] the number of control bits
Generalized Permutation Matrices¶
A generalized permutation matrix can be expressed as the product of a permutation matrix \(P\) and a diagonal matrix \(D\), i.e., \(A = DP\).

When \(P\) is diagonal, the generalized permutation matrix is also diagonal. Similarly, when \(D\) is the identity matrix, the generalized permutation matrix becomes a permutation matrix.
The cuStateVec API custatevecApplyGeneralizedPermutationMatrix()
applies a generalized permutation matrix like \(A\) to a state vector.
The API may require extra workspace for large matrices,
whose size can be queried using custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize()
.
If a device memory handler is set, custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize()
can be skipped.
Use case¶
// check the size of external workspace
size_t extraWorkspaceSizeInBytes = 0;
custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize(
    handle, CUDA_C_64F, nIndexBits, permutation, diagonals, CUDA_C_64F, targets,
    nTargets, nControls, &extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// apply a generalized permutation matrix
custatevecApplyGeneralizedPermutationMatrix(
handle, d_sv, CUDA_C_64F, nIndexBits, permutation, diagonals, CUDA_C_64F,
adjoint, targets, nTargets, controls, controlBitValues, nControls,
extraWorkspace, extraWorkspaceSizeInBytes);
The operation is equivalent to the following:
// sv, sv_temp: the state vector and a temporary buffer.
int64_t sv_size = int64_t{1} << nIndexBits;
for (int64_t sv_idx = 0; sv_idx < sv_size; sv_idx++) {
    // the basis of sv_idx is converted to the permutation basis to obtain perm_idx
    auto perm_idx = convertToPermutationBasis(sv_idx);
    // apply the generalized permutation matrix
    if (adjoint == 0)
        sv_temp[sv_idx] = sv[permutation[perm_idx]] * diagonals[perm_idx];
    else
        sv_temp[permutation[perm_idx]] = sv[sv_idx] * conj(diagonals[perm_idx]);
}
for (int64_t sv_idx = 0; sv_idx < sv_size; sv_idx++)
    sv[sv_idx] = sv_temp[sv_idx];
API reference¶
custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize¶
custatevecStatus_t custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const custatevecIndex_t *permutation, const void *diagonals, cudaDataType_t diagonalsDataType, const int32_t *targets, const uint32_t nTargets, const uint32_t nControls, size_t *extraWorkspaceSizeInBytes)¶
Get the extra workspace size required by custatevecApplyGeneralizedPermutationMatrix().
This function gets the size of the extra workspace required to execute custatevecApplyGeneralizedPermutationMatrix(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required for a given set of arguments.
- Parameters
handle – [in] the handle to the cuStateVec library
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
permutation – [in] host or device pointer to a permutation table
diagonals – [in] host or device pointer to diagonal elements
diagonalsDataType – [in] data type of diagonals
targets – [in] pointer to a host array of target bits
nTargets – [in] the number of target bits
nControls – [in] the number of control bits
extraWorkspaceSizeInBytes – [out] extra workspace size
custatevecApplyGeneralizedPermutationMatrix¶
custatevecStatus_t custatevecApplyGeneralizedPermutationMatrix(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecIndex_t *permutation, const void *diagonals, cudaDataType_t diagonalsDataType, const int32_t adjoint, const int32_t *targets, const uint32_t nTargets, const int32_t *controls, const int32_t *controlBitValues, const uint32_t nControls, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Apply generalized permutation matrix.
This function applies the generalized permutation matrix.
The generalized permutation matrix, \(A\), is expressed as \(A = DP\), where \(D\) and \(P\) are diagonal and permutation matrices, respectively.
The permutation matrix, \(P\), is specified as a permutation table, an array of custatevecIndex_t, passed to the permutation argument.

The diagonal matrix, \(D\), is specified as an array of diagonal elements. The length of both arrays is \( 2^{\text{nTargets}} \). The diagonalsDataType argument specifies the type of the diagonal elements.

Below is the table of combinations of the svDataType and diagonalsDataType arguments available in this version:

svDataType  diagonalsDataType
CUDA_C_64F  CUDA_C_64F
CUDA_C_32F  CUDA_C_64F
CUDA_C_32F  CUDA_C_32F

If a null pointer is passed to the permutation argument, \(P\) is treated as an identity matrix; thus, only the diagonal matrix \(D\) is applied. Likewise, if a null pointer is passed to the diagonals argument, \(D\) is treated as an identity matrix, and only the permutation matrix \(P\) is applied.

The permutation argument should hold integers in [0, \( 2^{\text{nTargets}} \)). Each integer should appear exactly once; otherwise the behavior of this function is undefined.

The permutation and diagonals arguments should not both be null. In that case, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.

This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets or nIndexBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The required extra workspace size is obtained by calling custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.

Note

In this version, custatevecApplyGeneralizedPermutationMatrix() does not return an error if an invalid permutation argument is specified.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
permutation – [in] host or device pointer to a permutation table
diagonals – [in] host or device pointer to diagonal elements
diagonalsDataType – [in] data type of diagonals
adjoint – [in] apply adjoint of generalized permutation matrix
targets – [in] pointer to a host array of target bits
nTargets – [in] the number of target bits
controls – [in] pointer to a host array of control bits
controlBitValues – [in] pointer to a host array of control bit values
nControls – [in] the number of control bits
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] extra workspace size
Measurement¶
Measurement on Z-bases¶
Let us consider the measurement of an \(nIndexBits\)-qubit state vector \(sv\) on an \(nBasisBits\)-bit Z product basis \(basisBits\).

The sums of the squared absolute values of the state vector elements on the Z product basis, \(abs2sum0\) and \(abs2sum1\), are obtained as follows:

\( abs2sum0 = \sum_{i \,:\, parity(i) = 0} |sv_i|^2, \quad abs2sum1 = \sum_{i \,:\, parity(i) = 1} |sv_i|^2, \)

where \(parity(i)\) is the parity of the index bits of \(i\) selected by \(basisBits\). Therefore, the probabilities to obtain parity 0 and 1 are:

\( p_0 = \frac{abs2sum0}{abs2sum0 + abs2sum1}, \quad p_1 = \frac{abs2sum1}{abs2sum0 + abs2sum1}. \)

Depending on the measurement result, the state vector is collapsed. If the parity is equal to 0, we obtain the following vector:

\( sv_i' = \begin{cases} sv_i / norm & parity(i) = 0 \\ 0 & parity(i) = 1 \end{cases} \)

and if the parity is equal to 1, we obtain the following vector:

\( sv_i' = \begin{cases} 0 & parity(i) = 0 \\ sv_i / norm & parity(i) = 1 \end{cases} \)

where \(norm\) is the normalization factor.
Use case¶
We can measure by custatevecMeasureOnZBasis()
as follows:
// measure on a Z basis
custatevecMeasureOnZBasis(
handle, sv, svDataType, nIndexBits, &parity, basisBits, nBasisBits,
randnum, collapse);
The operation is equivalent to the following:
// compute the sums of squared absolute values of state vector elements
// on a Z product basis
double abs2sum0, abs2sum1;
custatevecAbs2SumOnZBasis(
handle, sv, svDataType, nIndexBits, &abs2sum0, &abs2sum1, basisBits,
nBasisBits);
// [User] compute parity and norm
double abs2sum = abs2sum0 + abs2sum1;
int parity = (randnum * abs2sum < abs2sum0) ? 0 : 1;
double norm = (parity == 0) ? abs2sum0 : abs2sum1;
// collapse if necessary
switch (collapse) {
case CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO:
    custatevecCollapseOnZBasis(
        handle, sv, svDataType, nIndexBits, parity, basisBits, nBasisBits,
        norm);
    break; /* collapse */
case CUSTATEVEC_COLLAPSE_NONE:
    break; /* do nothing */
}
API reference¶
custatevecAbs2SumOnZBasis
¶
-
custatevecStatus_t custatevecAbs2SumOnZBasis(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double *abs2sum0, double *abs2sum1, const int32_t *basisBits, const uint32_t nBasisBits)¶
Calculates the sum of squared absolute values on a given Z product basis.
This function calculates sums of squared absolute values on a given Z product basis. If a null pointer is specified to abs2sum0 or abs2sum1, the sum for the corresponding value is not calculated. Since the sum (abs2sum0 + abs2sum1) is identical to the norm of the state vector, one can calculate the probability of parity == 0 as (abs2sum0 / (abs2sum0 + abs2sum1)).
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
abs2sum0 – [out] pointer to a host or device variable to store the sum of squared absolute values for parity == 0
abs2sum1 – [out] pointer to a host or device variable to store the sum of squared absolute values for parity == 1
basisBits – [in] pointer to a host array of Z-basis index bits
nBasisBits – [in] the number of basisBits
custatevecCollapseOnZBasis
¶
-
custatevecStatus_t custatevecCollapseOnZBasis(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const int32_t parity, const int32_t *basisBits, const uint32_t nBasisBits, double norm)¶
Collapse state vector on a given Z product basis.
This function collapses the state vector on a given Z product basis. The state elements that match the parity argument are scaled by a factor specified in the norm argument. Other elements are set to zero.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
parity – [in] parity, 0 or 1
basisBits – [in] pointer to a host array of Z-basis index bits
nBasisBits – [in] the number of Z basis bits
norm – [in] normalization factor
custatevecMeasureOnZBasis
¶
-
custatevecStatus_t custatevecMeasureOnZBasis(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, int32_t *parity, const int32_t *basisBits, const uint32_t nBasisBits, const double randnum, enum custatevecCollapseOp_t collapse)¶
Measurement on a given Z-product basis.
This function performs a measurement on a given Z product basis. The measurement result is the parity of the specified Z product basis. At least one basis bit must be specified; otherwise this function fails.
If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measurement result without collapsing the state vector. If CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vector as custatevecCollapseOnZBasis() does.
If a random number is not in [0, 1), this function returns CUSTATEVEC_STATUS_INVALID_VALUE. At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
parity – [out] parity, 0 or 1
basisBits – [in] pointer to a host array of Z basis bits
nBasisBits – [in] the number of Z basis bits
randnum – [in] random number, [0, 1).
collapse – [in] Collapse operation
Qubit Measurement¶
Assume that we measure an \(nIndexBits\)-qubit state vector \(sv\) with a \(bitOrderingLen\)-bit bit string \(bitOrdering\).
The sums of squared absolute values of state vector elements are obtained by the following:
where \(idx = b_{bitOrderingLen-1}\cdots b_1 b_0\), \(i = b_{bitOrdering[bitOrderingLen-1]} \cdots b_{bitOrdering[1]} b_{bitOrdering[0]}\), \(b_p \in \{0, 1\}\).
The probability of obtaining the \(idx\)-th bit pattern is then given by:
Depending on the measurement result, the state vector is collapsed.
If \(idx\) satisfies \((idx \ \& \ bitString) = idx\), we obtain \(sv[idx] = \dfrac{1}{\sqrt{norm}} sv[idx]\). Otherwise, \(sv[idx] = 0\), where \(norm\) is the normalization factor.
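The collapse rule above can be sketched on the host as follows. This is an illustrative CPU model, not the cuStateVec implementation; the helper name collapseByBitString is hypothetical and mirrors the corresponding library call for a small in-memory state vector:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Elements whose index bits (at positions bitOrdering) equal the values in
// bitString are scaled by 1/sqrt(norm); all other elements are set to zero.
void collapseByBitString(std::vector<std::complex<double>>& sv,
                         const std::vector<int>& bitString,
                         const std::vector<int>& bitOrdering,
                         double norm) {
    for (std::size_t idx = 0; idx < sv.size(); ++idx) {
        bool match = true;
        for (std::size_t k = 0; k < bitOrdering.size(); ++k)
            match = match &&
                (((idx >> bitOrdering[k]) & 1) == static_cast<std::size_t>(bitString[k]));
        sv[idx] = match ? sv[idx] / std::sqrt(norm)
                        : std::complex<double>(0.0, 0.0);
    }
}
```

For a Bell state \((\ket{00}+\ket{11})/\sqrt{2}\), collapsing index bit 0 to the value 1 with norm 0.5 leaves the normalized state \(\ket{11}\).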
Use case¶
We can measure by custatevecBatchMeasure()
as follows:
// measure with a bit string
custatevecBatchMeasure(
handle, sv, svDataType, nIndexBits, bitString, bitOrdering, bitStringLen,
randnum, collapse);
The operation is equivalent to the following:
// compute the sums of squared absolute values of state vector elements
int maskLen = 0;
int* maskBitString = nullptr;
int* maskOrdering = nullptr;
custatevecAbs2SumArray(
handle, sv, svDataType, nIndexBits, abs2Sum, bitOrdering, bitOrderingLen,
maskBitString, maskOrdering, maskLen);
// [User] compute a cumulative sum and choose bitString by a random number
// collapse if necessary
switch (collapse) {
case CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO:
    custatevecCollapseByBitString(
        handle, sv, svDataType, nIndexBits, bitString, bitOrdering,
        bitStringLen, norm);
    break; /* collapse */
case CUSTATEVEC_COLLAPSE_NONE:
    break; /* do nothing */
}
For batched state vectors, custatevecAbs2SumArrayBatched(), custatevecCollapseByBitStringBatched(), and custatevecMeasureBatched() are available.
Please refer to batched state vectors for the overview of batched state vector simulations.
For multi-GPU computations, custatevecBatchMeasureWithOffset() is available. This function works on one device, and users are required to compute the cumulative array of squared absolute values of state vector elements beforehand.
// The state vector is divided into nSubSvs sub state vectors.
// Each sub state vector has its own ordinal and nLocalBits index bits.
// The ordinals of the sub state vectors correspond to the extended index bits.
// In this example, all the local qubits are measured and collapsed.
// get abs2sum for each sub state vector
double abs2SumArray[nSubSvs];
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecAbs2SumArray(
handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &abs2SumArray[iSv], nullptr,
0, nullptr, nullptr, 0);
}
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
cudaDeviceSynchronize();
}
// get cumulative array
double cumulativeArray[nSubSvs + 1];
cumulativeArray[0] = 0.0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cumulativeArray[iSv + 1] = cumulativeArray[iSv] + abs2SumArray[iSv];
}
// measurement
for (int iSv = 0; iSv < nSubSvs; iSv++) {
// detect which sub state vector will be used for measurement.
if (cumulativeArray[iSv] <= randnum && randnum < cumulativeArray[iSv + 1]) {
double norm = cumulativeArray[nSubSvs];
double offset = cumulativeArray[iSv];
cudaSetDevice(devices[iSv]);
// measure local qubits. Here the state vector will not be collapsed.
// Only local qubits can be included in bitOrdering and bitString arguments.
// That is, bitOrdering = {0, 1, 2, ..., nLocalBits - 1} and
// bitString will store values of local qubits as an array of integers.
custatevecBatchMeasureWithOffset(
handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, bitString, bitOrdering,
bitStringLen, randnum, CUSTATEVEC_COLLAPSE_NONE, offset, norm);
}
}
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
cudaDeviceSynchronize();
}
// get abs2Sum after collapse
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecAbs2SumArray(
handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &abs2SumArray[iSv], nullptr,
0, bitString, bitOrdering, bitStringLen);
}
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
cudaDeviceSynchronize();
}
// get norm after collapse
double norm = 0.0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
norm += abs2SumArray[iSv];
}
// collapse sub state vectors
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecCollapseByBitString(
handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, bitString, bitOrdering,
bitStringLen, norm);
}
// destroy handle
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecDestroy(handle[iSv]);
}
Please refer to the NVIDIA/cuQuantum repository for further detail.
API reference¶
custatevecAbs2SumArray
¶
-
custatevecStatus_t custatevecAbs2SumArray(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double *abs2sum, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen)¶
Calculate abs2sum array for a given set of index bits.
Calculates an array of sums of squared absolute values of state vector elements. The abs2sum array can be on host or device. The index bit ordering of the abs2sum array is specified by the bitOrdering and bitOrderingLen arguments. Unspecified bits are folded (summed up).
The maskBitString, maskOrdering and maskLen arguments set a bit mask on the state vector index. The abs2sum array is calculated by using state vector elements whose indices match the mask bit string. If the maskLen argument is 0, null pointers can be specified to the maskBitString and maskOrdering arguments, and all state vector elements are used for the calculation.
By definition, bit positions in the bitOrdering and maskOrdering arguments should not overlap.
An empty bitOrdering can be specified to calculate the norm of the state vector. In this case, 0 is passed to the bitOrderingLen argument and the bitOrdering argument can be a null pointer.
Note
Since the size of the abs2sum array is proportional to \( 2^{bitOrderingLen} \), the maximum length of bitOrdering depends on the amount of available memory and maskLen.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
abs2sum – [out] pointer to a host or device array of sums of squared absolute values
bitOrdering – [in] pointer to a host array of index bit ordering
bitOrderingLen – [in] the length of bitOrdering
maskBitString – [in] pointer to a host array for a bit string to specify mask
maskOrdering – [in] pointer to a host array for the mask ordering
maskLen – [in] the length of mask
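The folding described above can be sketched on the host as follows. This is an illustrative CPU model, not the cuStateVec implementation; the helper name abs2SumArray is hypothetical, and no mask is applied in this sketch:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Output index i packs the index bits selected by bitOrdering (the k-th
// entry maps to bit k of i); unspecified index bits are folded (summed over).
std::vector<double> abs2SumArray(const std::vector<std::complex<double>>& sv,
                                 const std::vector<int>& bitOrdering) {
    std::vector<double> out(std::size_t{1} << bitOrdering.size(), 0.0);
    for (std::size_t idx = 0; idx < sv.size(); ++idx) {
        std::size_t i = 0;
        for (std::size_t k = 0; k < bitOrdering.size(); ++k)
            i |= ((idx >> bitOrdering[k]) & 1u) << k;  // bit bitOrdering[k] of idx -> bit k of i
        out[i] += std::norm(sv[idx]);                  // |sv[idx]|^2
    }
    return out;
}
```

Passing an empty bitOrdering folds every index bit, so the single output element is the norm of the state vector, matching the behavior described above.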
custatevecCollapseByBitString
¶
-
custatevecStatus_t custatevecCollapseByBitString(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const int32_t *bitString, const int32_t *bitOrdering, const uint32_t bitStringLen, double norm)¶
Collapse state vector to the state specified by a given bit string.
This function collapses the state vector to the state specified by a given bit string. The state vector elements specified by the bitString, bitOrdering and bitStringLen arguments are normalized by the norm argument. Other elements are set to zero.
At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
bitString – [in] pointer to a host array of bit string
bitOrdering – [in] pointer to a host array of bit string ordering
bitStringLen – [in] length of bit string
norm – [in] normalization constant
custatevecBatchMeasure
¶
-
custatevecStatus_t custatevecBatchMeasure(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, int32_t *bitString, const int32_t *bitOrdering, const uint32_t bitStringLen, const double randnum, enum custatevecCollapseOp_t collapse)¶
Batched single qubit measurement.
This function performs batched single qubit measurement and returns a bit string. The bitOrdering argument specifies the index bits to be measured. The measurement result is stored in bitString in the ordering specified by the bitOrdering argument.
If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measured bit string without collapsing the state vector. When CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vector as custatevecCollapseByBitString() does.
If the random number is not in [0, 1), this function returns CUSTATEVEC_STATUS_INVALID_VALUE. At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
Note
This API is for measuring a single state vector. For measuring batched state vectors, please use custatevecMeasureBatched(), whose arguments are passed in a different convention.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits
bitString – [out] pointer to a host array of measured bit string
bitOrdering – [in] pointer to a host array of bit string ordering
bitStringLen – [in] length of bitString
randnum – [in] random number, [0, 1).
collapse – [in] Collapse operation
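Conceptually, the random number selects the measured bit string through a cumulative sum over the outcome probabilities, as in the user-side step of the multi-GPU example above. The following host-side sketch (illustrative only; the helper name selectBitString is hypothetical) shows that selection:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first outcome index whose cumulative probability exceeds
// randnum * total, i.e. the bit string sampled by a random number in [0, 1).
std::size_t selectBitString(const std::vector<double>& abs2sum, double randnum) {
    double total = 0.0;
    for (double p : abs2sum) total += p;   // total norm of the state vector
    double threshold = randnum * total;
    double cumulative = 0.0;
    for (std::size_t i = 0; i < abs2sum.size(); ++i) {
        cumulative += abs2sum[i];
        if (threshold < cumulative) return i;
    }
    return abs2sum.size() - 1;             // guard against rounding at the upper edge
}
```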
custatevecAbs2SumArrayBatched
¶
-
custatevecStatus_t custatevecAbs2SumArrayBatched(custatevecHandle_t handle, const void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, double *abs2sumArrays, const custatevecIndex_t abs2sumArrayStride, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const custatevecIndex_t *maskBitStrings, const int32_t *maskOrdering, const uint32_t maskLen)¶
Calculate batched abs2sum array for a given set of index bits.
The batched version of custatevecAbs2SumArray(), which calculates a batch of arrays that hold sums of squared absolute values computed from batched state vectors.
State vectors are placed in a single contiguous device memory chunk. The svStride argument specifies the distance between two adjacent state vectors; thus, svStride should be equal to or larger than the state vector size.
The computed sums of squared absolute values are output to abs2sumArrays, which is a contiguous memory chunk. The abs2sumArrayStride argument specifies the distance between two adjacent abs2sum arrays. The batched abs2sum arrays can be on host or device. The index bit ordering of each abs2sum array in the batch is specified by the bitOrdering and bitOrderingLen arguments. Unspecified bits are folded (summed up).
The maskBitStrings, maskOrdering and maskLen arguments specify a bit mask for the index bits of the batched state vectors. The abs2sum arrays are calculated by using state vector elements whose indices match the specified mask bit strings. The maskBitStrings argument specifies an array of mask values as integer bit masks that are applied to the state vector index.
If the maskLen argument is 0, null pointers can be specified to the maskBitStrings and maskOrdering arguments. In this case, all state vector elements are used without masks to compute the sums of squared absolute values.
By definition, bit positions in the bitOrdering and maskOrdering arguments should not overlap.
An empty bitOrdering can be specified to calculate the norm of each state vector. In this case, 0 is passed to the bitOrderingLen argument and the bitOrdering argument can be a null pointer.
Note
In this version, this API does not return any errors even if the maskBitStrings argument contains invalid bit strings. However, when applicable, an error message is printed to stdout.
- Parameters
handle – [in] the handle to the cuStateVec library
batchedSv – [in] batch of state vectors
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits
nSVs – [in] the number of state vectors in a batch
svStride – [in] the stride of state vector
abs2sumArrays – [out] pointer to a host or device array of sums of squared absolute values
abs2sumArrayStride – [in] the distance between consecutive abs2sum arrays
bitOrdering – [in] pointer to a host array of index bit ordering
bitOrderingLen – [in] the length of bitOrdering
maskBitStrings – [in] pointer to a host or device array of mask bit strings
maskOrdering – [in] pointer to a host array for the mask ordering
maskLen – [in] the length of mask
custatevecCollapseByBitStringBatchedGetWorkspaceSize
¶
-
custatevecStatus_t custatevecCollapseByBitStringBatchedGetWorkspaceSize(custatevecHandle_t handle, const uint32_t nSVs, const custatevecIndex_t *bitStrings, const double *norms, size_t *extraWorkspaceSizeInBytes)¶
This function gets the required workspace size for custatevecCollapseByBitStringBatched().
This function returns the required extra workspace size to execute custatevecCollapseByBitStringBatched(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.
Note
The bitStrings and norms arrays are of the same size nSVs and can reside on either the host or the device, but their locations must remain the same when invoking custatevecCollapseByBitStringBatched(); otherwise the computed workspace size may become invalid and lead to undefined behavior.
- Parameters
handle – [in] the handle to the cuStateVec context
nSVs – [in] the number of batched state vectors
bitStrings – [in] pointer to an array of bit strings, on either host or device
norms – [in] pointer to an array of normalization constants, on either host or device
extraWorkspaceSizeInBytes – [out] workspace size
custatevecCollapseByBitStringBatched
¶
-
custatevecStatus_t custatevecCollapseByBitStringBatched(custatevecHandle_t handle, void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, const custatevecIndex_t *bitStrings, const int32_t *bitOrdering, const uint32_t bitStringLen, const double *norms, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Collapse the batched state vectors to the state specified by a given bit string.
This function collapses all of the state vectors in a batch to the state specified by a given bit string. Batched state vectors are allocated in a single device memory chunk with the stride specified by the svStride argument. Each state vector size is \(2^\text{nIndexBits}\) and the number of state vectors is specified by the nSVs argument.
The i-th state vector's elements, as specified by the i-th bitStrings element and the bitOrdering and bitStringLen arguments, are normalized by the i-th norms element. Other state vector elements are set to zero.
At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
Note that bitOrdering and bitStringLen apply to all state vectors in the batch, while the bitStrings and norms arrays are of the same size nSVs and can reside on either the host or the device.
The bitStrings argument should hold integers in [0, \( 2^\text{bitStringLen} \)).
This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nSVs and/or nIndexBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of the required extra workspace is obtained by calling custatevecCollapseByBitStringBatchedGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.
Note
In this version, custatevecCollapseByBitStringBatched() does not return an error if an invalid bitStrings or norms argument is specified. However, when applicable, an error message is printed to stdout.
Note
Unlike the non-batched version (custatevecCollapseByBitString()), in this batched version bitStrings are stored as an array with element type custatevecIndex_t; that is, each element is an integer representing a bit string in binary form. This usage is in line with the custatevecSamplerSample() API. See the Bit Ordering section for further detail.
The bitStrings and norms arrays are of the same size nSVs and can reside on either the host or the device, but their locations must remain the same when invoking custatevecCollapseByBitStringBatchedGetWorkspaceSize(); otherwise the computed workspace size may become invalid and lead to undefined behavior.
- Parameters
handle – [in] the handle to the cuStateVec library
batchedSv – [inout] batched state vector allocated in one continuous memory chunk on device
svDataType – [in] data type of the state vectors
nIndexBits – [in] the number of index bits of the state vectors
nSVs – [in] the number of batched state vectors
svStride – [in] distance of two consecutive state vectors
bitStrings – [in] pointer to an array of bit strings, on either host or device
bitOrdering – [in] pointer to a host array of bit string ordering
bitStringLen – [in] length of bit string
norms – [in] pointer to an array of normalization constants on either host or device
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] size of the extra workspace
custatevecMeasureBatched
¶
-
custatevecStatus_t custatevecMeasureBatched(custatevecHandle_t handle, void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, custatevecIndex_t *bitStrings, const int32_t *bitOrdering, const uint32_t bitStringLen, const double *randnums, enum custatevecCollapseOp_t collapse)¶
Single qubit measurements for batched state vectors.
This function measures bit strings of batched state vectors. The bitOrdering and bitStringLen arguments specify an integer array of index bit positions to be measured. The measurement results are returned to bitStrings, which is an array of 64-bit integer bit masks.
Ex. When bitOrdering = {3, 1} is specified, this function measures two index bits. The 0th bit in the bitStrings elements represents the measurement outcome of index bit 3, and the 1st bit represents the measurement outcome of index bit 1.
Batched state vectors are given in a single contiguous memory chunk where state vectors are placed at the distance specified by svStride. The svStride is expressed in the number of elements.
The randnums argument stores the random numbers used for measurements. The number of random numbers is identical to nSVs, and values should be in [0, 1). Any random number outside this range is clipped to [0, 1).
If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measured bit strings without collapsing the state vectors. When CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vectors. After the collapse, the norms of all state vectors will be 1.
Note
This API is for measuring batched state vectors. For measuring a single state vector, custatevecBatchMeasure() is also available, whose arguments are passed in a different convention.
- Parameters
handle – [in] the handle to the cuStateVec library
batchedSv – [inout] batched state vectors
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits
nSVs – [in] the number of state vectors in the batched state vector
svStride – [in] the distance between state vectors in the batch
bitStrings – [out] pointer to a host or device array of measured bit strings
bitOrdering – [in] pointer to a host array of bit string ordering
bitStringLen – [in] length of bitString
randnums – [in] pointer to a host or device array of random numbers.
collapse – [in] Collapse operation
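The packed bit-string convention described in the example above (bitOrdering = {3, 1}) can be sketched as follows. This is an illustration of the convention only; the helper name packBitString is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The k-th measured value (the outcome of index bit bitOrdering[k]) is
// stored in bit k of a 64-bit integer, as the batched APIs return it.
std::int64_t packBitString(const std::vector<int>& measuredBits) {
    std::int64_t packed = 0;
    for (std::size_t k = 0; k < measuredBits.size(); ++k)
        packed |= static_cast<std::int64_t>(measuredBits[k] & 1) << k;
    return packed;
}
```

With bitOrdering = {3, 1}, measuring index bit 3 as 1 and index bit 1 as 0 packs to the integer 0b01 = 1.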
custatevecBatchMeasureWithOffset
¶
-
custatevecStatus_t custatevecBatchMeasureWithOffset(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, int32_t *bitString, const int32_t *bitOrdering, const uint32_t bitStringLen, const double randnum, enum custatevecCollapseOp_t collapse, const double offset, const double abs2sum)¶
Batched single qubit measurement for a partial state vector.
This function performs batched single qubit measurement and returns a bit string. The bitOrdering argument specifies the index bits to be measured. The measurement result is stored in bitString in the ordering specified by the bitOrdering argument.
If CUSTATEVEC_COLLAPSE_NONE is specified for the collapse argument, this function only returns the measured bit string without collapsing the state vector. When CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO is specified, this function collapses the state vector as custatevecCollapseByBitString() does.
This function assumes that sv is a partial state vector that drops some of the most significant bits. The prefix sums for the lower indices and for the entire state vector must be provided as offset and abs2sum, respectively. When offset == abs2sum == 0, this function behaves in the same way as custatevecBatchMeasure().
If the random number is not in [0, 1), this function returns CUSTATEVEC_STATUS_INVALID_VALUE. At least one basis bit should be specified, otherwise this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] partial state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits
bitString – [out] pointer to a host array of measured bit string
bitOrdering – [in] pointer to a host array of bit string ordering
bitStringLen – [in] length of bitString
randnum – [in] random number, [0, 1).
collapse – [in] Collapse operation
offset – [in] partial sum of squared absolute values
abs2sum – [in] sum of squared absolute values for the entire state vector
Expectation¶
Expectation via a Matrix¶
Expectation performs the following operation: \( \langle A \rangle = \bra{\phi} A \ket{\phi} \),
where \(\ket{\phi}\) is a state vector and \(A\) is a matrix or an observable.
The expectation API custatevecComputeExpectation() may require external workspace for large matrices, and custatevecComputeExpectationGetWorkspaceSize() provides the required workspace size. If a device memory handler is set, custatevecComputeExpectationGetWorkspaceSize() can be skipped.
custatevecComputeExpectationBatchedGetWorkspaceSize() and custatevecComputeExpectationBatched() compute expectation values for batched state vectors.
Please refer to batched state vectors for the overview of batched state vector simulations.
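The expectation value \( \bra{\phi} A \ket{\phi} \) can be sketched on the host for a single-qubit observable as follows. This is an illustrative CPU model, not the cuStateVec implementation; the helper name expectation1q is hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

// <phi|A|phi> for a single-qubit state phi and a 2x2 observable A
// (row-major). The result's imaginary part is ~0 when A is Hermitian.
std::complex<double> expectation1q(
        const std::array<std::complex<double>, 2>& phi,
        const std::array<std::array<std::complex<double>, 2>, 2>& A) {
    std::complex<double> e(0.0, 0.0);
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            e += std::conj(phi[r]) * A[r][c] * phi[c];  // phi^dagger A phi
    return e;
}
```

For example, the Pauli-X expectation value of the \(\ket{+}\) state is 1.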
Use case¶
// check the size of external workspace
custatevecComputeExpectationGetWorkspaceSize(
handle, svDataType, nIndexBits, matrix, matrixDataType, layout, nBasisBits, computeType,
&extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// perform expectation
custatevecComputeExpectation(
handle, sv, svDataType, nIndexBits, expect, expectDataType, residualNorm,
matrix, matrixDataType, layout, basisBits, nBasisBits, computeType,
extraWorkspace, extraWorkspaceSizeInBytes);
API reference¶
custatevecComputeExpectationGetWorkspaceSize
¶
-
custatevecStatus_t custatevecComputeExpectationGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nBasisBits, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)¶
This function gets the required workspace size for custatevecComputeExpectation().
This function returns the size of the extra workspace required to execute custatevecComputeExpectation(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.
- Parameters
handle – [in] the handle to the cuStateVec context
svDataType – [in] Data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
matrix – [in] host or device pointer to a matrix
matrixDataType – [in] data type of matrix
layout – [in] enumerator specifying the memory layout of matrix
nBasisBits – [in] the number of target bits
computeType – [in] computeType of matrix multiplication
extraWorkspaceSizeInBytes – [out] size of the extra workspace
custatevecComputeExpectation
¶
-
custatevecStatus_t custatevecComputeExpectation(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, void *expectationValue, cudaDataType_t expectationDataType, double *residualNorm, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const int32_t *basisBits, const uint32_t nBasisBits, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Compute expectation of matrix observable.
This function calculates the expectation value for a given matrix observable. The acceptable values for the expectationDataType argument are CUDA_R_64F and CUDA_C_64F.
The basisBits and nBasisBits arguments specify the basis on which to calculate the expectation value. For the computeType argument, the same combinations as for custatevecApplyMatrix() are available.
This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nBasisBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of the required extra workspace is obtained by calling custatevecComputeExpectationGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.
Note
The residualNorm argument is not available in this version. If the matrix given by the matrix argument may not be Hermitian, specify CUDA_C_64F for the expectationDataType argument and check the imaginary part of the calculated expectation value.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
expectationValue – [out] host pointer to a variable to store an expectation value
expectationDataType – [in] data type of expect
residualNorm – [out] result of matrix type test
matrix – [in] observable as matrix
matrixDataType – [in] data type of matrix
layout – [in] matrix memory layout
basisBits – [in] pointer to a host array of basis index bits
nBasisBits – [in] the number of basis bits
computeType – [in] computeType of matrix multiplication
extraWorkspace – [in] pointer to an extra workspace
extraWorkspaceSizeInBytes – [in] the size of extra workspace
Note
This function might be asynchronous with respect to the host depending on the arguments. Please use cudaStreamSynchronize (for synchronization) or cudaStreamWaitEvent (for establishing stream order) with the stream set to the handle of the current device before using the results stored in expectationValue.
custatevecComputeExpectationBatchedGetWorkspaceSize
¶
-
custatevecStatus_t custatevecComputeExpectationBatchedGetWorkspaceSize(custatevecHandle_t handle, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, const custatevecIndex_t svStride, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nMatrices, const uint32_t nBasisBits, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)¶
This function gets the required workspace size for custatevecComputeExpectationBatched().
This function returns the size of the extra workspace required to execute custatevecComputeExpectationBatched(). extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.
- Parameters
handle – [in] the handle to the cuStateVec context
svDataType – [in] Data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
nSVs – [in] the number of state vectors
svStride – [in] distance of two consecutive state vectors
matrices – [in] pointer to allocated matrices in one contiguous memory chunk on host or device
matrixDataType – [in] data type of matrices
layout – [in] enumerator specifying the memory layout of matrix
nMatrices – [in] the number of matrices
nBasisBits – [in] the number of basis bits
computeType – [in] computeType of matrix multiplication
extraWorkspaceSizeInBytes – [out] size of the extra workspace
custatevecComputeExpectationBatched
¶
-
custatevecStatus_t custatevecComputeExpectationBatched(custatevecHandle_t handle, const void *batchedSv, cudaDataType_t svDataType, const uint32_t nIndexBits, const uint32_t nSVs, custatevecIndex_t svStride, double2 *expectationValues, const void *matrices, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nMatrices, const int32_t *basisBits, const uint32_t nBasisBits, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Compute the expectation values of matrix observables for each of the batched state vectors.
This function computes expectation values of the given matrix observables for each of the batched state vectors supplied by the batchedSv argument. Batched state vectors are allocated in a single device memory chunk with the stride specified by the svStride argument. Each state vector size is \(2^\text{nIndexBits}\) and the number of state vectors is specified by the nSVs argument.
The expectationValues argument points to a single memory chunk to output the expectation values. This API returns values in double precision (complex128) regardless of input data types. The output array size is ( \(\text{nMatrices} \times \text{nSVs}\) ) and its leading dimension is nMatrices.
The matrices argument is a host or device pointer to a 2-dimensional array for a square matrix. The size of matrices is ( \(\text{nMatrices} \times 2^\text{nBasisBits} \times 2^\text{nBasisBits}\) ) and the value type is specified by the matrixDataType argument. The layout argument specifies the matrix layout, which can be either row-major or column-major order.
The basisBits and nBasisBits arguments specify the basis on which to calculate the expectation. For the computeType argument, the same combinations as for custatevecComputeExpectation() are available.
This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nBasisBits. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The size of the required extra workspace is obtained by calling custatevecComputeExpectationBatchedGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.
- Parameters
handle – [in] the handle to the cuStateVec library
batchedSv – [in] batched state vectors allocated in one contiguous memory chunk on device
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
nSVs – [in] the number of state vectors
svStride – [in] distance of two consecutive state vectors
expectationValues – [out] pointer to a host array to store expectation values
matrices – [in] pointer to allocated matrices in one contiguous memory chunk on host or device
matrixDataType – [in] data type of matrices
layout – [in] matrix memory layout
nMatrices – [in] the number of matrices
basisBits – [in] pointer to a host array of basis index bits
nBasisBits – [in] the number of basis bits
computeType – [in] computeType of matrix multiplication
extraWorkspace – [in] pointer to an extra workspace
extraWorkspaceSizeInBytes – [in] the size of extra workspace
Expectation on Pauli Basis¶
The cuStateVec API custatevecComputeExpectationsOnPauliBasis() computes expectation values for a batch of Pauli strings. Each observable is a tensor product of Pauli matrices acting on the specified basis bits:
\(P = \bigotimes_{i} P_{\text{basisBits}[i]}\)
Each matrix \(P_{\text{basisBits}[i]}\) can be one of the Pauli matrices \(I\), \(X\), \(Y\), and \(Z\), corresponding to the custatevecPauli_t
enums CUSTATEVEC_PAULI_I
, CUSTATEVEC_PAULI_X
, CUSTATEVEC_PAULI_Y
, and CUSTATEVEC_PAULI_Z
, respectively.
Also refer to custatevecPauli_t for details.
Use case¶
// calculate the norm and the expectations for Z(q1) and X(q0)Y(q2)
uint32_t nPauliOperatorArrays = 3;
custatevecPauli_t pauliOperators0[] = {}; // III
int32_t basisBits0[] = {};
custatevecPauli_t pauliOperators1[] = {CUSTATEVEC_PAULI_Z}; // IZI
int32_t basisBits1[] = {1};
custatevecPauli_t pauliOperators2[] = {CUSTATEVEC_PAULI_X, CUSTATEVEC_PAULI_Y}; // XIY
int32_t basisBits2[] = {0, 2};
const uint32_t nBasisBitsArray[] = {0, 1, 2};
const custatevecPauli_t*
pauliOperatorsArray[] = {pauliOperators0, pauliOperators1, pauliOperators2};
const int32_t *basisBitsArray[] = { basisBits0, basisBits1, basisBits2};
uint32_t nIndexBits = 3;
double expectationValues[nPauliOperatorArrays];
custatevecComputeExpectationsOnPauliBasis(
handle, sv, svDataType, nIndexBits, expectationValues,
pauliOperatorsArray, nPauliOperatorArrays,
basisBitsArray, nBasisBitsArray);
API reference¶
custatevecComputeExpectationsOnPauliBasis
¶
-
custatevecStatus_t custatevecComputeExpectationsOnPauliBasis(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, double *expectationValues, const custatevecPauli_t **pauliOperatorsArray, const uint32_t nPauliOperatorArrays, const int32_t **basisBitsArray, const uint32_t *nBasisBitsArray)¶
Calculate expectation values for a batch of (multi-qubit) Pauli operators.
This function calculates multiple expectation values for given sequences of Pauli operators by a single call.
A single Pauli operator sequence, pauliOperators, is represented by using an array of custatevecPauli_t. The basis bits on which these Pauli operators are acting are represented by an array of index bit positions. If no Pauli operator is specified for an index bit, the identity operator (CUSTATEVEC_PAULI_I) is implicitly assumed.
The lengths of pauliOperators and basisBits are the same and are specified by nBasisBits.
The number of Pauli operator sequences is specified by the nPauliOperatorArrays argument.
Multiple sequences of Pauli operators are represented as arrays of arrays in the following manner:
The pauliOperatorsArray argument is an array of arrays of custatevecPauli_t.
The basisBitsArray is an array of arrays of basis bit positions.
The nBasisBitsArray argument holds an array of the lengths of the Pauli operator sequences and basis bit arrays.
Calculated expectation values are stored in a host buffer specified by the expectationValues argument, whose length is nPauliOperatorArrays.
This function returns CUSTATEVEC_STATUS_INVALID_VALUE if the basis bits specified for a Pauli operator sequence have duplicates and/or are out of the range [0, nIndexBits).
This function accepts an empty Pauli operator sequence to get the norm of the state vector.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] state vector
svDataType – [in] data type of the state vector
nIndexBits – [in] the number of index bits of the state vector
expectationValues – [out] pointer to a host array to store expectation values
pauliOperatorsArray – [in] pointer to a host array of Pauli operator arrays
nPauliOperatorArrays – [in] the number of Pauli operator arrays
basisBitsArray – [in] host array of basis bit arrays
nBasisBitsArray – [in] host array of the number of basis bits
Matrix property testing¶
The API custatevecTestMatrixType()
is available to check the properties of matrices.
If a matrix \(A\) is unitary, \(AA^{\dagger} = A^{\dagger}A = I\), where \(A^{\dagger}\) is the conjugate transpose of \(A\) and \(I\) is the identity matrix.
When CUSTATEVEC_MATRIX_TYPE_UNITARY is given for its argument, this API computes the 1-norm \(\|R\|_1 = \sum_{i,j}{|r_{ij}|}\), where \(R = AA^{\dagger} - I\).
This value will be approximately zero if \(A\) is unitary.
If a matrix \(A\) is Hermitian, \(A^{\dagger} = A\).
When CUSTATEVEC_MATRIX_TYPE_HERMITIAN is given for its argument, this API computes \(\|R\|_2^2 = \sum_{i,j}{|r_{ij}|^2}\), where \(R = (A - A^{\dagger}) / 2\).
This value will be approximately zero if \(A\) is Hermitian.
The API may require external workspace for large matrices,
and custatevecTestMatrixTypeGetWorkspaceSize()
provides the size of workspace.
If a device memory handler is set, users do not need to provide an explicit workspace.
Use case¶
double residualNorm;
void* extraWorkspace = nullptr;
size_t extraWorkspaceSizeInBytes = 0;
// check the size of external workspace
custatevecTestMatrixTypeGetWorkspaceSize(
handle, matrixType, matrix, matrixDataType, layout,
nTargets, adjoint, computeType, &extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// execute testing
custatevecTestMatrixType(
handle, &residualNorm, matrixType, matrix, matrixDataType, layout,
nTargets, adjoint, computeType, extraWorkspace, extraWorkspaceSizeInBytes);
API reference¶
custatevecTestMatrixTypeGetWorkspaceSize
¶
-
custatevecStatus_t custatevecTestMatrixTypeGetWorkspaceSize(custatevecHandle_t handle, custatevecMatrixType_t matrixType, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nTargets, const int32_t adjoint, custatevecComputeType_t computeType, size_t *extraWorkspaceSizeInBytes)¶
Get extra workspace size for custatevecTestMatrixType()
This function gets the size of an extra workspace required to execute custatevecTestMatrixType().
extraWorkspaceSizeInBytes will be set to 0 if no extra buffer is required.
- Parameters
handle – [in] the handle to cuStateVec library
matrixType – [in] matrix type
matrix – [in] host or device pointer to a matrix
matrixDataType – [in] data type of matrix
layout – [in] enumerator specifying the memory layout of matrix
nTargets – [in] the number of target bits, up to 15
adjoint – [in] flag to control whether the adjoint of matrix is tested
computeType – [in] compute type
extraWorkspaceSizeInBytes – [out] workspace size
custatevecTestMatrixType
¶
-
custatevecStatus_t custatevecTestMatrixType(custatevecHandle_t handle, double *residualNorm, custatevecMatrixType_t matrixType, const void *matrix, cudaDataType_t matrixDataType, custatevecMatrixLayout_t layout, const uint32_t nTargets, const int32_t adjoint, custatevecComputeType_t computeType, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Test the deviation of a given matrix from a Hermitian (or Unitary) matrix.
This function tests whether the type of the given matrix matches the type given by the matrixType argument.
For the unitary type test, \( R = AA^{\dagger} - I \) is calculated, where \( A \) is the given matrix. The sum of the absolute values of the elements of \( R \) is returned.
For the Hermitian type test, \( R = (A - A^{\dagger}) / 2 \) is calculated. The sum of the squared absolute values of the elements of \( R \) is returned.
This function may return CUSTATEVEC_STATUS_INSUFFICIENT_WORKSPACE for large nTargets. In such cases, the extraWorkspace and extraWorkspaceSizeInBytes arguments should be specified to provide extra workspace. The required size of the extra workspace is obtained by calling custatevecTestMatrixTypeGetWorkspaceSize(). A null pointer can be passed to the extraWorkspace argument if no extra workspace is required. Also, if a device memory handler is set, extraWorkspace can be set to null and extraWorkspaceSizeInBytes can be set to 0.
Note
The nTargets argument must be no more than 15 in this version. For larger nTargets, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
- Parameters
handle – [in] the handle to cuStateVec library
residualNorm – [out] host pointer, to store the deviation from certain matrix type
matrixType – [in] matrix type
matrix – [in] host or device pointer to a matrix
matrixDataType – [in] data type of matrix
layout – [in] enumerator specifying the memory layout of matrix
nTargets – [in] the number of target bits, up to 15
adjoint – [in] flag to control whether the adjoint of matrix is tested
computeType – [in] compute type
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] extra workspace size
Sampling¶
Sampling obtains measurement results many times, using the probabilities calculated from the quantum state.
Use case¶
// create sampler and check the size of external workspace
custatevecSamplerCreate(
handle, sv, svDataType, nIndexBits, &sampler, nMaxShots,
&extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
void* extraWorkspace = nullptr;
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// calculate cumulative abs2sum
custatevecSamplerPreprocess(
handle, sampler, extraWorkspace, extraWorkspaceSizeInBytes);
// [User] generate randnums, array of random numbers [0, 1) for sampling
...
// sample bit strings
custatevecSamplerSample(
handle, sampler, bitStrings, bitOrdering, bitStringLen, randnums, nShots,
output);
// deallocate the sampler
custatevecSamplerDestroy(sampler);
For multi-GPU computations, cuStateVec provides custatevecSamplerGetSquaredNorm() and custatevecSamplerApplySubSVOffset(). Users are required to calculate the cumulative abs2sum array from the squared norm of each sub state vector via custatevecSamplerGetSquaredNorm() and provide its values to the sampler descriptor via custatevecSamplerApplySubSVOffset().
// The state vector is divided into nSubSvs sub state vectors.
// Each sub state vector has its own ordinal and nLocalBits index bits.
// The ordinals of sub state vectors correspond to the extended index bits.
// create sampler and check the size of external workspace
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecSamplerCreate(
handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &sampler[iSv], nMaxShots,
&extraWorkspaceSizeInBytes[iSv]);
}
// allocate external workspace if necessary
for (int iSv = 0; iSv < nSubSvs; iSv++) {
if (extraWorkspaceSizeInBytes[iSv] > 0) {
cudaSetDevice(devices[iSv]);
cudaMalloc(&extraWorkspace[iSv], extraWorkspaceSizeInBytes[iSv]);
}
}
// sample preprocess
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecSamplerPreprocess(
handle[iSv], sampler[iSv], extraWorkspace[iSv],
extraWorkspaceSizeInBytes[iSv]);
}
// get norm of the sub state vectors
double subNorms[nSubSvs];
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecSamplerGetSquaredNorm(
handle[iSv], sampler[iSv], &subNorms[iSv]);
}
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
cudaDeviceSynchronize();
}
// get cumulative array
double cumulativeArray[nSubSvs + 1];
cumulativeArray[0] = 0.0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cumulativeArray[iSv + 1] = cumulativeArray[iSv] + subNorms[iSv];
}
double norm = cumulativeArray[nSubSvs];
// apply offset and norm
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecSamplerApplySubSVOffset(
handle[iSv], sampler[iSv], iSv, nSubSvs, cumulativeArray[iSv], norm);
}
// divide randnum array. randnums must be sorted in the ascending order.
int shotOffsets[nSubSvs + 1];
shotOffsets[0] = 0;
for (int iSv = 0; iSv < nSubSvs; iSv++) {
double* pos = std::lower_bound(randnums, randnums + nShots,
cumulativeArray[iSv + 1] / norm);
if (iSv == nSubSvs - 1) {
pos = randnums + nShots;
}
shotOffsets[iSv + 1] = pos - randnums;
}
// sample bit strings
for (int iSv = 0; iSv < nSubSvs; iSv++) {
int shotOffset = shotOffsets[iSv];
int nSubShots = shotOffsets[iSv + 1] - shotOffsets[iSv];
if (nSubShots > 0) {
cudaSetDevice(devices[iSv]);
custatevecSamplerSample(
handle[iSv], sampler[iSv], &bitStrings[shotOffset], bitOrdering,
bitStringLen, &randnums[shotOffset], nSubShots,
CUSTATEVEC_SAMPLER_OUTPUT_RANDNUM_ORDER);
}
}
for (int iSv = 0; iSv < nSubSvs; iSv++) {
cudaSetDevice(devices[iSv]);
custatevecSamplerDestroy(sampler[iSv]);
}
Please refer to the NVIDIA/cuQuantum repository for further details.
API reference¶
custatevecSamplerCreate
¶
-
custatevecStatus_t custatevecSamplerCreate(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecSamplerDescriptor_t *sampler, uint32_t nMaxShots, size_t *extraWorkspaceSizeInBytes)¶
Create sampler descriptor.
This function creates a sampler descriptor. If an extra workspace is required, its size is set to extraWorkspaceSizeInBytes.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] pointer to state vector
svDataType – [in] data type of state vector
nIndexBits – [in] the number of index bits of the state vector
sampler – [out] pointer to a new sampler descriptor
nMaxShots – [in] the max number of shots used for this sampler context
extraWorkspaceSizeInBytes – [out] workspace size
Note
The max value of nMaxShots is \(2^{31} - 1\). If the value exceeds the limit, custatevecSamplerCreate() returns CUSTATEVEC_STATUS_INVALID_VALUE.
custatevecSamplerPreprocess
¶
-
custatevecStatus_t custatevecSamplerPreprocess(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, void *extraWorkspace, const size_t extraWorkspaceSizeInBytes)¶
Preprocess the state vector for preparation of sampling.
This function prepares internal states of the sampler descriptor. If a device memory handler is set, the extraWorkspace can be set to null and the extraWorkspaceSizeInBytes can be set to 0. Otherwise, the pointer passed to the extraWorkspace argument is associated with the sampler handle and must be kept alive during its lifetime. The size of extraWorkspace is obtained when custatevecSamplerCreate() is called.
- Parameters
handle – [in] the handle to the cuStateVec library
sampler – [inout] the sampler descriptor
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] size of the extra workspace
custatevecSamplerGetSquaredNorm
¶
-
custatevecStatus_t custatevecSamplerGetSquaredNorm(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, double *norm)¶
Get the squared norm of the state vector.
This function returns the squared norm of the state vector. An intended use case is sampling with multiple devices. This API should be called after custatevecSamplerPreprocess(). Otherwise, the behavior of this function is undefined.
- Parameters
handle – [in] the handle to the cuStateVec library
sampler – [in] the sampler descriptor
norm – [out] the squared norm of the state vector
custatevecSamplerApplySubSVOffset
¶
-
custatevecStatus_t custatevecSamplerApplySubSVOffset(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, int32_t subSVOrd, uint32_t nSubSVs, double offset, double norm)¶
Apply the partial norm and the norm of the sub state vector to the sampler descriptor.
This function applies offsets assuming the given state vector is a sub state vector. An intended use case is sampling with distributed state vectors. The nSubSVs argument should be a power of 2 and subSVOrd should be less than nSubSVs. Otherwise, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
- Parameters
handle – [in] the handle to the cuStateVec library
sampler – [in] the sampler descriptor
subSVOrd – [in] sub state vector ordinal
nSubSVs – [in] the number of sub state vectors
offset – [in] cumulative sum offset for the sub state vector
norm – [in] norm for all sub state vectors
custatevecSamplerSample
¶
-
custatevecStatus_t custatevecSamplerSample(custatevecHandle_t handle, custatevecSamplerDescriptor_t sampler, custatevecIndex_t *bitStrings, const int32_t *bitOrdering, const uint32_t bitStringLen, const double *randnums, const uint32_t nShots, enum custatevecSamplerOutput_t output)¶
Sample bit strings from the state vector.
This function does sampling. The bitOrdering and bitStringLen arguments specify the bits to be sampled. Sampled bit strings are represented as an array of custatevecIndex_t and are stored in the host memory buffer that the bitStrings argument points to.
The randnums argument is an array of user-generated random numbers whose length is nShots. The random numbers should be in the range [0, 1). A random number given by the randnums argument is clipped to [0, 1) if it is out of range.
The output argument specifies the order of sampled bit strings:
If CUSTATEVEC_SAMPLER_OUTPUT_RANDNUM_ORDER is specified, the order of sampled bit strings is the same as that of the randnums argument.
If CUSTATEVEC_SAMPLER_OUTPUT_ASCENDING_ORDER is specified, bit strings are returned in ascending order.
If you don’t need a particular order, choose CUSTATEVEC_SAMPLER_OUTPUT_RANDNUM_ORDER by default. (It may offer slightly better performance.)
This API should be called after custatevecSamplerPreprocess(). Otherwise, the behavior of this function is undefined. By calling custatevecSamplerApplySubSVOffset() prior to this function, it is possible to sample bit strings corresponding to the ordinal of the sub state vector.
- Parameters
handle – [in] the handle to the cuStateVec library
sampler – [in] the sampler descriptor
bitStrings – [out] pointer to a host array to store sampled bit strings
bitOrdering – [in] pointer to a host array of bit ordering for sampling
bitStringLen – [in] the number of bits in bitOrdering
randnums – [in] pointer to an array of random numbers
nShots – [in] the number of shots
output – [in] the order of sampled bit strings
custatevecSamplerDestroy
¶
-
custatevecStatus_t custatevecSamplerDestroy(custatevecSamplerDescriptor_t sampler)¶
This function releases resources used by the sampler.
- Parameters
sampler – [in] the sampler descriptor
Accessor¶
An accessor extracts or updates state vector segments.
The APIs custatevecAccessorCreate()
and custatevecAccessorCreateView()
initialize an accessor and also return the size of an extra workspace (if needed by the APIs custatevecAccessorGet()
and custatevecAccessorSet()
to perform the copy).
The workspace must be bound to an accessor by custatevecAccessorSetExtraWorkspace()
, and the lifetime of the workspace must be as long as the accessor’s to cover the entire duration of the copy operation.
If a device memory handler is set, users do not need to provide an explicit workspace.
The begin and end arguments in the Get/Set APIs correspond to state vector element indices; elements within the specified range are copied.
Use case¶
Extraction¶
// create accessor and check the size of external workspace
custatevecAccessorCreateView(
handle, d_sv, CUDA_C_64F, nIndexBits, &accessor, bitOrdering, bitOrderingLen,
maskBitString, maskOrdering, maskLen, &extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// set external workspace
custatevecAccessorSetExtraWorkspace(
handle, &accessor, extraWorkspace, extraWorkspaceSizeInBytes);
// get state vector elements
custatevecAccessorGet(
handle, &accessor, buffer, accessBegin, accessEnd);
// deallocate the accessor
custatevecAccessorDestroy(accessor);
Update¶
// create accessor and check the size of external workspace
custatevecAccessorCreate(
handle, d_sv, CUDA_C_64F, nIndexBits, &accessor, bitOrdering, bitOrderingLen,
maskBitString, maskOrdering, maskLen, &extraWorkspaceSizeInBytes);
// allocate external workspace if necessary
if (extraWorkspaceSizeInBytes > 0)
cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);
// set external workspace
custatevecAccessorSetExtraWorkspace(
handle, &accessor, extraWorkspace, extraWorkspaceSizeInBytes);
// set state vector elements
custatevecAccessorSet(
handle, &accessor, buffer, 0, nSvSize);
// deallocate the accessor
custatevecAccessorDestroy(accessor);
API reference¶
custatevecAccessorCreate
¶
-
custatevecStatus_t custatevecAccessorCreate(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecAccessorDescriptor_t *accessor, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, size_t *extraWorkspaceSizeInBytes)¶
Create accessor to copy elements between the state vector and an external buffer.
Accessor copies state vector elements between the state vector and external buffers. During the copy, the ordering of state vector elements is rearranged according to the bit ordering specified by the bitOrdering argument.
The state vector is assumed to have the default ordering: the LSB is the 0th index bit and the (N-1)th index bit is the MSB for an N index bit system. The bit ordering of the external buffer is specified by the bitOrdering argument. When 3 is given to the nIndexBits argument and [1, 2, 0] to the bitOrdering argument, the state vector index bits are permuted to the specified bit positions. Thus, the state vector index is rearranged and mapped to the external buffer index as [0, 4, 1, 5, 2, 6, 3, 7].
The maskBitString, maskOrdering and maskLen arguments specify the bit mask for the state vector index being accessed. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.
All bit positions [0, nIndexBits) should appear exactly once, either in the bitOrdering or the maskOrdering arguments. If a bit position does not appear in these arguments and/or there are overlaps of bit positions, this function returns CUSTATEVEC_STATUS_INVALID_VALUE.
The extra workspace improves performance if the accessor is called multiple times with small external buffers placed on device. A null pointer can be specified to extraWorkspaceSizeInBytes if the extra workspace is not necessary.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] state vector
svDataType – [in] Data type of state vector
nIndexBits – [in] the number of index bits of state vector
accessor – [in] pointer to an accessor descriptor
bitOrdering – [in] pointer to a host array to specify the basis bits of the external buffer
bitOrderingLen – [in] the length of bitOrdering
maskBitString – [in] pointer to a host array to specify the mask values to limit access
maskOrdering – [in] pointer to a host array for the mask ordering
maskLen – [in] the length of mask
extraWorkspaceSizeInBytes – [out] the required size of extra workspace
custatevecAccessorCreateView
¶
-
custatevecStatus_t custatevecAccessorCreateView(custatevecHandle_t handle, const void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, custatevecAccessorDescriptor_t *accessor, const int32_t *bitOrdering, const uint32_t bitOrderingLen, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, size_t *extraWorkspaceSizeInBytes)¶
Create accessor for the constant state vector.
This function is the same as custatevecAccessorCreate(), but only accepts the constant state vector.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [in] state vector
svDataType – [in] Data type of state vector
nIndexBits – [in] the number of index bits of state vector
accessor – [in] pointer to an accessor descriptor
bitOrdering – [in] pointer to a host array to specify the basis bits of the external buffer
bitOrderingLen – [in] the length of bitOrdering
maskBitString – [in] pointer to a host array to specify the mask values to limit access
maskOrdering – [in] pointer to a host array for the mask ordering
maskLen – [in] the length of mask
extraWorkspaceSizeInBytes – [out] the required size of extra workspace
custatevecAccessorDestroy
¶
-
custatevecStatus_t custatevecAccessorDestroy(custatevecAccessorDescriptor_t accessor)¶
This function releases resources used by the accessor.
- Parameters
accessor – [in] the accessor descriptor
custatevecAccessorSetExtraWorkspace
¶
-
custatevecStatus_t custatevecAccessorSetExtraWorkspace(custatevecHandle_t handle, custatevecAccessorDescriptor_t accessor, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Set the external workspace to the accessor.
This function sets the extra workspace to the accessor. The required size for the extra workspace can be obtained by custatevecAccessorCreate() or custatevecAccessorCreateView(). If a device memory handler is set, the
extraWorkspace
can be set to null, and theextraWorkspaceSizeInBytes
can be set to 0.- Parameters
handle – [in] the handle to the cuStateVec library
accessor – [in] the accessor descriptor
extraWorkspace – [in] extra workspace
extraWorkspaceSizeInBytes – [in] extra workspace size
custatevecAccessorGet
¶
-
custatevecStatus_t custatevecAccessorGet(custatevecHandle_t handle, custatevecAccessorDescriptor_t accessor, void *externalBuffer, const custatevecIndex_t begin, const custatevecIndex_t end)¶
Copy state vector elements to an external buffer.
This function copies state vector elements to an external buffer specified by the externalBuffer argument. During the copy, the index bits are permuted as specified by the bitOrdering argument of custatevecAccessorCreate() or custatevecAccessorCreateView().
The begin and end arguments specify the range of state vector elements being copied. Both arguments have the bit ordering specified by the bitOrdering argument.
- Parameters
handle – [in] the handle to the cuStateVec library
accessor – [in] the accessor descriptor
externalBuffer – [out] pointer to a host or device buffer to receive copied elements
begin – [in] index in the permuted bit ordering for the first element being copied from the state vector
end – [in] index in the permuted bit ordering for the last element being copied from the state vector (non-inclusive)
custatevecAccessorSet
¶
-
custatevecStatus_t custatevecAccessorSet(custatevecHandle_t handle, custatevecAccessorDescriptor_t accessor, const void *externalBuffer, const custatevecIndex_t begin, const custatevecIndex_t end)¶
Set state vector elements from an external buffer.
This function sets complex numbers to the state vector by using an external buffer specified by the externalBuffer argument. During the copy, the index bits are permuted as specified by the bitOrdering argument of custatevecAccessorCreate().
The begin and end arguments specify the range of state vector elements being set. Both arguments have the bit ordering specified by the bitOrdering argument.
If a read-only accessor created by calling custatevecAccessorCreateView() is provided, this function returns CUSTATEVEC_STATUS_NOT_SUPPORTED.
- Parameters
handle – [in] the handle to the cuStateVec library
accessor – [in] the accessor descriptor
externalBuffer – [in] pointer to a host or device buffer of complex values being copied to the state vector
begin – [in] index in the permuted bit ordering for the first element being copied to the state vector
end – [in] index in the permuted bit ordering for the last element being copied to the state vector (non-inclusive)
Single-process qubit reordering¶
For single-process computations, cuStateVec provides the custatevecSwapIndexBits() API for a single device and custatevecMultiDeviceSwapIndexBits() for multiple devices to reorder state vector elements.
Use case¶
single device¶
// This example uses 3 qubits.
const int nIndexBits = 3;
// swap 0th and 2nd qubits
const int nBitSwaps = 1;
const int2 bitSwaps[] = {{0, 2}}; // specify the qubit pairs
// swap the state vector elements only if 1st qubit is 1
const int maskLen = 1;
int maskBitString[] = {1}; // specify the values of mask qubits
int maskOrdering[] = {1}; // specify the mask qubits
// Swap index bit pairs.
// {|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>} will be permuted to
// {|000>, |001>, |010>, |110>, |100>, |101>, |011>, |111>}.
custatevecSwapIndexBits(handle, sv, svDataType, nIndexBits, bitSwaps, nBitSwaps,
maskBitString, maskOrdering, maskLen);
multiple devices¶
// This example uses 2 GPUs and each GPU stores 2-qubit sub state vector.
const int nGlobalIndexBits = 1;
const int nLocalIndexBits = 2;
const int nHandles = 1 << nGlobalIndexBits;
// Users are required to enable direct access on a peer device prior to the swap API call.
for (int i0 = 0; i0 < nHandles; i0++) {
cudaSetDevice(i0);
for (int i1 = 0; i1 < nHandles; i1++) {
if (i0 == i1)
continue;
cudaDeviceEnablePeerAccess(i1, 0);
}
}
cudaSetDevice(0);
// specify the type of device network topology to optimize the data transfer sequence.
// Here, devices are assumed to be connected via NVLink with an NVSwitch or
// PCIe device network with a single PCIe switch.
const custatevecDeviceNetworkType_t deviceNetworkType = CUSTATEVEC_DEVICE_NETWORK_TYPE_SWITCH;
// swap 0th and 2nd qubits
const int nIndexBitSwaps = 1;
const int2 indexBitSwaps[] = {{0, 2}}; // specify the qubit pairs
// swap the state vector elements only if 1st qubit is 1
const int maskLen = 1;
int maskBitString[] = {1}; // specify the values of mask qubits
int maskOrdering[] = {1}; // specify the mask qubits
// Swap index bit pairs.
// {|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>} will be permuted to
// {|000>, |001>, |010>, |110>, |100>, |101>, |011>, |111>}.
custatevecMultiDeviceSwapIndexBits(handles, nHandles, subSVs, svDataType,
nGlobalIndexBits, nLocalIndexBits, indexBitSwaps, nIndexBitSwaps,
maskBitString, maskOrdering, maskLen, deviceNetworkType);
API reference¶
custatevecSwapIndexBits
¶
-
custatevecStatus_t custatevecSwapIndexBits(custatevecHandle_t handle, void *sv, cudaDataType_t svDataType, const uint32_t nIndexBits, const int2 *bitSwaps, const uint32_t nBitSwaps, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen)¶
Swap index bits and reorder state vector elements in one device.
This function updates the bit ordering of the state vector by swapping the pairs of bit positions.
The state vector is assumed to have the default ordering: the LSB is the 0th index bit and the MSB is the (N-1)-th index bit for an N index bit system.
The bitSwaps argument specifies the swapped bit index pairs, whose values must be in the range [0, nIndexBits).
The maskBitString, maskOrdering and maskLen arguments specify the bit mask for the state vector indices being permuted. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.
A bit position can be included in both bitSwaps and maskOrdering. When a masked bit is swapped, state vector elements whose original indices match the mask bit string are written to the permuted indices, while other elements are not copied.
- Parameters
handle – [in] the handle to the cuStateVec library
sv – [inout] state vector
svDataType – [in] Data type of state vector
nIndexBits – [in] the number of index bits of state vector
bitSwaps – [in] pointer to a host array of swapping bit index pairs
nBitSwaps – [in] the number of bit swaps
maskBitString – [in] pointer to a host array to mask output
maskOrdering – [in] pointer to a host array to specify the ordering of maskBitString
maskLen – [in] the length of mask
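The permutation semantics above can be sketched on the host. The following is an illustrative Python model of the index arithmetic, not the CUDA API; the function swap_index_bits is a hypothetical stand-in for custatevecSwapIndexBits().

```python
# Illustrative host-side model of the permutation performed by
# custatevecSwapIndexBits(); not the CUDA API, just its index arithmetic.

def swap_index_bits(sv, bit_swaps, mask_bit_string, mask_ordering):
    """Return a copy of sv with the given index bits swapped for all
    indices whose mask qubits match mask_bit_string."""
    out = list(sv)
    for idx in range(len(sv)):
        # Skip elements whose mask qubits do not match the mask bit string.
        if any(((idx >> b) & 1) != v
               for v, b in zip(mask_bit_string, mask_ordering)):
            continue
        permuted = idx
        for b0, b1 in bit_swaps:
            v0, v1 = (permuted >> b0) & 1, (permuted >> b1) & 1
            permuted &= ~((1 << b0) | (1 << b1))
            permuted |= (v0 << b1) | (v1 << b0)
        # Matching elements are written to the permuted indices;
        # other elements are left as they are.
        out[permuted] = sv[idx]
    return out

# Reproduces the single-device example: swap bits 0 and 2 where bit 1 is 1,
# so only |011> and |110> (indices 3 and 6) exchange amplitudes.
sv = list(range(8))
print(swap_index_bits(sv, [(0, 2)], [1], [1]))  # [0, 1, 2, 6, 4, 5, 3, 7]
```

With an empty mask (maskLen = 0), every index pair differing in bits 0 and 2 is exchanged.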
custatevecMultiDeviceSwapIndexBits
¶
-
custatevecStatus_t custatevecMultiDeviceSwapIndexBits(custatevecHandle_t *handles, const uint32_t nHandles, void **subSVs, const cudaDataType_t svDataType, const uint32_t nGlobalIndexBits, const uint32_t nLocalIndexBits, const int2 *indexBitSwaps, const uint32_t nIndexBitSwaps, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, const custatevecDeviceNetworkType_t deviceNetworkType)¶
Swap index bits and reorder state vector elements for multiple sub state vectors distributed to multiple devices.
This function updates the bit ordering of the state vector distributed in multiple devices by swapping the pairs of bit positions.
This function assumes the state vector is split into multiple sub state vectors and distributed to multiple devices to represent a (nGlobalIndexBits + nLocalIndexBits) qubit system.
The handles argument should receive cuStateVec handles created for all devices where sub state vectors are allocated. If two or more cuStateVec handles created for the same device are given, this function returns an error, CUSTATEVEC_STATUS_INVALID_VALUE. The handles argument should contain a handle created on the current device, as all operations in this function are ordered on the stream of the current device's handle. Otherwise, this function returns an error, CUSTATEVEC_STATUS_INVALID_VALUE.
Sub state vectors are specified by the subSVs argument as an array of device pointers. All sub state vectors are assumed to hold the same number of index bits, specified by nLocalIndexBits; thus, each sub state vector holds (1 << nLocalIndexBits) state vector elements. The global index bits are identical to the indices of the sub state vectors, and the number of sub state vectors is (1 << nGlobalIndexBits). The maximum value of nGlobalIndexBits is 5, which corresponds to 32 sub state vectors.
The index bits of the distributed state vector have the default ordering: the index bits of the sub state vector are mapped from the 0th index bit to the (nLocalIndexBits - 1)-th index bit, and the global index bits are mapped from the (nLocalIndexBits)-th bit to the (nGlobalIndexBits + nLocalIndexBits - 1)-th bit.
The indexBitSwaps argument specifies the index bit pairs being swapped. Each index bit pair can be a pair of two global index bits or a pair of a global and a local index bit. A pair of two local index bits is not accepted; please use custatevecSwapIndexBits() to swap local index bits.
The maskBitString, maskOrdering and maskLen arguments specify the bit string mask that limits which state vector elements are swapped during the call. Bits in maskOrdering can overlap the index bits specified in the indexBitSwaps argument; in such cases, the mask bit string is applied to the bit positions before the index bit swaps. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.
The deviceNetworkType argument specifies the device network topology used to optimize the data transfer sequence. The following two network topologies are assumed:
Switch network: devices connected via NVLink with an NVSwitch (e.g. DGX A100 and DGX-2) or a PCIe device network with a single PCIe switch
Full mesh network: all devices connected by full mesh connections (e.g. DGX Station V100/A100)
Note
Important notice: This function assumes that bidirectional GPUDirect P2P is supported and enabled by cudaDeviceEnablePeerAccess() between all devices where sub state vectors are allocated. If GPUDirect P2P is not enabled, a call to custatevecMultiDeviceSwapIndexBits() that accesses otherwise-inaccessible device memory allocated on other GPUs results in a segmentation fault.
For the best performance, use \(2^n\) devices and allocate one sub state vector on each device. This function also allows a non-\(2^n\) number of devices, two or more sub state vectors on a device, or all sub state vectors on a single device, to cover various hardware configurations. However, performance is always best when a single sub state vector is allocated on each of \(2^n\) devices.
The copy on each participating device is enqueued on the CUDA stream bound to the corresponding handle via custatevecSetStream(). All CUDA calls issued before the call to this function are correctly ordered if they are issued on the streams set to the handles. This function is executed asynchronously; please use cudaStreamSynchronize() (for synchronization) or cudaStreamWaitEvent() (for establishing stream order) with the stream set to the handle of the current device.
- Parameters
handles – [in] pointer to a host array of custatevecHandle_t
nHandles – [in] the number of handles specified in the handles argument
subSVs – [inout] pointer to an array of sub state vectors
svDataType – [in] the data type of the state vector specified by the subSVs argument
nGlobalIndexBits – [in] the number of global index bits of distributed state vector
nLocalIndexBits – [in] the number of local index bits in sub state vector
indexBitSwaps – [in] pointer to a host array of index bit pairs being swapped
nIndexBitSwaps – [in] the number of index bit swaps
maskBitString – [in] pointer to a host array to mask output
maskOrdering – [in] pointer to a host array to specify the ordering of maskBitString
maskLen – [in] the length of mask
deviceNetworkType – [in] the device network topology
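Under the default ordering, the sub state vector index and the in-device offset of a basis state can be read directly off its index bits. The following short Python model is illustrative only; locate is a hypothetical helper, not part of the cuStateVec API.

```python
# Illustrative model of the default index bit ordering of a distributed
# state vector: the low nLocalIndexBits bits are the offset inside a sub
# state vector, and the remaining high bits select the sub state vector.

def locate(index, n_local_index_bits):
    """Map a full state-vector index to (sub-SV index, offset in sub-SV)."""
    sub_sv = index >> n_local_index_bits              # global index bits
    offset = index & ((1 << n_local_index_bits) - 1)  # local index bits
    return sub_sv, offset

# With nGlobalIndexBits = 1 and nLocalIndexBits = 2 (the example above),
# |110> (index 6) resides in sub-SV 1 at offset 2, while |011> (index 3)
# resides in sub-SV 0 at offset 3: swapping index bits 0 and 2 therefore
# exchanges elements across devices.
print(locate(6, 2))  # (1, 2)
print(locate(3, 2))  # (0, 3)
```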
Multi-process qubit reordering¶
For multi-process computations, cuStateVec provides APIs to schedule and reorder distributed state vector elements.
In addition, cuStateVec provides custatevecCommunicator_t, which wraps MPI libraries for inter-process communication.
Please refer to Distributed Index Bit Swap API for the overview and detailed usage.
API reference¶
custatevecCommunicatorCreate
¶
-
custatevecStatus_t custatevecCommunicatorCreate(custatevecHandle_t handle, custatevecCommunicatorDescriptor_t *communicator, custatevecCommunicatorType_t communicatorType, const char *soname)¶
Create communicator.
This function creates a communicator instance.
The type of the communicator is specified by the communicatorType argument. By specifying CUSTATEVEC_COMMUNICATOR_TYPE_OPENMPI or CUSTATEVEC_COMMUNICATOR_TYPE_MPICH, this function creates a communicator instance that internally uses Open MPI or MPICH, respectively. By specifying CUSTATEVEC_COMMUNICATOR_TYPE_EXTERNAL, this function loads a custom plugin that wraps an MPI library. The source code for the custom plugin is downloadable from NVIDIA/cuQuantum.
The soname argument specifies the name of the shared library that will be used by the communicator instance.
This function uses dlopen() to load the specified shared library. If the Open MPI or MPICH library is directly linked to an application and CUSTATEVEC_COMMUNICATOR_TYPE_OPENMPI or CUSTATEVEC_COMMUNICATOR_TYPE_MPICH is specified for the communicatorType argument, the soname argument should be set to NULL; function symbols are then resolved by searching the functions loaded into the application at startup time.
- Parameters
handle – [in] the handle to cuStateVec library
communicator – [out] a pointer to the communicator
communicatorType – [in] the communicator type
soname – [in] the shared object name
custatevecCommunicatorDestroy
¶
-
custatevecStatus_t custatevecCommunicatorDestroy(custatevecHandle_t handle, custatevecCommunicatorDescriptor_t communicator)¶
This function releases the communicator.
- Parameters
handle – [in] the handle to cuStateVec library
communicator – [in] the communicator descriptor
custatevecDistIndexBitSwapSchedulerCreate
¶
-
custatevecStatus_t custatevecDistIndexBitSwapSchedulerCreate(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t *scheduler, const uint32_t nGlobalIndexBits, const uint32_t nLocalIndexBits)¶
Create distributed index bit swap scheduler.
This function creates a distributed index bit swap scheduler descriptor.
The local index bits are mapped from the 0th index bit to the (nLocalIndexBits - 1)-th index bit. The global index bits are mapped from the (nLocalIndexBits)-th bit to the (nGlobalIndexBits + nLocalIndexBits - 1)-th bit.
- Parameters
handle – [in] the handle to cuStateVec library
scheduler – [out] a pointer to a batch swap scheduler
nGlobalIndexBits – [in] the number of global index bits
nLocalIndexBits – [in] the number of local index bits
custatevecDistIndexBitSwapSchedulerDestroy
¶
-
custatevecStatus_t custatevecDistIndexBitSwapSchedulerDestroy(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t scheduler)¶
This function releases the distributed index bit swap scheduler.
- Parameters
handle – [in] the handle to cuStateVec library
scheduler – [in] a pointer to the batch swap scheduler to destroy
custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps
¶
-
custatevecStatus_t custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t scheduler, const int2 *indexBitSwaps, const uint32_t nIndexBitSwaps, const int32_t *maskBitString, const int32_t *maskOrdering, const uint32_t maskLen, uint32_t *nSwapBatches)¶
Set index bit swaps to distributed index bit swap scheduler.
This function sets index bit swaps to the distributed index bit swap scheduler and computes the number of necessary batched data transfers for the given index bit swaps.
The index bits of the distributed state vector have the default ordering: the index bits of the sub state vector are mapped from the 0th index bit to the (nLocalIndexBits - 1)-th index bit, and the global index bits are mapped from the (nLocalIndexBits)-th bit to the (nGlobalIndexBits + nLocalIndexBits - 1)-th bit.
The indexBitSwaps argument specifies the index bit pairs being swapped. Each index bit pair can be a pair of two global index bits or a pair of a global and a local index bit. A pair of two local index bits is not accepted; please use custatevecSwapIndexBits() to swap local index bits.
The maskBitString, maskOrdering and maskLen arguments specify the bit string mask that limits which state vector elements are swapped during the call. Bits in maskOrdering can overlap the index bits specified in the indexBitSwaps argument; in such cases, the mask bit string is applied to the bit positions before the index bit swaps. If the maskLen argument is 0, the maskBitString and/or maskOrdering arguments can be null.
The value returned in the nSwapBatches argument is the number of loops required to complete the index bit swaps, and is used in later stages.
- Parameters
handle – [in] the handle to cuStateVec library
scheduler – [in] a pointer to batch swap scheduler descriptor
indexBitSwaps – [in] pointer to a host array of index bit pairs being swapped
nIndexBitSwaps – [in] the number of index bit swaps
maskBitString – [in] pointer to a host array to mask output
maskOrdering – [in] pointer to a host array to specify the ordering of maskBitString
maskLen – [in] the length of mask
nSwapBatches – [out] the number of batched data transfers
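The cross-process transfers that the scheduler must organize can be visualized by enumerating the element moves implied by a set of index bit swaps. The following Python model is illustrative only; transfer_moves is a hypothetical helper, and the real scheduler's strategy for grouping moves into nSwapBatches batches is internal to the library.

```python
# Illustrative enumeration of the cross-sub-SV element moves implied by a
# set of index bit swaps on a distributed state vector. The scheduler
# groups such moves into batched data transfers.

def transfer_moves(n_global_bits, n_local_bits, index_bit_swaps):
    """List ((src_subsv, src_offset), (dst_subsv, dst_offset)) for every
    element whose sub state vector changes under the swaps."""
    n_local = 1 << n_local_bits
    moves = []
    for idx in range(1 << (n_global_bits + n_local_bits)):
        permuted = idx
        for b0, b1 in index_bit_swaps:
            v0, v1 = (permuted >> b0) & 1, (permuted >> b1) & 1
            permuted &= ~((1 << b0) | (1 << b1))
            permuted |= (v0 << b1) | (v1 << b0)
        if idx >> n_local_bits != permuted >> n_local_bits:
            moves.append(((idx >> n_local_bits, idx % n_local),
                          (permuted >> n_local_bits, permuted % n_local)))
    return moves

# Swapping index bits 0 and 2 with 1 global and 2 local index bits moves
# every element whose bit 0 and bit 2 differ between sub-SV 0 and sub-SV 1.
print(transfer_moves(1, 2, [(0, 2)]))
# [((0, 1), (1, 0)), ((0, 3), (1, 2)), ((1, 0), (0, 1)), ((1, 2), (0, 3))]
```

A swap of two local index bits never crosses sub state vector boundaries, which is why such pairs are handled by custatevecSwapIndexBits() instead.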
custatevecDistIndexBitSwapSchedulerGetParameters
¶
-
custatevecStatus_t custatevecDistIndexBitSwapSchedulerGetParameters(custatevecHandle_t handle, custatevecDistIndexBitSwapSchedulerDescriptor_t scheduler, const int32_t swapBatchIndex, const int32_t orgSubSVIndex, custatevecSVSwapParameters_t *parameters)¶
Get parameters to be set to the state vector swap worker.
This function computes parameters used for data transfers between sub state vectors. The value of the swapBatchIndex argument should be in the range [0, nSwapBatches), where nSwapBatches is the number of loops obtained by the call to custatevecDistIndexBitSwapSchedulerSetIndexBitSwaps().
The parameters argument returns the computed data transfer parameters, which are set to a custatevecSVSwapWorker by a call to custatevecSVSwapWorkerSetParameters().
- Parameters
handle – [in] the handle to cuStateVec library
scheduler – [in] a pointer to batch swap scheduler descriptor
swapBatchIndex – [in] swap batch index for state vector swap parameters
orgSubSVIndex – [in] the index of the origin sub state vector to swap state vector segments
parameters – [out] a pointer to data transfer parameters
custatevecSVSwapWorkerCreate
¶
-
custatevecStatus_t custatevecSVSwapWorkerCreate(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t *svSwapWorker, custatevecCommunicatorDescriptor_t communicator, void *orgSubSV, int32_t orgSubSVIndex, cudaEvent_t orgEvent, cudaDataType_t svDataType, cudaStream_t stream, size_t *extraWorkspaceSizeInBytes, size_t *minTransferWorkspaceSizeInBytes)¶
Create state vector swap worker.
This function creates a custatevecSVSwapWorkerDescriptor_t that swaps, sends, and receives state vector elements between multiple sub state vectors. The communicator specified by the communicator argument is used for inter-process communication, so state vector elements can be transferred between sub state vectors distributed to multiple processes and nodes.
The created descriptor works on the device where the handle was created. The origin sub state vector specified by the orgSubSV argument should be allocated on the same device. The same applies to the event and the stream specified by the orgEvent and stream arguments, respectively.
There are two workspaces: an extra workspace and a data transfer workspace. The extra workspace has a constant size and is used to keep the internal state of the descriptor. The data transfer workspace is used to stage state vector elements being transferred; its minimum size is given by the minTransferWorkspaceSizeInBytes argument. Depending on the system, increasing the size of the data transfer workspace can improve performance.
If all the destination sub state vectors are specified by using custatevecSVSwapWorkerSetSubSVsP2P(), the communicator argument can be null. In this case, the internal CUDA calls are not serialized on the stream specified by the stream argument. It is the user's responsibility to call cudaStreamSynchronize() and a global barrier such as MPI_Barrier(), in this order, to complete all internal CUDA calls. This limitation will be fixed in a future version.
If sub state vectors are distributed to multiple processes, the event should be created with the cudaEventInterprocess flag. Please refer to the CUDA Toolkit documentation for details.
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [out] state vector swap worker
communicator – [in] a pointer to the MPI communicator
orgSubSV – [in] a pointer to a sub state vector
orgSubSVIndex – [in] the index of the sub state vector specified by the orgSubSV argument
orgEvent – [in] the event for synchronization with the peer worker
svDataType – [in] data type used by the state vector representation
stream – [in] a stream that is used to locally execute kernels during data transfers
extraWorkspaceSizeInBytes – [out] the size of the extra workspace needed
minTransferWorkspaceSizeInBytes – [out] the minimum-required size of the transfer workspace
custatevecSVSwapWorkerDestroy
¶
-
custatevecStatus_t custatevecSVSwapWorkerDestroy(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker)¶
This function releases the state vector swap worker.
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [in] state vector swap worker
custatevecSVSwapWorkerSetExtraWorkspace
¶
-
custatevecStatus_t custatevecSVSwapWorkerSetExtraWorkspace(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, void *extraWorkspace, size_t extraWorkspaceSizeInBytes)¶
Set extra workspace.
This function sets the extra workspace to the state vector swap worker. The required size for extra workspace can be obtained by custatevecSVSwapWorkerCreate().
The extra workspace should be set before calling custatevecSVSwapWorkerSetParameters().
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [in] state vector swap worker
extraWorkspace – [in] pointer to the user-owned workspace
extraWorkspaceSizeInBytes – [in] size of the user-provided workspace
custatevecSVSwapWorkerSetTransferWorkspace
¶
-
custatevecStatus_t custatevecSVSwapWorkerSetTransferWorkspace(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, void *transferWorkspace, size_t transferWorkspaceSizeInBytes)¶
Set transfer workspace.
This function sets the transfer workspace to the state vector swap worker instance. The minimum size for transfer workspace can be obtained by custatevecSVSwapWorkerCreate().
Depending on the system hardware configuration, a larger transfer workspace can improve performance. The size specified by transferWorkspaceSizeInBytes should be a power of two and should be equal to or larger than the value of minTransferWorkspaceSizeInBytes returned by the call to custatevecSVSwapWorkerCreate().
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [in] state vector swap worker
transferWorkspace – [in] pointer to the user-owned workspace
transferWorkspaceSizeInBytes – [in] size of the user-provided workspace
custatevecSVSwapWorkerSetSubSVsP2P
¶
-
custatevecStatus_t custatevecSVSwapWorkerSetSubSVsP2P(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, void **dstSubSVsP2P, const int32_t *dstSubSVIndicesP2P, cudaEvent_t *dstEvents, const uint32_t nDstSubSVsP2P)¶
Set sub state vector pointers accessible via GPUDirect P2P.
This function sets sub state vector pointers that are accessible by GPUDirect P2P from the device where the state vector swap worker works. The sub state vector pointers should be specified together with the sub state vector indices and events which are passed to custatevecSVSwapWorkerCreate() to create peer SV swap worker instances.
If sub state vectors are allocated in different processes, the sub state vector pointers and the events should be retrieved by using CUDA IPC.
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [in] state vector swap worker
dstSubSVsP2P – [in] an array of pointers to sub state vectors that are accessed by GPUDirect P2P
dstSubSVIndicesP2P – [in] the sub state vector indices of sub state vector pointers specified by the dstSubSVsP2P argument
dstEvents – [in] events used to create peer workers
nDstSubSVsP2P – [in] the number of sub state vector pointers specified by the dstSubSVsP2P argument
custatevecSVSwapWorkerSetParameters
¶
-
custatevecStatus_t custatevecSVSwapWorkerSetParameters(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, const custatevecSVSwapParameters_t *parameters, int peer)¶
Set state vector swap parameters.
This function sets the parameters used to swap state vector elements. The value of the parameters argument is retrieved by calling custatevecDistIndexBitSwapSchedulerGetParameters().
The peer argument specifies the rank of the peer process that holds the destination sub state vector. The sub state vector index of the destination sub state vector is obtained from the dstSubSVIndex member defined in custatevecSVSwapParameters_t.
If all the sub state vectors are accessible by GPUDirect P2P and a null pointer was passed to the communicator argument when calling custatevecSVSwapWorkerCreate(), the peer argument is ignored.
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [in] state vector swap worker
parameters – [in] data transfer parameters
peer – [in] the peer process identifier of the data transfer
custatevecSVSwapWorkerExecute
¶
-
custatevecStatus_t custatevecSVSwapWorkerExecute(custatevecHandle_t handle, custatevecSVSwapWorkerDescriptor_t svSwapWorker, custatevecIndex_t begin, custatevecIndex_t end)¶
Execute the data transfer.
This function executes the transfer of state vector elements. The number of elements being transferred is obtained from the transferSize member in custatevecSVSwapParameters_t. The begin and end arguments specify the range, [begin, end), of elements being transferred.
- Parameters
handle – [in] the handle to cuStateVec library
svSwapWorker – [in] state vector swap worker
begin – [in] the index to start transfer
end – [in] the index to end transfer
Sub state vector migration¶
For distributed state vector simulations on host/device memories, cuStateVec provides APIs to migrate distributed state vector elements. Please refer to the Sub State Vector Migration API for the overview and the detailed usages.
API reference¶
custatevecSubSVMigratorCreate
¶
-
custatevecStatus_t custatevecSubSVMigratorCreate(custatevecHandle_t handle, custatevecSubSVMigratorDescriptor_t *migrator, void *deviceSlots, cudaDataType_t svDataType, int nDeviceSlots, int nLocalIndexBits)¶
Create sub state vector migrator descriptor.
This function creates a sub state vector migrator descriptor. The sub state vectors specified by the deviceSlots argument are allocated in one contiguous memory array, whose size should be at least \(\text{nDeviceSlots} \times 2^\text{nLocalIndexBits}\) elements.
- Parameters
handle – [in] the handle to the cuStateVec library
migrator – [out] pointer to a new migrator descriptor
deviceSlots – [in] pointer to sub state vectors on device
svDataType – [in] data type of state vector
nDeviceSlots – [in] the number of sub state vectors in deviceSlots
nLocalIndexBits – [in] the number of index bits of sub state vectors
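The size requirement above translates directly into an allocation size. This small helper is illustrative only (device_slots_size_bytes is not part of the cuStateVec API).

```python
# Helper mirroring the documented size requirement for the contiguous
# deviceSlots array: at least nDeviceSlots * 2**nLocalIndexBits elements.
# (Illustrative; device_slots_size_bytes is not part of the cuStateVec API.)

def device_slots_size_bytes(n_device_slots, n_local_index_bits, bytes_per_element):
    """Minimum deviceSlots allocation size in bytes."""
    return n_device_slots * (1 << n_local_index_bits) * bytes_per_element

# e.g. two slots of 20-index-bit sub state vectors with 16-byte
# double-precision complex elements:
print(device_slots_size_bytes(2, 20, 16))  # 33554432 (32 MiB)
```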
custatevecSubSVMigratorDestroy
¶
-
custatevecStatus_t custatevecSubSVMigratorDestroy(custatevecHandle_t handle, custatevecSubSVMigratorDescriptor_t migrator)¶
Destroy sub state vector migrator descriptor.
This function releases a sub state vector migrator descriptor.
- Parameters
handle – [in] the handle to the cuStateVec library
migrator – [inout] the migrator descriptor
custatevecSubSVMigratorMigrate
¶
-
custatevecStatus_t custatevecSubSVMigratorMigrate(custatevecHandle_t handle, custatevecSubSVMigratorDescriptor_t migrator, int deviceSlotIndex, const void *srcSubSV, void *dstSubSV, custatevecIndex_t begin, custatevecIndex_t end)¶
Sub state vector migration.
This function performs a sub state vector migration. The deviceSlotIndex argument specifies the index of the sub state vector to be transferred, and the srcSubSV and dstSubSV arguments specify the sub state vectors to be transferred from/to the sub state vector on the device. In the current version, srcSubSV and dstSubSV must be arrays allocated in host memory and accessible from the device. If either srcSubSV or dstSubSV is a null pointer, the corresponding data transfer is skipped. The begin and end arguments specify the range, [begin, end), of elements being transferred.
- Parameters
handle – [in] the handle to the cuStateVec library
migrator – [in] the migrator descriptor
deviceSlotIndex – [in] the index to specify sub state vector to migrate
srcSubSV – [in] a pointer to a sub state vector that is migrated to deviceSlots
dstSubSV – [out] a pointer to a sub state vector that is migrated from deviceSlots
begin – [in] the index to start migration
end – [in] the index to end migration
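The migration semantics can be modeled on the host. The Python below is illustrative only; migrate is a hypothetical stand-in for custatevecSubSVMigratorMigrate(), with lists in place of host and device memory.

```python
# Host-side model of the migrate call: copy elements [begin, end) between
# external buffers and the sub state vector stored at deviceSlotIndex.
# (Illustrative only; lists stand in for host/device memory.)

def migrate(device_slots, n_local_index_bits, device_slot_index,
            src_sub_sv, dst_sub_sv, begin, end):
    """Copy src into the slot and/or the slot into dst; None skips a leg."""
    base = device_slot_index << n_local_index_bits
    if src_sub_sv is not None:   # transfer in: srcSubSV -> deviceSlots
        device_slots[base + begin:base + end] = src_sub_sv[begin:end]
    if dst_sub_sv is not None:   # transfer out: deviceSlots -> dstSubSV
        dst_sub_sv[begin:end] = device_slots[base + begin:base + end]

# Two 2-index-bit slots (4 elements each); migrate a host sub-SV into slot 1.
slots = [0] * 8
migrate(slots, 2, 1, [10, 11, 12, 13], None, 0, 4)
print(slots)  # [0, 0, 0, 0, 10, 11, 12, 13]
```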