cuStateVec Functions #################### .. role:: raw-html(raw) :format: html Library Management ****************** Handle Management API ===================== .. _custatevecCreate-label: :code:`custatevecCreate` ------------------------ .. doxygenfunction:: custatevecCreate ---- .. _custatevecDestroy-label: :code:`custatevecDestroy` ------------------------- .. doxygenfunction:: custatevecDestroy ---- .. _custatevecGetDefaultWorkspaceSize-label: :code:`custatevecGetDefaultWorkspaceSize` ----------------------------------------- .. doxygenfunction:: custatevecGetDefaultWorkspaceSize ---- .. _custatevecSetWorkspace-label: :code:`custatevecSetWorkspace` ------------------------------------- .. doxygenfunction:: custatevecSetWorkspace CUDA Stream Management API ========================== .. _custatevecSetStream-label: :code:`custatevecSetStream` --------------------------- .. doxygenfunction:: custatevecSetStream ---- .. _custatevecGetStream-label: :code:`custatevecGetStream` --------------------------- .. doxygenfunction:: custatevecGetStream Error Management API ==================== .. _custatevecGetErrorName-label: :code:`custatevecGetErrorName` ------------------------------ .. doxygenfunction:: custatevecGetErrorName ---- .. _custatevecGetErrorString-label: :code:`custatevecGetErrorString` -------------------------------- .. doxygenfunction:: custatevecGetErrorString Logger API ========== .. _custatevecLoggerSetCallback-label: :code:`custatevecLoggerSetCallback` ----------------------------------- .. doxygenfunction:: custatevecLoggerSetCallback ---- .. _custatevecLoggerSetCallbackData-label: :code:`custatevecLoggerSetCallbackData` --------------------------------------- .. doxygenfunction:: custatevecLoggerSetCallbackData ---- .. _custatevecLoggerSetFile-label: :code:`custatevecLoggerSetFile` ------------------------------- .. doxygenfunction:: custatevecLoggerSetFile ---- .. _custatevecLoggerOpenFile-label: :code:`custatevecLoggerOpenFile` -------------------------------- .. doxygenfunction:: custatevecLoggerOpenFile ---- .. _custatevecLoggerSetLevel-label: :code:`custatevecLoggerSetLevel` -------------------------------- .. doxygenfunction:: custatevecLoggerSetLevel ---- .. _custatevecLoggerSetMask-label: :code:`custatevecLoggerSetMask` ------------------------------- .. doxygenfunction:: custatevecLoggerSetMask ---- .. _custatevecLoggerForceDisable-label: :code:`custatevecLoggerForceDisable` ------------------------------------ .. doxygenfunction:: custatevecLoggerForceDisable Versioning API ============== .. _custatevecGetProperty-label: :code:`custatevecGetProperty` ----------------------------- .. doxygenfunction:: custatevecGetProperty ---- .. _custatevecGetVersion-label: :code:`custatevecGetVersion` ----------------------------- .. doxygenfunction:: custatevecGetVersion .. _cuStateVec memory management API: Memory Management API ===================== A *stream-ordered* memory allocator (or mempool for short) allocates/deallocates memory *asynchronously* from/to a mempool in a stream-ordered fashion, meaning memory operations and computations enqueued on the streams have a well-defined inter- and intra- stream dependency. There are several well-implemented stream-ordered mempools available, such as ``cudaMemPool_t`` that is built-in at the CUDA driver level since CUDA 11.2 (so that all CUDA applications in the same process can easily share the same pool, see `here `_) and the RAPIDS Memory Manager (`RMM`_). For a detailed introduction, see the `NVIDIA Developer Blog`_. .. _RMM: https://github.com/rapidsai/rmm .. _NVIDIA Developer Blog: https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/ The new device memory handler APIs allow users to bind a stream-ordered mempool to the library handle, such that cuStateVec can take care of most of the memory management for users. Below is an illustration of what can be done: .. code-block:: c++ MyMemPool pool = MyMemPool(); // kept alive for the entire process in real apps int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) { return reinterpret_cast(ctx)->alloc(ptr, size, stream); } int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) { return reinterpret_cast(ctx)->dealloc(ptr, size, stream); } // create a mem handler and fill in the required members for the library to use custatevecDeviceMemHandler_t handler; handler.ctx = reinterpret_cast(&pool); handler.device_alloc = my_alloc; handler.device_free = my_dealloc; memcpy(handler.name, std::string("my pool").c_str(), CUSTATEVEC_ALLOCATOR_NAME_LEN); // bind the handler to the library handle custatevecSetDeviceMemHandler(handle, &handler); /* ... use gate application as usual ... */ // User doesn’t compute the required sizes // User doesn’t query the workspace size (but one can if desired) // User doesn’t allocate memory! // User sets null pointer to indicate the library should draw memory from the user's pool; void* extraWorkspace = nullptr; size_t extraWorkspaceInBytes = 0; custatevecApplyMatrix( handle, sv, svDataType, nIndexBits, matrix, matrixDataType, layout, adjoint, targets, nTargets, controls, nControls, controlBitValues, computeType, extraWorkspace, extraWorkspaceSizeInBytes); // User doesn’t deallocate memory! As shown above, several calls to the workspace-related APIs can be skipped. Moreover, allowing the library to share your memory pool not only can alleviate potential memory conflicts, but also enable possible optimizations. In the current release, only a device mempool can be bound. .. _custatevecSetDeviceMemHandler-label: :code:`custatevecSetDeviceMemHandler` ------------------------------------- .. doxygenfunction:: custatevecSetDeviceMemHandler ---- .. _custatevecGetDeviceMemHandler-label: :code:`custatevecGetDeviceMemHandler` ------------------------------------- .. doxygenfunction:: custatevecGetDeviceMemHandler Gate Application **************** General Matrices ================ cuStateVec API `custatevecApplyMatrix` can apply a matrix representing a gate to a state vector. The API may require external workspace for large matrices, and `custatevecApplyMatrixGetWorkspaceSize` provides the size of workspace. If a device memory handler is set, `custatevecApplyMatrixGetWorkspaceSize` can be skipped. Use case -------- .. code-block:: cpp // check the size of external workspace custatevecApplyMatrixGetWorkspaceSize( handle, svDataType, nIndexBits, matrix, matrixDataType, layout, adjoint, nTargets, nControls, computeType, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary void* extraWorkspace = nullptr; if (extraWorkspaceSizeInBytes > 0) cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes); // apply gate custatevecApplyMatrix( handle, sv, svDataType, nIndexBits, matrix, matrixDataType, layout, adjoint, targets, nTargets, controls, controlBitValues, nControls, computeType, extraWorkspace, extraWorkspaceSizeInBytes); API reference ------------- .. _custatevecApplyMatrixGetWorkspaceSize-label: :code:`custatevecApplyMatrixGetWorkspaceSize` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecApplyMatrixGetWorkspaceSize ---- .. _custatevecApplyMatrix-label: :code:`custatevecApplyMatrix` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecApplyMatrix Pauli Matrices ============== Exponential of a tensor product of Pauli matrices can be expressed as follows: .. math:: e^{i \theta \left( P_{target[0]} \otimes P_{target[1]} \otimes \cdots \otimes P_{target[nTargets-1]} \right)}. Matrix :math:`P_{target[i]}` can be either of Pauli matrices :math:`I`, :math:`X`, :math:`Y`, and :math:`Z`, which are corresponding to the `custatevecPauli_t` enums `CUSTATEVEC_PAULI_I`, `CUSTATEVEC_PAULI_X`, `CUSTATEVEC_PAULI_Y`, and `CUSTATEVEC_PAULI_Z`, respectively. Also refer to :ref:`custatevecPauli_t` for details. Use case -------- .. code-block:: cpp // apply exponential custatevecApplyPauliRotation( handle, sv, svDataType, nIndexBits, theta, paulis, targets, nTargets, controls, controlBitValues, nControls); API reference ------------- .. _custatevecApplyPauliRotation-label: :code:`custatevecApplyPauliRotation` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecApplyPauliRotation Generalized Permutation Matrices ================================ A generalized permutation matrix can be expressed as the multiplication of a permutation matrix :math:`P` and a diagonal matrix :math:`D`. For instance, we can decompose a 4 :math:`\times` 4 generalized permutation matrix :math:`A` as follows: .. math:: A = \left[ \begin{array}{cccc} 0 & 0 & a_0 & 0 \\ a_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & a_2 \\ 0 & a_3 & 0 & 0 \end{array}\right] = DP , where .. math:: D = \left[ \begin{array}{cccc} a_0 & 0 & 0 & 0 \\ 0 & a_1 & 0 & 0 \\ 0 & 0 & a_2 & 0 \\ 0 & 0 & 0 & a_3 \end{array}\right], P = \left[ \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{array}\right]. When :math:`P` is diagonal, the generalized permutation matrix is also diagonal. Similarly, when :math:`D` is the identity matrix, the generalized permutation matrix becomes a permutation matrix. The cuStateVec API `custatevecApplyGeneralizedPermutationMatrix` applies a generalized permutation matrix like :math:`A` to a state vector. The API may require extra workspace for large matrices, whose size can be queried using `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize`. If a device memory handler is set, `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize` can be skipped. Use case -------- .. code-block:: cpp // check the size of external workspace custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize( handle, CUDA_C_64F, nIndexBits, permutation, diagonals, CUDA_C_64F, targets, nTargets, nControls, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary void* extraWorkspace = nullptr; if (extraWorkspaceSizeInBytes > 0) cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes); // apply a generalized permutation matrix custatevecApplyGeneralizedPermutationMatrix( handle, d_sv, CUDA_C_64F, nIndexBits, permutation, diagonals, CUDA_C_64F, adjoint, targets, nTargets, controls, controlBitValues, nControls, extraWorkspace, extraWorkspaceSizeInBytes); The operation is equivalent to the following: .. code-block:: cpp // sv, sv_temp: the state vector and temporary buffer. int64_t sv_size = int64_t{1} << nIndexBits; for (int64_t sv_idx = 0; sv_idx < sv_size; sv_idx++) { // The basis of sv_idx is converted to permutation basis to obtain perm_idx auto perm_idx = convertToPermutationBasis(sv_idx); // apply generalized permutation matrix if (adjoint == 0) sv_temp[sv_idx] = sv[permutation[perm_idx]] * diagonals[perm_idx]; else sv_temp[permutation[perm_idx]] = sv[sv_idx] * conj(diagonals[perm_idx]); } for (int64_t sv_idx = 0; sv_idx < sv_size; sv_idx++) sv[sv_idx] = sv_temp[sv_idx]; API reference ------------- .. _custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize-label: :code:`custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize ---- .. _custatevecApplyGeneralizedPermutationMatrix-label: :code:`custatevecApplyGeneralizedPermutationMatrix` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecApplyGeneralizedPermutationMatrix Measurement *********** Measurement on Z-bases ====================== Let us consider the measurement of an :math:`nIndexBits`-qubit state vector :math:`sv` on an :math:`nBasisBits`-bit Z product basis :math:`basisBits`. The sums of squared absolute values of state vector elements on the Z product basis, :math:`abs2sum0` and :math:`abs2sum1`, are obtained by the followings: .. math:: abs2sum0 &= \Bra{sv} \left\{ \dfrac{1}{2} \left( 1 + Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv}, \\ abs2sum1 &= \Bra{sv} \left\{ \dfrac{1}{2} \left( 1 - Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv}. Therefore, probabilities to obtain parity 0 and 1 are expressed in the following expression: .. math:: Pr(parity = 0) &= \dfrac{abs2sum0}{abs2sum0 + abs2sum1}, \\ Pr(parity = 1) &= \dfrac{abs2sum1}{abs2sum0 + abs2sum1}. Depending on the measurement result, the state vector is collapsed. If parity is equal to 0, we obtain the following vector: .. math:: \ket{sv} = \dfrac{1}{\sqrt{norm}} \left\{ \dfrac{1}{2} \left( 1 + Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv}, and if parity is equal to 1, we obtain the following vector: .. math:: \ket{sv} = \dfrac{1}{\sqrt{norm}} \left\{ \dfrac{1}{2} \left( 1 - Z_{basisBits[0]} \otimes Z_{basisBits[1]} \otimes \cdots \otimes Z_{basisBits[nBasisBits-1]} \right) \right\} \Ket{sv}, where :math:`norm` is the normalization factor. Use case -------- We can measure by `custatevecMeasureOnZBasis` as follows: .. code-block:: cpp // measure on a Z basis custatevecMeasureOnZBasis( handle, sv, svDataType, nIndexBits, &parity, basisBits, nBasisBits, randnum, collapse); The operation is equivalent to the following: .. code-block:: cpp // compute the sums of squared absolute values of state vector elements // on a Z product basis double abs2sum0, abs2sum1; custatevecAbs2SumOnZBasis( handle, sv, svDataType, nIndexBits, &abs2sum0, &abs2sum1, basisBits, nBasisBits); // [User] compute parity and norm double abs2sum = abs2sum0 + abs2sum1; int parity = (randnum * abs2sum < abs2sum0) ? 0 : 1; double norm = (parity == 0) ? abs2sum0 : abs2sum1; // collapse if necessary switch (collapse) { case CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO: custatevecCollapseOnZBasis( handle, sv, svDataType, nIndexBits, parity, basisBits, nBasisBits, norm); break; /* collapse */ case CUSTATEVEC_COLLAPSE_NONE: break; /* Do nothing */ API reference ------------- .. _custatevecAbs2SumOnZBasis-label: :code:`custatevecAbs2SumOnZBasis` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecAbs2SumOnZBasis ---- .. _custatevecCollapseOnZBasis-label: :code:`custatevecCollapseOnZBasis` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecCollapseOnZBasis ---- .. _custatevecMeasureOnZBasis-label: :code:`custatevecMeasureOnZBasis` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecMeasureOnZBasis .. _batchMeasureSection-label: Batched Single Qubit Measurement ================================ Assume that we measure an :math:`nIndexBits`-qubits state vector :math:`sv` with a :math:`bitOrderingLen`-bits bit string :math:`bitOrdering`. The sums of squared absolute values of state vector elements are obtained by the following: .. math:: abs2sum[idx] = \braket{sv|i}\braket{i|sv}, where :math:`idx = b_{BitOrderingLen-1}\cdots b_1 b_0`, :math:`i = b_{bitOrdering[BitOrderingLen-1]} \cdots b_{bitOrdering[1]} b_{bitOrdering[0]}`, :math:`b_p \in \{0, 1\}`. Therefore, probability to obtain the :math:`idx`-th pattern of bits are expressed in the following expression: .. math:: Pr(idx) = \dfrac{abs2sum[idx]}{\sum_{k}abs2sum[k]}. Depending on the measurement result, the state vector is collapsed. If :math:`idx` satisfies :math:`(idx \ \& \ bitString) = idx`, we obtain :math:`sv[idx] = \dfrac{1}{\sqrt{norm}} sv[idx]`. Otherwise, :math:`sv[idx] = 0`, where :math:`norm` is the normalization factor. Use case -------- We can measure by `custatevecBatchMeasure` as follows: .. code-block:: cpp // measure with a bit string custatevecBatchMeasure( handle, sv, svDataType, nIndexBits, bitString, bitOrdering, bitStringLen, randnum, collapse); The operation is equivalent to the following: .. code-block:: cpp // compute the sums of squared absolute values of state vector elements int maskLen = 0; int* maskBitString = nullptr; int* maskOrdering = nullptr; custatevecAbs2SumArray( handle, sv, svDataType, nIndexBits, abs2Sum, bitOrdering, bitOrderingLen, maskBitString, maskOrdering, maskLen); // [User] compute a cumulative sum and choose bitString by a random number // collapse if necessary switch (collapse) { case CUSTATEVEC_COLLAPSE_NORMALIZE_AND_ZERO: custatevecCollapseByBitString( handle, sv, svDataType, nIndexBits, bitString, bitOrdering, bitStringLen, norm); break; /* collapse */ case CUSTATEVEC_COLLAPSE_NONE: break; /* Do nothing */ For multi-GPU computations, `custatevecBatchMeasureWithOffset` is available. This function works on one device, and users are required to compute the cumulative array of squared absolute values of state vector elements beforehand. .. code-block:: cpp // The state vector is divided to nSubSvs sub state vectors. // each state vector has its own ordinal and nLocalBits index bits. // The ordinals of sub state vectors correspond to the extended index bits. // In this example, all the local qubits are measured and collapsed. // get abs2sum for each sub state vector double abs2SumArray[nSubSvs]; for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecAbs2SumArray( handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &abs2SumArray[iSv], nullptr, 0, nullptr, nullptr, 0); } for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); cudaDeviceSynchronize(); } // get cumulative array double cumulativeArray[nSubSvs + 1]; cumulativeArray[0] = 0.0; for (int iSv = 0; iSv < nSubSvs; iSv++) { cumulativeArray[iSv + 1] = cumulativeArray[iSv] + abs2SumArray[iSv]; } // measurement for (int iSv = 0; iSv < nSubSvs; iSv++) { // detect which sub state vector will be used for measurement. if (cumulativeArray[iSv] <= randnum && randnum < cumulativeArray[iSv + 1]) { double norm = cumulativeArray[nSubSvs]; double offset = cumulativeArray[iSv]; cudaSetDevice(devices[iSv]); // measure local qubits. Here the state vector will not be collapsed. // Only local qubits can be included in bitOrdering and bitString arguments. // That is, bitOrdering = {0, 1, 2, ..., nLocalBits - 1} and // bitString will store values of local qubits as an array of integers. custatevecBatchMeasureWithOffset( handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, bitString, bitOrdering, bitStringLen, randnum, CUSTATEVEC_COLLAPSE_NONE, offset, norm); } } for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); cudaDeviceSynchronize(); } // get abs2Sum after collapse for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecAbs2SumArray( handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &abs2SumArray[iSv], nullptr, 0, bitString, bitOrdering, bitStringLen); } for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); cudaDeviceSynchronize(); } // get norm after collapse double norm = 0.0; for (int iSv = 0; iSv < nSubSvs; iSv++) { norm += abs2SumArray[iSv]; } // collapse sub state vectors for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecCollapseByBitString( handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, bitString, bitOrdering, bitStringLen, norm); } // destroy handle for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecDestroy(handle[iSv]); } Please refer to `NVIDIA/cuQuantum `_ repository for further detail. API reference ------------- .. _custatevecAbs2SumArray-label: :code:`custatevecAbs2SumArray` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecAbs2SumArray ---- .. _custatevecCollapseByBitString-label: :code:`custatevecCollapseByBitString` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecCollapseByBitString ---- .. _custatevecBatchMeasure-label: :code:`custatevecBatchMeasure` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecBatchMeasure ---- .. _custatevecBatchMeasureWithOffset-label: :code:`custatevecBatchMeasureWithOffset` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecBatchMeasureWithOffset Expectation *********** Expectation via a Matrix ======================== Expectation performs the following operation: .. math:: \langle A \rangle = \bra{\phi}A\ket{\phi}, where :math:`\ket{\phi}` is a state vector and :math:`A` is a matrix or an observer. The API for expectation `custatevecComputeExpectation` may require external workspace for large matrices, and `custatevecComputeExpectationGetWorkspaceSize` provides the size of workspace. If a device memory handler is set, `custatevecComputeExpectationGetWorkspaceSize` can be skipped. Use case -------- .. code-block:: cpp // check the size of external workspace custatevecComputeExpectationGetWorkspaceSize( handle, svDataType, nIndexBits, matrix, matrixDataType, layout, nBasisBits, computeType, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary void* extraWorkspace = nullptr; if (extraWorkspaceSizeInBytes > 0) cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes); // perform expectation custatevecComputeExpectation( handle, sv, svDataType, nIndexBits, expect, expectDataType, residualNorm, matrix, matrixDataType, layout, basisBits, nBasisBits, computeType, extraWorkspace, extraWorkspaceSizeInBytes); API reference ------------- .. _custatevecComputeExpectationGetWorkspaceSize-label: :code:`custatevecComputeExpectationGetWorkspaceSize` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecComputeExpectationGetWorkspaceSize ---- .. _custatevecComputeExpectation-label: :code:`custatevecComputeExpectation` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecComputeExpectation Expectation on Pauli Basis ========================== cuStateVec API `custatevecComputeExpectationsOnPauliBasis` computes expectation values for a batch of Pauli strings. Each observable can be expressed as follows: .. math:: P_{\text{basisBits}[0]} \otimes P_{\text{basisBits}[1]} \otimes \cdots \otimes P_{\text{basisBits}[\text{nBasisBits}-1]}. Each matrix :math:`P_{\text{basisBits}[i]}` can be one of the Pauli matrices :math:`I`, :math:`X`, :math:`Y`, and :math:`Z`, corresponding to the `custatevecPauli_t` enums `CUSTATEVEC_PAULI_I`, `CUSTATEVEC_PAULI_X`, `CUSTATEVEC_PAULI_Y`, and `CUSTATEVEC_PAULI_Z`, respectively. Also refer to :ref:`custatevecPauli_t` for details. Use case -------- .. code-block:: cpp // calculate the norm and the expectations for Z(q1) and X(q0)Y(q2) uint32_t nPauliOperatorArrays = 3; custatevecPauli_t pauliOperators0[] = {}; // III int32_t basisBits0[] = {}; custatevecPauli_t pauliOperators1[] = {CUSTATEVEC_PAULI_Z}; // IZI int32_t basisBits1[] = {1}; custatevecPauli_t pauliOperators2[] = {CUSTATEVEC_PAULI_X, CUSTATEVEC_PAULI_Y}; // XIY int32_t basisBits2[] = {0, 2}; const uint32_t nBasisBitsArray[] = {0, 1, 2}; const custatevecPauli_t* pauliOperatorsArray[] = {pauliOperators0, pauliOperators1, pauliOperators2}; const int32_t *basisBitsArray[] = { basisBits0, basisBits1, basisBits2}; uint32_t nIndexBits = 3; double expectationValues[nPauliOperatorArrays]; custatevecComputeExpectationsOnPauliBasis( handle, sv, svDataType, nIndexBits, expectationValues, pauliOperatorsArray, nPauliOperatorArrays, basisBitsArray, nBasisBitsArray); API reference ------------- .. _custatevecComputeExpectationsOnPauliBasis-label: :code:`custatevecComputeExpectationsOnPauliBasis` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doxygenfunction:: custatevecComputeExpectationsOnPauliBasis .. _samplerSection-label: Sampling ******** Sampling enables to obtain measurement results many times by using probability calculated from quantum states. Use case ======== .. code-block:: cpp // create sampler and check the size of external workspace custatevecSamplerCreate( handle, sv, svDataType, nIndexBits, &sampler, nMaxShots, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary void extraWorkspace = nullptr; if (extraWorkspaceSizeInBytes > 0) cudaMalloc(extraWorkspace, extraWorkspaceSizeInBytes); // calculate cumulative abs2sum custatevecSamplerPreprocess( handle, sampler, extraWorkspace, extraWorkspaceSizeInBytes); // [User] generate randnums, array of random numbers [0, 1) for sampling ... // sample bit strings custatevecSamplerSample( handle, sampler, bitStrings, bitOrdering, bitStringLen, randnums, nShots, output); // deallocate the sampler custatevecSamplerDestroy(sampler); For multi-GPU computations, cuStateVec provides `custatevecSamplerGetSquaredNorm` and `custatevecSamplerApplySubSVOffset`. Users are required to calculate cumulative abs2sum array with the squared norm of each sub state vector via `custatevecSamplerGetSquaredNorm` and provide its values to the sampler descriptor via `custatevecSamplerApplySubSVOffset`. .. code-block:: cpp // The state vector is divided to nSubSvs sub state vectors. // each state vector has its own ordinal and nLocalBits index bits. // The ordinals of sub state vectors correspond to the extended index bits. // create sampler and check the size of external workspace for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecSamplerCreate( handle[iSv], d_sv[iSv], CUDA_C_64F, nLocalBits, &sampler[iSv], nMaxShots, &extraWorkspaceSizeInBytes[iSv]); } // allocate external workspace if necessary for (int iSv = 0; iSv < nSubSvs; iSv++) { if (extraWorkspaceSizeInBytes[iSv] > 0) { cudaSetDevice(devices[iSv]); cudaMalloc(&extraWorkspace[iSv], extraWorkspaceSizeInBytes[iSv]); } } // sample preprocess for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]) ); custatevecSampler_preprocess( handle[iSv], sampler[iSv], extraWorkspace[iSv], extraWorkspaceSizeInBytes[iSv]); } // get norm of the sub state vectors double subNorms[nSubSvs]; for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecSamplerGetSquaredNorm( handle[iSv], sampler[iSv], &subNorms[iSv]); } for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); cudaDeviceSynchronize(); } // get cumulative array double cumulativeArray[nSubSvs + 1]; cumulativeArray[0] = 0.0; for (int iSv = 0; iSv < nSubSvs; iSv++) { cumulativeArray[iSv + 1] = cumulativeArray[iSv] + subNorms[iSv]; } double norm = cumulativeArray[nSubSvs]; // apply offset and norm for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]) ); custatevecSamplerApplySubSVOffset( handle[iSv], sampler[iSv], iSv, nSubSvs, cumulativeArray[iSv], norm); } // divide randnum array. randnums must be sorted in the ascending order. int shotOffsets[nSubSvs + 1]; shotOffsets[0] = 0; for (int iSv = 0; iSv < nSubSvs; iSv++) { double* pos = std::lower_bound(randnums, randnums + nShots, cumulativeArray[iSv + 1] / norm); if (iSv == nSubSvs - 1) { pos = randnums + nShots; } shotOffsets[iSv + 1] = pos - randnums; } // sample bit strings for (int iSv = 0; iSv < nSubSvs; iSv++) { int shotOffset = shotOffsets[iSv]; int nSubShots = shotOffsets[iSv + 1] - shotOffsets[iSv]; if (nSubShots > 0) { cudaSetDevice(devices[iSv]); custatevecSamplerSample( handle[iSv], sampler[iSv], &bitStrings[shotOffset], bitOrdering, bitStringLen, &randnums[shotOffset], nSubShots, CUSTATEVEC_SAMPLER_OUTPUT_RANDNUM_ORDER); } } for (int iSv = 0; iSv < nSubSvs; iSv++) { cudaSetDevice(devices[iSv]); custatevecSamplerDestroy(sampler[iSv]); } Please refer to `NVIDIA/cuQuantum `_ repository for further detail. API reference ============= .. _custatevecSamplerCreate-label: :code:`custatevecSamplerCreate` ------------------------------- .. doxygenfunction:: custatevecSamplerCreate ---- .. _custatevecSamplerPreprocess-label: :code:`custatevecSamplerPreprocess` ----------------------------------- .. doxygenfunction:: custatevecSamplerPreprocess ---- .. _custatevecSamplerGetSquaredNorm-label: :code:`custatevecSamplerGetSquaredNorm` --------------------------------------- .. doxygenfunction:: custatevecSamplerGetSquaredNorm ---- .. _custatevecSamplerApplySubSVOffset-label: :code:`custatevecSamplerApplySubSVOffset` ----------------------------------------- .. doxygenfunction:: custatevecSamplerApplySubSVOffset ---- .. _custatevecSamplerSample-label: :code:`custatevecSamplerSample` ------------------------------- .. doxygenfunction:: custatevecSamplerSample ---- .. _custatevecSamplerDestroy-label: :code:`custatevecSamplerDestroy` -------------------------------- .. doxygenfunction:: custatevecSamplerDestroy Accessor ******** An accessor extracts or updates state vector segments. The APIs `custatevecAccessorCreate` and `custatevecAccessorCreateView` initialize an accessor and also return the size of an extra workspace (if needed by the APIs `custatevecAccessorGet` and `custatevecAccessorSet` to perform the copy). The workspace must be bound to an accessor by `custatevecAccessorSetExtraWorkspace`, and the lifetime of the workspace must be as long as the accessor's to cover the entire duration of the copy operation. If a device memory handler is set, it is not necessary to provide explicit workspace by users. The ``begin`` and ``end`` arguments in the Get/Set APIs correspond to the state vector elements' indices such that elements within the specified range are copied. Use case ======== Extraction ---------- .. code-block:: cpp // create accessor and check the size of external workspace custatevecAccessorCreateView( handle, d_sv, CUDA_C_64F, nIndexBits, &accessor, bitOrdering, bitOrderingLen, maskBitString, maskOrdering, maskLen, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary if (extraWorkspaceSizeInBytes > 0) cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes); // set external workspace custatevecAccessorSetExtraWorkspace( handle, &accessor, extraWorkspace, extraWorkspaceSizeInBytes); // get state vector elements custatevecAccessorGet( handle, &accessor, buffer, accessBegin, accessEnd); // deallocate the accessor custatevecAccessorDestroy(accessor); Update ------ .. code-block:: cpp // create accessor and check the size of external workspace custatevecAccessorCreate( handle, d_sv, CUDA_C_64F, nIndexBits, &accessor, bitOrdering, bitOrderingLen, maskBitString, maskOrdering, maskLen, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary if (extraWorkspaceSizeInBytes > 0) cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes); // set external workspace custatevecAccessorSetExtraWorkspace( handle, &accessor, extraWorkspace, extraWorkspaceSizeInBytes); // set state vector elements custatevecAccessorSet( handle, &accessor, buffer, 0, nSvSize); // deallocate the accessor custatevecAccessorDestroy(accessor); API reference ============= .. _custatevecAccessorCreate-label: :code:`custatevecAccessorCreate` -------------------------------- .. doxygenfunction:: custatevecAccessorCreate ---- .. _custatevecAccessorCreateView-label: :code:`custatevecAccessorCreateView` ------------------------------------ .. doxygenfunction:: custatevecAccessorCreateView ---- .. _custatevecAccessorDestroy-label: :code:`custatevecAccessorDestroy` --------------------------------- .. doxygenfunction:: custatevecAccessorDestroy ---- .. _custatevecAccessorSetExtraWorkspace-label: :code:`custatevecAccessorSetExtraWorkspace` ------------------------------------------- .. doxygenfunction:: custatevecAccessorSetExtraWorkspace ---- .. _custatevecAccessorGet-label: :code:`custatevecAccessorGet` ----------------------------- .. doxygenfunction:: custatevecAccessorGet ---- .. _custatevecAccessorSet-label: :code:`custatevecAccessorSet` ----------------------------- .. doxygenfunction:: custatevecAccessorSet .. _qubitReorderingSection-label: Qubit reordering **************** cuStateVec provides `custatevecSwapIndexBits` API for single device and `custatevecMultiDeviceSwapIndexBits` for multiple devices to reorder state vector elements. Use case ======== single device ------------- .. code-block:: cpp // This example uses 3 qubits. const int nIndexBits = 3; // swap 0th and 2nd qubits const int nBitSwaps = 1; const int2 bitSwaps[] = {{0, 2}}; // specify the qubit pairs // swap the state vector elements only if 1st qubit is 1 const int maskLen = 1; int maskBitString[] = {1}; // specify the values of mask qubits int maskOrdering[] = {1}; // specify the mask qubits // Swap index bit pairs. // {|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>} will be permuted to // {|000>, |001>, |010>, |110>, |100>, |101>, |011>, |111>}. custatevecSwapIndexBits(handle, sv, svDataType, nIndexBits, bitSwaps, nBitSwaps, maskBitString, maskOrdering, maskLen); multiple devices ---------------- .. code-block:: cpp // This example uses 2 GPUs and each GPU stores 2-qubit sub state vector. const int nGlobalIndexBits = 1; const int nLocalIndexBits = 2; const int nHandles = 1 << nGlobalIndexBits; // Users are required to enable direct access on a peer device prior to the swap API call. for (int i0 = 0; i0 < nHandles; i0++) { cudaSetDevice(i0); for (int i1 = 0; i1 < nHandles; i1++) { if (i0 == i1) continue; cudaDeviceEnablePeerAccess(i1, 0); } } cudaSetDevice(0); // specify the type of device network topology to optimize the data transfer sequence. // Here, devices are assumed to be connected via NVLink with an NVSwitch or // PCIe device network with a single PCIe switch. const custatevecDeviceNetworkType_t deviceNetworkType = CUSTATEVEC_DEVICE_NETWORK_TYPE_SWITCH; // swap 0th and 2nd qubits const int nIndexBitSwaps = 1; const int2 indexBitSwaps[] = {{0, 2}}; // specify the qubit pairs // swap the state vector elements only if 1st qubit is 1 const int maskLen = 1; int maskBitString[] = {1}; // specify the values of mask qubits int maskOrdering[] = {1}; // specify the mask qubits // Swap index bit pairs. // {|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>} will be permuted to // {|000>, |001>, |010>, |110>, |100>, |101>, |011>, |111>}. custatevecMultiDeviceSwapIndexBits(handles, nHandles, subSVs, svDataType, nGlobalIndexBits, nLocalIndexBits, indexBitSwaps, nIndexBitSwaps, maskBitString, maskOrdering, maskLen, deviceNetworkType); API reference ============= .. _custatevecSwapIndexBits-label: :code:`custatevecSwapIndexBits` ------------------------------- .. doxygenfunction:: custatevecSwapIndexBits ---- .. _custatevecMultiDeviceSwapIndexBits-label: :code:`custatevecMultiDeviceSwapIndexBits` ------------------------------------------ .. doxygenfunction:: custatevecMultiDeviceSwapIndexBits Matrix property testing *********************** The API `custatevecTestMatrixType` is available to check the properties of matrices. If a matrix :math:`A` is unitary, :math:`AA^{\dagger} = A^{\dagger}A = I`, where :math:`A^{\dagger}` is the conjugate transpose of :math:`A` and :math:`I` is the identity matrix, respectively. When `CUSTATEVEC_MATRIX_TYPE_UNITARY` is given for its argument, this API computes the 1-norm :math:`||R||_1 = \sum{|r_{ij}|}`, where :math:`R = AA^{\dagger} - I`. This value will be approximately zero if :math:`A` is unitary. If a matrix :math:`A` is Hermitian, :math:`A^{\dagger} = A`. When `CUSTATEVEC_MATRIX_TYPE_HERMITIAN` is given for its argument, this API computes the 2-norm :math:`||R||_2 = \sum{|r_{ij}|^2}`, where :math:`R = (A - A^{\dagger}) / 2`. This value will be approximately zero if :math:`A` is Hermitian. The API may require external workspace for large matrices, and `custatevecTestMatrixTypeGetWorkspaceSize` provides the size of workspace. If a device memory handler is set, it is not necessary to provide explicit workspace by users. Use case ======== .. code-block:: cpp double residualNorm; void* extraWorkspace = nullptr; size_t extraWorkspaceSizeInBytes = 0; // check the size of external workspace custatevecTestMatrixTypeGetWorkspaceSize( handle, matrixType, matrix, matrixDataType, layout, nTargets, adjoint, computeType, &extraWorkspaceSizeInBytes); // allocate external workspace if necessary if (extraWorkspaceSizeInBytes > 0) cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes); // execute testing custatevecTestMatrixType( handle, &residualNorm, matrixType, matrix, matrixDataType, layout, nTargets, adjoint, computeType, extraWorkspace, extraWorkspaceSizeInBytes); API reference ============= .. _custatevecTestMatrixTypeGetWorkspaceSize-label: :code:`custatevecTestMatrixTypeGetWorkspaceSize` ----------------------------------------------------- .. doxygenfunction:: custatevecTestMatrixTypeGetWorkspaceSize ---- .. _custatevecTestMatrixType-label: :code:`custatevecTestMatrixType` ----------------------------------------------------- .. doxygenfunction:: custatevecTestMatrixType