Release Notes¶
cuStateVec v1.2.0¶
We are on NVIDIA/cuQuantum GitHub Discussions! For any questions regarding (or exciting works built upon) cuQuantum, please feel free to reach out to us on GitHub Discussions.
Bug reports should still go to our GitHub issue tracker.
This release introduces support for the Hopper GPU family.
Improve performance/functionality:
Improve the performance of 4- and 5-qubit gate application in custatevecApplyMatrix().
Update custatevecSamplerSample() to accept state vectors of more than 40 qubits.
Add CUSTATEVEC_STATUS_DEVICE_ALLOCATOR_ERROR for errors related to the user-provided device memory handler.
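For a sense of scale behind the 40-qubit figure: an n-qubit state vector holds 2^n complex amplitudes, at 8 bytes each in single precision and 16 bytes each in double precision. The helper below is only an illustrative calculation; `state_vector_bytes` and `BYTES_PER_AMPLITUDE` are hypothetical names and not part of cuStateVec.

```python
# Bytes per amplitude for single- and double-precision complex numbers.
BYTES_PER_AMPLITUDE = {"complex64": 8, "complex128": 16}


def state_vector_bytes(num_qubits: int, dtype: str = "complex128") -> int:
    """Return the bytes needed to store all 2**num_qubits amplitudes."""
    if num_qubits < 0:
        raise ValueError("num_qubits must be non-negative")
    return (1 << num_qubits) * BYTES_PER_AMPLITUDE[dtype]


if __name__ == "__main__":
    for n in (30, 32, 40):
        gib = state_vector_bytes(n) / 2**30
        print(f"{n} qubits, complex128: {gib:,.0f} GiB")
```

At 40 qubits in double precision the full vector is 16 TiB, far beyond a single GPU's memory, so a state vector of that size would have to be held as partitions spread over many devices.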
Resolve issues:
Fix an issue where custatevecMultiDeviceSwapIndexBits() can return wrong results for 32-qubit or larger state vectors.
Fix register spilling in custatevecApplyGeneralizedPermutationMatrix() and custatevecComputeExpectationsOnPauliBasis().
Other changes:
A conda package is released on conda-forge: conda install -c conda-forge custatevec. Users can still obtain both cuStateVec and cuTensorNet with conda install -c conda-forge cuquantum, as before.
A pip wheel is released on PyPI: pip install custatevec-cu11. Users can still obtain both cuStateVec and cuTensorNet with pip install cuquantum, as before.
Currently, the cuquantum meta-wheel points to the cuquantum-cu11 meta-wheel (which then points to the custatevec-cu11 and cutensornet-cu11 wheels). This may change in a future release when a new CUDA version becomes available. Using wheels with the -cuXX suffix is encouraged.
cuStateVec v1.1.0¶
Add new API:
Optimized state vector element swap algorithm on multiple GPUs (see custatevecMultiDeviceSwapIndexBits())
Improve performance/functionality:
Performance improvements of custatevecApplyMatrix() for 4- and 5-qubit gate application with complex128
Performance improvements of custatevecApplyGeneralizedPermutationMatrix() for 7-qubit or larger diagonal gate application
Resolve issues:
Fix issues where custatevecComputeExpectation() and custatevecApplyGeneralizedPermutationMatrix() can return wrong results for 32-qubit or larger state vectors.
Fix a glibc symbol issue that prevented the release of the cuquantum ppc64le package on conda-forge.
Compatibility notes:
cuStateVec requires CUDA 11.x
Limitation notes:
custatevecMultiDeviceSwapIndexBits() could cause a segmentation fault if a device doesn’t have peer-to-peer (P2P) access to another one. When segmentation faults occur during the API call, please check whether direct access between every pair of devices is enabled by cudaDeviceEnablePeerAccess.
custatevecMultiDeviceSwapIndexBits() could return CUSTATEVEC_STATUS_INVALID_VALUE if a handle created on the current device is not provided. Please refer to custatevecMultiDeviceSwapIndexBits() for the details.
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
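The pairwise P2P check suggested above can be sketched as a small loop over ordered device pairs. This is a minimal sketch under stated assumptions: `pairs_without_p2p` and `fake_can_access_peer` are hypothetical names, and the capability query is injected as a callable so the example runs without a GPU. In a real program that callable would wrap the CUDA runtime query cudaDeviceCanAccessPeer, and access would then be enabled with cudaDeviceEnablePeerAccess.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def pairs_without_p2p(
    num_devices: int,
    can_access_peer: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Return every ordered (device, peer) pair that lacks direct access."""
    return [
        (device, peer)
        for device, peer in permutations(range(num_devices), 2)
        if not can_access_peer(device, peer)
    ]


# Stand-in topology for demonstration only: devices 0-1 and 2-3 are linked.
_LINKED = {(0, 1), (1, 0), (2, 3), (3, 2)}


def fake_can_access_peer(device: int, peer: int) -> bool:
    """Hypothetical replacement for a real peer-access capability query."""
    return (device, peer) in _LINKED


missing = pairs_without_p2p(4, fake_can_access_peer)
print("pairs without P2P access:", missing)
```

P2P access is directional, so the sketch checks ordered pairs rather than unordered ones. An empty result is what the multi-device swap expects according to the note above.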
cuStateVec v1.0.0¶
Improve performance/functionality:
Gate application APIs are reoptimized:
custatevecApplyMatrix(): API execution latency is reduced. Performance with small state vectors may improve for 1- to 4-qubit matrix application in single precision and 1- to 5-qubit matrix application in double precision.
custatevecApplyGeneralizedPermutationMatrix(): API execution latency is reduced. Performance with small state vectors may improve for diagonal matrix cases.
Resolve issues:
Multi-threading issues in custatevecApplyMatrix(), custatevecApplyGeneralizedPermutationMatrix(), and custatevecComputeExpectationsOnPauliBasis() are fixed. All the cuStateVec APIs in this version are thread safe as long as each host thread has its own cuStateVec handle.
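One way to follow the one-handle-per-host-thread rule is to keep the handle in thread-local storage. The sketch below shows only that pattern: `create_handle` is a hypothetical stand-in that returns a dummy object so the example runs without a GPU. In real code it would be replaced by the library's handle-creation call (custatevecCreate() in the C API), and each handle would also need to be destroyed when its thread is done, which is omitted here.

```python
import threading

# Each thread sees its own independent attributes on this object.
_thread_state = threading.local()


def create_handle():
    """Stand-in for library handle creation (hypothetical, returns a dummy)."""
    return object()


def get_thread_handle():
    """Return the calling thread's handle, creating it on first use."""
    handle = getattr(_thread_state, "handle", None)
    if handle is None:
        handle = create_handle()
        _thread_state.handle = handle
    return handle


def run_workers(num_threads: int):
    """Start worker threads and collect the handle each one obtained."""
    handles = [None] * num_threads

    def worker(index: int) -> None:
        handles[index] = get_thread_handle()
        # A second lookup in the same thread reuses the same handle.
        assert get_thread_handle() is handles[index]

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handles


handles = run_workers(4)
print("distinct handles:", len({id(h) for h in handles}))
```

Lazy creation keeps the calling code simple: any function can call `get_thread_handle()` without knowing which thread it runs on, and no handle is ever shared between threads.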
Add new API:
Binding a user-provided, stream-ordered memory pool to the library (see the introduction for Workspace and Memory Management API for detail).
Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs (see custatevecBatchMeasureWithOffset(), custatevecSamplerGetSquaredNorm(), custatevecSamplerApplySubSVOffset())
Optimized state vector element swap algorithm on single GPU (see custatevecSwapIndexBits())
Testing whether a given matrix is Hermitian or unitary (see custatevecTestMatrixType())
Setting a logger callback with user-provided data (see custatevecLoggerSetCallbackData())
API breaking changes:
The sampler and accessor descriptors are made completely opaque, just like the library handle custatevecHandle_t. For both descriptors there is a corresponding destructor API. Also, they are now passed by value in various routines. Now the C and Python APIs are unified.
Some APIs are renamed as follows:
previous version (< 1.0.0) | new version (= 1.0.0)
custatevecApplyMatrix_bufferSize
custatevecApplyExp
custatevecApplyGeneralizedPermutationMatrix_bufferSize | custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize()
custatevecExpectation_bufferSize
custatevecExpectation
custatevecExpectationsOnPauliBasis
custatevecSampler_create
custatevecSampler_preprocess
custatevecSampler_sample
custatevecAccessor_create
custatevecAccessor_createReadOnly
custatevecAccessor_setExtraWorkspace
custatevecAccessor_set
custatevecAccessor_get
The arguments of the following APIs are reordered/renamed:
Compatibility notes:
cuStateVec requires CUDA 11.x
Limitation notes:
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
cuStateVec v0.1.1¶
Support for the NVIDIA cuQuantum Appliance (see here):
Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
Optimized state vector element swap algorithm for multiple GPUs
Note: the multi-GPU features & optimizations are currently available only in the cuQuantum Appliance
cuStateVec v0.1.0¶
Add support for Linux ppc64le
Add new APIs:
Gate application for generalized permutation matrices
Expectation values of Pauli strings
Accessor to get/set state vector elements
Compatibility notes:
cuStateVec requires CUDA 11.4 or above
cuStateVec requires NVIDIA HPC SDK 21.11 or above
Limitation notes:
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
cuStateVec v0.0.1¶
Initial release
Support Linux x86_64, Linux Arm64
Support Volta and Ampere architectures (compute capability 7.0+)
Compatibility notes:
cuStateVec requires CUDA 11.4 or above
cuStateVec requires NVIDIA HPC SDK 21.7 or above
Limitation notes:
This release is optimized for NVIDIA A100 and V100 GPUs.
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
Performance optimization is planned in future releases.