Release Notes
cuStateVec v1.0.0
Improve performance/functionality:
Gate application APIs are reoptimized:
custatevecApplyMatrix() has reduced API execution latency. Performance with small state vectors may improve for 1-4 qubit matrix application in single precision and 1-5 qubit matrix application in double precision.
custatevecApplyGeneralizedPermutationMatrix() has reduced API execution latency. Performance with small state vectors may improve for diagonal matrix cases.
Resolve issues:
Multi-threading issues in custatevecApplyMatrix(), custatevecApplyGeneralizedPermutationMatrix(), and custatevecComputeExpectationsOnPauliBasis() are fixed. All the cuStateVec APIs in this version are thread safe as long as each host thread has its own cuStateVec handle.
Add new API:
Binding a user-provided, stream-ordered memory pool to the library (see the introduction for Workspace and Memory Management API for detail).
Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs (see custatevecBatchMeasureWithOffset(), custatevecSamplerGetSquaredNorm(), custatevecSamplerApplySubSVOffset())
Optimized state vector element swap algorithm on single GPU (see custatevecSwapIndexBits())
Testing whether a given matrix is Hermitian or unitary (see custatevecTestMatrixType())
Setting a logger callback with user-provided data (see custatevecLoggerSetCallbackData())
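To make the Hermitian/unitary test in the list above concrete, here is a host-side sketch of the kind of check involved. It is not how custatevecTestMatrixType() is implemented and does not call cuStateVec. The `cplx` type, the function names, and the choice of the squared Frobenius norm as the residual are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in complex type for this sketch (hypothetical, not a cuStateVec type). */
typedef struct { double re, im; } cplx;

/* Squared Frobenius norm of (M - M^dagger) for a row-major dim x dim matrix.
 * A value near zero means M is Hermitian up to rounding. */
double hermitian_residual_sq(const cplx *m, size_t dim)
{
    assert(m != NULL && dim > 0);
    double sum = 0.0;
    for (size_t r = 0; r < dim; ++r) {
        for (size_t c = 0; c < dim; ++c) {
            /* M[r][c] - conj(M[c][r]) */
            double d_re = m[r * dim + c].re - m[c * dim + r].re;
            double d_im = m[r * dim + c].im + m[c * dim + r].im;
            sum += d_re * d_re + d_im * d_im;
        }
    }
    return sum;
}

/* Squared Frobenius norm of (M M^dagger - I).
 * A value near zero means M is unitary up to rounding. */
double unitary_residual_sq(const cplx *m, size_t dim)
{
    assert(m != NULL && dim > 0);
    double sum = 0.0;
    for (size_t r = 0; r < dim; ++r) {
        for (size_t c = 0; c < dim; ++c) {
            /* Start from -I so the loop accumulates (M M^dagger - I)[r][c]. */
            double acc_re = (r == c) ? -1.0 : 0.0;
            double acc_im = 0.0;
            for (size_t k = 0; k < dim; ++k) {
                const cplx a = m[r * dim + k];
                const cplx b = m[c * dim + k];
                /* a * conj(b) */
                acc_re += a.re * b.re + a.im * b.im;
                acc_im += a.im * b.re - a.re * b.im;
            }
            sum += acc_re * acc_re + acc_im * acc_im;
        }
    }
    return sum;
}
```

With these helpers, the Pauli Y matrix gives a zero residual for both tests, while the phase gate diag(1, i) passes only the unitary test. In practice the residual would be compared against a tolerance chosen for the precision in use.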
API breaking changes:
The sampler and accessor descriptors are made completely opaque, just like the library handle custatevecHandle_t. For both descriptors there is a corresponding destructor API. Also, they are now passed by value in various routines. Now the C and Python APIs are unified.
Some APIs are renamed as follows:
previous version (< 1.0.0)
new version (= 1.0.0)
custatevecApplyMatrix_bufferSize
custatevecApplyExp
custatevecApplyGeneralizedPermutationMatrix_bufferSize
custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize()
custatevecExpectation_bufferSize
custatevecExpectation
custatevecExpectationsOnPauliBasis
custatevecSampler_create
custatevecSampler_preprocess
custatevecSampler_sample
custatevecAccessor_create
custatevecAccessor_createReadOnly
custatevecAccessor_setExtraWorkspace
custatevecAccessor_set
custatevecAccessor_get
The arguments of the following APIs are reordered/renamed:
Compatibility notes:
cuStateVec requires CUDA 11.x
Limitation notes:
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
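For the size half of that check, one host-side sanity test is to compute how many bytes an n-qubit state vector should occupy and compare it with the size of the device allocation. The sketch below is a minimal illustration. The helper name `state_vector_bytes` is hypothetical and not a cuStateVec API. It assumes 2^n complex elements of 8 bytes each in single precision or 16 bytes each in double precision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes required for a state vector of n_qubits qubits.
 * bytes_per_element: 8 for single-precision complex, 16 for double-precision complex.
 * Returns 0 if the size does not fit in size_t. */
size_t state_vector_bytes(unsigned n_qubits, size_t bytes_per_element)
{
    assert(bytes_per_element == 8 || bytes_per_element == 16);
    if (n_qubits >= sizeof(size_t) * 8)
        return 0;                                  /* 2^n itself would overflow */
    const size_t n_elements = (size_t)1 << n_qubits;
    if (n_elements > SIZE_MAX / bytes_per_element)
        return 0;                                  /* byte count would overflow */
    return n_elements * bytes_per_element;
}
```

If the value returned here differs from the number of bytes actually allocated for the pointer passed to the library, the qubit count or data type given to the call is a likely culprit.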
cuStateVec v0.1.1
Support for the NVIDIA cuQuantum Appliance (see here):
Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
Optimized state vector element swap algorithm for multiple GPUs
Note: the multi-GPU features & optimizations are currently available only in the cuQuantum Appliance
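As background for the element swap mentioned above, swapping two index bits means that the amplitude stored at index i moves to the index obtained by exchanging those two bits of i. The following single-array CPU sketch illustrates that reordering only. It is not the library's algorithm and says nothing about how data is moved between GPUs. The `cplx` type and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in complex type for this sketch (hypothetical, not a cuStateVec type). */
typedef struct { double re, im; } cplx;

/* Reorder a state vector of 2^n_bits elements in place so that
 * index bits bit_a and bit_b are exchanged. */
void swap_index_bits(cplx *sv, unsigned n_bits, unsigned bit_a, unsigned bit_b)
{
    assert(sv != NULL);
    assert(n_bits < sizeof(size_t) * 8);
    assert(bit_a < n_bits && bit_b < n_bits);
    if (bit_a == bit_b)
        return;                                    /* nothing to exchange */

    const size_t mask_a = (size_t)1 << bit_a;
    const size_t mask_b = (size_t)1 << bit_b;
    const size_t n = (size_t)1 << n_bits;

    for (size_t i = 0; i < n; ++i) {
        /* Indices whose two bits are equal stay in place. Of each remaining
         * pair, handle only the member with bit_a set and bit_b clear so that
         * every pair is swapped exactly once. */
        if ((i & mask_a) && !(i & mask_b)) {
            const size_t j = (i ^ mask_a) | mask_b;
            const cplx tmp = sv[i];
            sv[i] = sv[j];
            sv[j] = tmp;
        }
    }
}
```

For a 3-bit index, exchanging bits 0 and 2 swaps the elements at indices 1 and 4 and at indices 3 and 6, and leaves the rest untouched. Applying the same swap twice restores the original order.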
cuStateVec v0.1.0
Add support for Linux ppc64le
Add new APIs:
Gate application for generalized permutation matrices
Expectation values of Pauli strings
Accessor to get/set state vector elements
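A generalized permutation matrix has exactly one nonzero entry in each row and column, so it can be written as a diagonal matrix times a permutation matrix, and applying it needs only one multiplication per element. The CPU sketch below shows that idea for a matrix acting on the whole vector. It does not reproduce the cuStateVec API or its argument conventions. The `cplx` type, the function name, and the convention `out[i] = diag[i] * in[perm[i]]` are assumptions chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in complex type for this sketch (hypothetical, not a cuStateVec type). */
typedef struct { double re, im; } cplx;

/* Apply A = D * P to a vector of length dim, where
 *   (P in)[i] = in[perm[i]]   and   D = diag(diag[0], ..., diag[dim-1]),
 * i.e. out[i] = diag[i] * in[perm[i]] (assumed convention).
 * in and out must not alias. */
void apply_generalized_permutation(const cplx *diag, const size_t *perm,
                                   const cplx *in, cplx *out, size_t dim)
{
    assert(diag != NULL && perm != NULL && in != NULL && out != NULL);
    assert(in != out);
    for (size_t i = 0; i < dim; ++i) {
        assert(perm[i] < dim);
        const cplx a = diag[i];
        const cplx b = in[perm[i]];
        /* Complex product a * b. */
        out[i].re = a.re * b.re - a.im * b.im;
        out[i].im = a.re * b.im + a.im * b.re;
    }
}
```

Under this convention the Pauli Y gate is the pair `perm = {1, 0}` and `diag = {-i, i}`, which maps the basis state (1, 0) to (0, i).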
Compatibility notes:
cuStateVec requires CUDA 11.4 or above
cuStateVec requires NVIDIA HPC SDK 21.11 or above
Limitation notes:
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
cuStateVec v0.0.1
Initial release
Support Linux x86_64, Linux Arm64
Support Volta and Ampere architectures (compute capability 7.0+)
Compatibility notes:
cuStateVec requires CUDA 11.4 or above
cuStateVec requires NVIDIA HPC SDK 21.7 or above
Limitation notes:
This release is optimized for NVIDIA A100 and V100 GPUs.
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
Performance optimization is planned in future releases.