Release Notes
cuStateVec v1.6.0
Added new API:
Expectation values for batched state vectors (see custatevecComputeExpectationBatched())
Improved performance/functionality:
Reduced API execution latencies in custatevecComputeExpectation(). Performance may improve for observables acting on 1 to 4 qubits.
Resolved issue:
Fixed an issue where custatevecBatchMeasure() could return incorrect results with nIndexBits > 20.
Compatibility notes:
cuQuantum will drop support for RHEL 7 in the next cuQuantum release. Please plan accordingly.
cuStateVec v1.5.0
Added new API:
Migration of sub state vectors (see Host state vector migration)
Improved performance/functionality:
Improved the performance of custatevecApplyPauliRotation().
Resolved issues:
Fixed an issue where custatevecMultiDeviceSwapIndexBits() accepted invalid index bit positions specified in the indexBitSwaps argument.
cuStateVec v1.4.1
Resolve issues:
Fix an issue where custatevecApplyMatrix() is not executed asynchronously when applying 6-qubit gate matrices to a state vector of the double complex data type.
Fix an issue where custatevecMeasureBatched() can fail on NVIDIA H100 with the “illegal instruction” error.
cuStateVec v1.4.0
Add new APIs:
Gate application for batched state vectors (see custatevecApplyMatrixBatched())
Measurement for batched state vectors (see custatevecAbs2SumArrayBatched(), custatevecCollapseByBitStringBatched(), custatevecMeasureBatched())
State vector initialization to typical states (see custatevecInitializeStateVector())
Resolve issues:
Fix for an issue on the Hopper Architecture wherein custatevecApplyMatrix() could produce incorrect results with nIndexBits + nControls = 12, nTargets = 5, and svDataType = CUDA_C_32F.
Compatibility notes:
cuStateVec supports Ubuntu 20.04+.
cuStateVec v1.3.0
Add new API:
Optimized state vector element swap algorithm on distributed state vectors (see the Distributed Index Bit Swap section.)
Improve performance/functionality:
Improved the performance of 5-qubit gate application with single precision and 6-qubit gate application with double precision in custatevecApplyMatrix() on the Hopper Architecture.
CUDA Lazy Loading is supported. This can significantly reduce memory footprint by deferring the loading of needed GPU kernels to the first call sites. This feature requires CUDA 11.8 (or above). Please refer to the CUDA documentation for other requirements and details. Currently this feature requires users to opt in by setting the environment variable CUDA_MODULE_LOADING=LAZY. In a future CUDA version, lazy loading may become the default.
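As a minimal sketch of the opt-in described above, the Python snippet below puts CUDA_MODULE_LOADING=LAZY into the process environment. The helper name enable_lazy_loading is hypothetical and is not part of cuStateVec or CUDA. The sketch assumes the variable must be set before any CUDA-based library initializes the CUDA runtime, and it leaves an existing user choice untouched.

```python
import os


def enable_lazy_loading(env=os.environ):
    """Opt in to CUDA Lazy Loading unless a loading mode is already set.

    `env` is any mutable mapping; it defaults to the process environment.
    Returns the value of CUDA_MODULE_LOADING that is in effect afterwards.
    """
    env.setdefault("CUDA_MODULE_LOADING", "LAZY")
    return env["CUDA_MODULE_LOADING"]


# Assumption: call this before importing or initializing any CUDA-based
# library, so the setting is already present when CUDA starts up.
enable_lazy_loading()
```

Exporting the variable in the shell that launches the application has the same effect and avoids any dependence on import order.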
Resolve issues:
Fix an issue in custatevecMultiDeviceSwapIndexBits() where CUDA calls are not correctly ordered on streams allocated on multiple GPUs.
Other changes:
Introduce support for CUDA 12.
A set of new wheels with the suffix -cu12 is released on PyPI.org for CUDA 12 users.
Example: pip install custatevec-cu12 for installing cuStateVec compatible with CUDA 12
The existing cuquantum wheel (without the -cuXX suffix) is turned into an automated installer that will attempt to detect the current CUDA environment and install the appropriate wheels. Please note that this automated detection may encounter conditions under which detection is unsuccessful, especially in a CPU-only environment (such as CI/CD). If detection fails we assume that the target environment is CUDA 11 and proceed. This assumption may be changed in a future release, and in such cases we recommend that users explicitly (manually) install the correct wheels.
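The wheel naming and the CUDA 11 fallback described above can be illustrated with a short sketch. The function wheel_for_cuda is hypothetical and is not the installer's actual detection logic; it only mirrors the stated rules, with None standing in for a failed detection.

```python
def wheel_for_cuda(cuda_major, package="custatevec"):
    """Return the wheel name for a CUDA major version (illustrative only).

    `cuda_major` is 11, 12, or None when the CUDA version is unknown.
    """
    if cuda_major is None:
        # Mirrors the documented fallback: assume CUDA 11 when detection fails.
        cuda_major = 11
    if cuda_major not in (11, 12):
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return f"{package}-cu{cuda_major}"


print("pip install", wheel_for_cuda(12))    # explicit choice for CUDA 12
print("pip install", wheel_for_cuda(None))  # detection failed, CUDA 11 assumed
```

In environments where detection is likely to fail, such as CPU-only CI machines, naming the -cuXX wheel explicitly sidesteps the fallback.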
Compatibility notes:
cuStateVec requires CUDA 11.x or 12.x.
cuStateVec supports Ubuntu 18.04+
In the next release, Ubuntu 18.04 will be dropped. The minimum supported Ubuntu version will be 20.04.
cuStateVec v1.2.0
We are on NVIDIA/cuQuantum GitHub Discussions! For any questions regarding (or exciting works built upon) cuQuantum, please feel free to reach out to us on GitHub Discussions.
Bug reports should still go to our GitHub issue tracker.
This release introduces support for the Hopper GPU family.
Improve performance/functionality:
Improve the performance of 4- and 5-qubit gate application in custatevecApplyMatrix().
Update custatevecSamplerSample() to accept state vectors of more than 40 qubits.
Add CUSTATEVEC_STATUS_DEVICE_ALLOCATOR_ERROR for errors related to the user-provided device memory handler.
Resolve issues:
Fix an issue where custatevecMultiDeviceSwapIndexBits() can return wrong results for 32-qubit or larger state vectors.
Fix register spilling in custatevecApplyGeneralizedPermutationMatrix() and custatevecComputeExpectationsOnPauliBasis().
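For a sense of scale, the sketch below computes how many amplitudes an n-qubit state vector holds and how much memory that takes. It assumes 8 bytes per amplitude in single precision and 16 bytes in double precision; the function name is hypothetical. These notes do not state the cause of the 32-qubit issue, so this is context only, not a diagnosis.

```python
def state_vector_footprint(n_qubits, precision="single"):
    """Return (number of amplitudes, size in bytes) for an n-qubit state vector."""
    # Assumed element sizes: complex64 = 8 bytes, complex128 = 16 bytes.
    bytes_per_amplitude = {"single": 8, "double": 16}[precision]
    n_amplitudes = 1 << n_qubits  # 2**n_qubits complex amplitudes
    return n_amplitudes, n_amplitudes * bytes_per_amplitude


for n in (32, 40):
    amplitudes, size = state_vector_footprint(n)
    print(f"{n} qubits: {amplitudes} amplitudes, {size / 2**30:.0f} GiB (single precision)")
```

At 32 qubits the amplitude count is 2**32, which no longer fits in a signed 32-bit integer, and state vectors of this size are typically partitioned across several GPUs.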
Other changes:
A conda package is released on conda-forge: conda install -c conda-forge custatevec. Users can still obtain both cuStateVec and cuTensorNet with conda install -c conda-forge cuquantum, as before.
A pip wheel is released on PyPI: pip install custatevec-cu11. Users can still obtain both cuStateVec and cuTensorNet with pip install cuquantum, as before.
Currently, the cuquantum meta-wheel points to the cuquantum-cu11 meta-wheel (which then points to custatevec-cu11 and cutensornet-cu11 wheels). This may change in a future release when a new CUDA version becomes available. Using wheels with the -cuXX suffix is encouraged.
cuStateVec v1.1.0
Add new API:
Optimized state vector element swap algorithm on multiple GPUs (see custatevecMultiDeviceSwapIndexBits())
Improve performance/functionality:
Performance improvements of custatevecApplyMatrix() for 4- and 5-qubit gate application with complex 128
Performance improvements of custatevecApplyGeneralizedPermutationMatrix() for 7-qubit or larger diagonal gate application
Resolve issues:
Fix issues where custatevecComputeExpectation() and custatevecApplyGeneralizedPermutationMatrix() can return wrong results for 32-qubit or larger state vectors.
Fix a glibc symbol issue that prevented the release of the cuquantum ppc64le package on conda-forge.
Compatibility notes:
cuStateVec requires CUDA 11.x
Limitation notes:
custatevecMultiDeviceSwapIndexBits() could cause a segmentation fault if a device doesn’t have peer-to-peer (P2P) access to another one. When segmentation faults occur during the API call, please check if direct access between any pair of devices is enabled by cudaDeviceEnablePeerAccess.
custatevecMultiDeviceSwapIndexBits() could return CUSTATEVEC_STATUS_INVALID_VALUE if a handle created on the current device is not provided. Please refer to custatevecMultiDeviceSwapIndexBits() for the details.
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
cuStateVec v1.0.0
Improve performance/functionality:
Gate application APIs are reoptimized:
custatevecApplyMatrix(): reduced API execution latencies. Performance with small state vectors may improve for 1- to 4-qubit matrix application in single precision and 1- to 5-qubit matrix application in double precision.
custatevecApplyGeneralizedPermutationMatrix(): reduced API execution latencies. Performance with small state vectors may improve for diagonal matrix cases.
Resolve issues:
Multi-threading issues in custatevecApplyMatrix(), custatevecApplyGeneralizedPermutationMatrix(), and custatevecComputeExpectationsOnPauliBasis() are fixed. All the cuStateVec APIs in this version are thread safe as long as each host thread has its own cuStateVec handle.
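The one-handle-per-host-thread condition above can be sketched as a small thread-local cache. This is an illustrative pattern, not library code: PerThreadHandle and handles_seen_by_threads are hypothetical names, and the create_handle callable is a stand-in for whatever actually creates a cuStateVec handle in your application. The demo uses plain Python objects so it runs without a GPU.

```python
import threading


class PerThreadHandle:
    """Lazily creates and caches one handle per host thread."""

    def __init__(self, create_handle):
        self._create_handle = create_handle  # user-supplied factory (assumption)
        self._local = threading.local()      # storage private to each thread

    def get(self):
        handle = getattr(self._local, "handle", None)
        if handle is None:
            handle = self._create_handle()
            self._local.handle = handle
        return handle


def handles_seen_by_threads(pool, n_threads):
    """Start n_threads workers; each records the handle it gets on two calls."""
    results = [None] * n_threads

    def worker(i):
        results[i] = (pool.get(), pool.get())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-in factory: each call to object() returns a distinct dummy "handle".
pool = PerThreadHandle(create_handle=object)
seen = handles_seen_by_threads(pool, 3)
```

Each thread reuses its own handle across calls and never sees another thread's handle. A real application would also need to destroy each handle when its thread finishes, which this sketch leaves out.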
Add new API:
Binding a user-provided, stream-ordered memory pool to the library (see the introduction for Workspace and Memory Management API for detail).
Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs (see custatevecBatchMeasureWithOffset(), custatevecSamplerGetSquaredNorm(), custatevecSamplerApplySubSVOffset())
Optimized state vector element swap algorithm on single GPU (see custatevecSwapIndexBits())
Testing whether a given matrix is Hermitian or unitary (see custatevecTestMatrixType())
Setting a logger callback with user-provided data (see custatevecLoggerSetCallbackData())
API breaking changes:
The sampler and accessor descriptors are made completely opaque, just like the library handle custatevecHandle_t. For both descriptors there is a corresponding destructor API. Also, they are now passed by value in various routines. Now the C and Python APIs are unified.
Some APIs are renamed as follows:
previous version (< 1.0.0) | new version (= 1.0.0)
custatevecApplyMatrix_bufferSize |
custatevecApplyExp |
custatevecApplyGeneralizedPermutationMatrix_bufferSize | custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize()
custatevecExpectation_bufferSize |
custatevecExpectation |
custatevecExpectationsOnPauliBasis |
custatevecSampler_create |
custatevecSampler_preprocess |
custatevecSampler_sample |
custatevecAccessor_create |
custatevecAccessor_createReadOnly |
custatevecAccessor_setExtraWorkspace |
custatevecAccessor_set |
custatevecAccessor_get |
The arguments of the following APIs are reordered/renamed:
Compatibility notes:
cuStateVec requires CUDA 11.x
Limitation notes:
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
cuStateVec v0.1.1
Support for the NVIDIA cuQuantum Appliance (see here):
Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
Optimized state vector element swap algorithm for multiple GPUs
Note: the multi-GPU features & optimizations are currently available only in the cuQuantum Appliance
cuStateVec v0.1.0
Add support for Linux ppc64le
Add new APIs:
Gate application for generalized permutation matrices
Expectation values of Pauli strings
Accessor to get/set state vector elements
Compatibility notes:
cuStateVec requires CUDA 11.4 or above
cuStateVec requires NVIDIA HPC SDK 21.11 or above
Limitation notes:
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
cuStateVec v0.0.1
Initial release
Support Linux x86_64, Linux Arm64
Support Volta and Ampere architectures (compute capability 7.0+)
Compatibility notes:
cuStateVec requires CUDA 11.4 or above
cuStateVec requires NVIDIA HPC SDK 21.7 or above
Limitation notes:
This release is optimized for NVIDIA A100 and V100 GPUs.
CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if a wrong device pointer is passed to the functions. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check if a correct pointer is passed and the size is correctly specified.
Performance optimization is planned in future releases.