*************
Release Notes
*************

=================
cuStateVec v1.2.0
=================

* We are on `NVIDIA/cuQuantum GitHub Discussions `_! For any questions regarding (or exciting works built upon) cuQuantum, please feel free to reach out to us on GitHub Discussions.
* Bug reports should still go to `our GitHub issue tracker `_.
* This release introduces support for the Hopper GPU family.
* Improve performance/functionality:

  * Improve the performance of 4- and 5-qubit gate application in `custatevecApplyMatrix`.
  * Update `custatevecSamplerSample` to accept state vectors with more than 40 qubits.
  * Add `CUSTATEVEC_STATUS_DEVICE_ALLOCATOR_ERROR` for errors related to the user-provided device memory handler.

* Resolve issues:

  * Fix an issue where `custatevecMultiDeviceSwapIndexBits` could return wrong results for state vectors of 32 qubits or more.
  * Fix register spilling in `custatevecApplyGeneralizedPermutationMatrix` and `custatevecComputeExpectationsOnPauliBasis`.

* Other changes:

  * A conda package is released on conda-forge: ``conda install -c conda-forge custatevec``. Users can still obtain both *cuStateVec* and *cuTensorNet* with ``conda install -c conda-forge cuquantum``, as before.
  * A pip wheel is released on PyPI: ``pip install custatevec-cu11``. Users can still obtain both *cuStateVec* and *cuTensorNet* with ``pip install cuquantum``, as before.

    * Currently, the ``cuquantum`` meta-wheel points to the ``cuquantum-cu11`` meta-wheel (which in turn points to the ``custatevec-cu11`` and ``cutensornet-cu11`` wheels). This may change in a future release when a new CUDA version becomes available. Using the wheels with the ``-cuXX`` suffix is encouraged.

=================
cuStateVec v1.1.0
=================

* Add new API:

  * Optimized state vector element swap algorithm on multiple GPUs (see `custatevecMultiDeviceSwapIndexBits`)

* Improve performance/functionality:

  * Improve the performance of `custatevecApplyMatrix` for 4- and 5-qubit gate application in complex128 (double precision).
  * Improve the performance of `custatevecApplyGeneralizedPermutationMatrix` for diagonal gates acting on 7 or more qubits.

* Resolve issues:

  * Fix issues where `custatevecComputeExpectation` and `custatevecApplyGeneralizedPermutationMatrix` could return wrong results for state vectors of 32 qubits or more.
  * Fix a glibc symbol issue that prevented the release of the ``cuquantum`` ppc64le package on conda-forge.

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.x

*Limitation notes*:

* `custatevecMultiDeviceSwapIndexBits` could cause a segmentation fault if a device does not have peer-to-peer (P2P) access to another device. If a segmentation fault occurs during the API call, please check that direct access between every pair of devices is enabled with `cudaDeviceEnablePeerAccess`.
* `custatevecMultiDeviceSwapIndexBits` could return `CUSTATEVEC_STATUS_INVALID_VALUE` if a handle created on the current device is not provided. Please refer to `custatevecMultiDeviceSwapIndexBits` for details.
* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if an invalid device pointer is passed to a function. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check that a correct pointer is passed and that the size is correctly specified.

=================
cuStateVec v1.0.0
=================

* Improve performance/functionality:

  * Gate application APIs are reoptimized:

    - `custatevecApplyMatrix`: reduced API execution latency. Performance with small state vectors may improve for 1- to 4-qubit matrix application in single precision and for 1- to 5-qubit matrix application in double precision.
    - `custatevecApplyGeneralizedPermutationMatrix`: reduced API execution latency. Performance with small state vectors may improve for diagonal matrices.

* Resolve issues:

  * Multi-threading issues in `custatevecApplyMatrix`, `custatevecApplyGeneralizedPermutationMatrix`, and `custatevecComputeExpectationsOnPauliBasis` are fixed. All cuStateVec APIs in this version are thread-safe as long as each host thread has its own cuStateVec handle.

* Add new API:

  * Binding a user-provided, stream-ordered memory pool to the library (see the introduction for :ref:`workspace-label` and :ref:`cuStateVec memory management API` for details).
  * Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs (see `custatevecBatchMeasureWithOffset`, `custatevecSamplerGetSquaredNorm`, `custatevecSamplerApplySubSVOffset`)
  * Optimized state vector element swap algorithm on a single GPU (see `custatevecSwapIndexBits`)
  * Testing whether a given matrix is Hermitian or unitary (see `custatevecTestMatrixType`)
  * Setting a logger callback with user-provided data (see `custatevecLoggerSetCallbackData`)

*API breaking changes*:

* The sampler and accessor descriptors are now fully opaque, like the library handle `custatevecHandle_t`. Each descriptor has a corresponding destructor API, and both descriptors are now passed by value in the routines that take them. The C and Python APIs are now unified.
* Some APIs are renamed as follows:

  ====================================================== ========================================================================
  previous version (< 1.0.0)                             new version (= 1.0.0)
  ====================================================== ========================================================================
  custatevecApplyMatrix_bufferSize                       `custatevecApplyMatrixGetWorkspaceSize`
  custatevecApplyExp                                     `custatevecApplyPauliRotation`
  custatevecApplyGeneralizedPermutationMatrix_bufferSize `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize`
  custatevecExpectation_bufferSize                       `custatevecComputeExpectationGetWorkspaceSize`
  custatevecExpectation                                  `custatevecComputeExpectation`
  custatevecExpectationsOnPauliBasis                     `custatevecComputeExpectationsOnPauliBasis`
  custatevecSampler_create                               `custatevecSamplerCreate`
  custatevecSampler_preprocess                           `custatevecSamplerPreprocess`
  custatevecSampler_sample                               `custatevecSamplerSample`
  custatevecAccessor_create                              `custatevecAccessorCreate`
  custatevecAccessor_createReadOnly                      `custatevecAccessorCreateView`
  custatevecAccessor_setExtraWorkspace                   `custatevecAccessorSetExtraWorkspace`
  custatevecAccessor_set                                 `custatevecAccessorSet`
  custatevecAccessor_get                                 `custatevecAccessorGet`
  ====================================================== ========================================================================

* The arguments of the following APIs are reordered/renamed:

  * `custatevecApplyMatrix`
  * `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize`
  * `custatevecApplyGeneralizedPermutationMatrix`
  * `custatevecComputeExpectationsOnPauliBasis`

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.x

*Limitation notes*:

* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if an invalid device pointer is passed to a function. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check that a correct pointer is passed and that the size is correctly specified.
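
Because every rename in the v1.0.0 table maps exactly one old name to one new name, updating identifiers in existing code is largely mechanical. The following is a minimal sketch of a hypothetical migration helper (``migrate_identifiers`` and ``RENAMED_APIS`` are illustrative names, not part of cuQuantum). It assumes plain-text source files and rewrites identifiers only; the reordered or renamed arguments listed for v1.0.0 still have to be updated by hand against the API reference.

```python
import re

# Hypothetical helper, not part of cuQuantum.
# Old (< 1.0.0) name -> new (1.0.0) name, copied from the v1.0.0 rename table.
RENAMED_APIS = {
    "custatevecApplyMatrix_bufferSize": "custatevecApplyMatrixGetWorkspaceSize",
    "custatevecApplyExp": "custatevecApplyPauliRotation",
    "custatevecApplyGeneralizedPermutationMatrix_bufferSize":
        "custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize",
    "custatevecExpectation_bufferSize": "custatevecComputeExpectationGetWorkspaceSize",
    "custatevecExpectation": "custatevecComputeExpectation",
    "custatevecExpectationsOnPauliBasis": "custatevecComputeExpectationsOnPauliBasis",
    "custatevecSampler_create": "custatevecSamplerCreate",
    "custatevecSampler_preprocess": "custatevecSamplerPreprocess",
    "custatevecSampler_sample": "custatevecSamplerSample",
    "custatevecAccessor_create": "custatevecAccessorCreate",
    "custatevecAccessor_createReadOnly": "custatevecAccessorCreateView",
    "custatevecAccessor_setExtraWorkspace": "custatevecAccessorSetExtraWorkspace",
    "custatevecAccessor_set": "custatevecAccessorSet",
    "custatevecAccessor_get": "custatevecAccessorGet",
}

# Longest names are tried first, and the \b anchors require a whole-identifier
# match, so e.g. custatevecExpectation never matches inside
# custatevecExpectation_bufferSize or custatevecExpectationsOnPauliBasis.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, RENAMED_APIS), key=len, reverse=True))
    + r")\b"
)


def migrate_identifiers(source: str) -> str:
    """Replace pre-1.0.0 cuStateVec identifiers with their 1.0.0 names.

    Only identifiers are rewritten; argument order is left untouched.
    """
    return _PATTERN.sub(lambda match: RENAMED_APIS[match.group(1)], source)
```

For example, ``migrate_identifiers("custatevecSampler_create(h)")`` returns ``"custatevecSamplerCreate(h)"``, while identifiers that were not renamed, such as ``custatevecApplyMatrix``, pass through unchanged.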

=================
cuStateVec v0.1.1
=================

* Support for the NVIDIA cuQuantum Appliance (see :doc:`here <../appliance/index>`):

  * Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
  * Optimized state vector element swap algorithm for multiple GPUs

* Note: the multi-GPU features and optimizations are currently available only in the cuQuantum Appliance.

=================
cuStateVec v0.1.0
=================

* Add support for ``Linux ppc64le``
* Add new APIs:

  * Gate application for generalized permutation matrices
  * Expectation values of Pauli strings
  * Accessor to get/set state vector elements

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.4 or above
* *cuStateVec* requires NVIDIA HPC SDK 21.11 or above

*Limitation notes*:

* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if an invalid device pointer is passed to a function. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check that a correct pointer is passed and that the size is correctly specified.

=================
cuStateVec v0.0.1
=================

* Initial release
* Support ``Linux x86_64``, ``Linux Arm64``
* Support Volta and Ampere architectures (compute capability 7.0+)

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.4 or above
* *cuStateVec* requires NVIDIA HPC SDK 21.7 or above

*Limitation notes*:

* This release is optimized for NVIDIA A100 and V100 GPUs.
* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if an invalid device pointer is passed to a function. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check that a correct pointer is passed and that the size is correctly specified.
* Performance optimization is planned in future releases.