*************
Release Notes
*************

=================
cuStateVec v1.2.0
=================

* We are on `NVIDIA/cuQuantum GitHub Discussions <https://github.com/NVIDIA/cuQuantum/discussions>`_! For any questions about cuQuantum, or to share exciting work built on it, please feel free to reach out to us on GitHub Discussions.

  * Bug reports should still go to `our GitHub issue tracker <https://github.com/NVIDIA/cuQuantum/issues>`_.

* This release introduces support for the Hopper GPU family.

* Improve performance/functionality:

  * Improve the performance of 4- and 5-qubit gate application in `custatevecApplyMatrix`.
  * Update `custatevecSamplerSample` to accept state vectors of more than 40 qubits.
  * Add `CUSTATEVEC_STATUS_DEVICE_ALLOCATOR_ERROR` for errors related to the user-provided device memory handler.

* Resolve issues:

  * Fix an issue where `custatevecMultiDeviceSwapIndexBits` could return wrong results for state vectors of 32 qubits or more.
  * Fix register spilling in `custatevecApplyGeneralizedPermutationMatrix` and `custatevecComputeExpectationsOnPauliBasis`.

* Other changes:

  * A conda package is released on conda-forge: ``conda install -c conda-forge custatevec``. Users can still obtain both *cuStateVec* and *cuTensorNet* with ``conda install -c conda-forge cuquantum``, as before.

  * A pip wheel is released on PyPI: ``pip install custatevec-cu11``. Users can still obtain both *cuStateVec* and *cuTensorNet* with ``pip install cuquantum``, as before.

    * Currently, the ``cuquantum`` meta-wheel points to the ``cuquantum-cu11`` meta-wheel (which then points to ``custatevec-cu11`` and ``cutensornet-cu11`` wheels). This may change in a future release when a new CUDA version becomes available. Using wheels with the ``-cuXX`` suffix is encouraged.

=================
cuStateVec v1.1.0
=================

* Add new API:

  * Optimized state vector element swap algorithm on multiple GPUs (see `custatevecMultiDeviceSwapIndexBits`)

* Improve performance/functionality:

  * Performance improvements of `custatevecApplyMatrix` for 4- and 5-qubit gate application with complex128
  * Performance improvements of `custatevecApplyGeneralizedPermutationMatrix` for 7-qubit or larger diagonal gate application

* Resolve issues:

  * Fix issues where `custatevecComputeExpectation` and `custatevecApplyGeneralizedPermutationMatrix` could return wrong results for state vectors of 32 qubits or more.
  * Fix a glibc symbol issue that disallowed the release of the ``cuquantum`` ppc64le package on conda-forge.
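
For a sense of scale behind the qubit counts mentioned in these notes: an n-qubit state vector holds 2^n complex amplitudes, stored at 8 bytes each in single precision (complex64) and 16 bytes each in double precision (complex128). The snippet below is illustrative arithmetic only; the helper name ``state_vector_bytes`` is hypothetical and not part of cuStateVec.

```python
# Bytes per amplitude for the two complex precisions.
BYTES_PER_AMPLITUDE = {"complex64": 8, "complex128": 16}


def state_vector_bytes(n_qubits, dtype="complex128"):
    """Return the memory needed to hold a full n-qubit state vector."""
    # An n-qubit state vector has 2**n amplitudes.
    return (1 << n_qubits) * BYTES_PER_AMPLITUDE[dtype]


for n in (32, 40):
    gib = state_vector_bytes(n) / 2**30
    print(f"{n} qubits, complex128: {gib:,.0f} GiB")
```

At 40 qubits the double-precision figure is 16 TiB, so state vectors of that size are necessarily partitioned across many devices.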

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.x

*Limitation notes*:

* `custatevecMultiDeviceSwapIndexBits` can cause a segmentation fault if a device does not have peer-to-peer (P2P) access to another device.
  If a segmentation fault occurs during the API call, please check that direct access between every pair of devices has been enabled with `cudaDeviceEnablePeerAccess`.
* `custatevecMultiDeviceSwapIndexBits` could return `CUSTATEVEC_STATUS_INVALID_VALUE` if a handle created on the current device is not provided.
  Please refer to `custatevecMultiDeviceSwapIndexBits` for the details.
* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if a wrong device pointer is passed to the functions.
  If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check if a correct pointer is passed and the size is correctly specified.
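
The peer-to-peer check described in the limitation notes above can be sketched as a loop over every ordered pair of devices. This is a minimal sketch, not part of cuStateVec: the function name ``find_missing_p2p_pairs`` is hypothetical, and the device query is passed in as a callable so that the example runs without a GPU.

```python
from itertools import permutations


def find_missing_p2p_pairs(num_devices, can_access_peer):
    """Return ordered (device, peer) pairs for which P2P access is unavailable.

    `can_access_peer(device, peer)` is a caller-supplied query (an assumption
    of this sketch) that returns a truthy value when `device` can directly
    access memory on `peer`.
    """
    return [
        (device, peer)
        for device, peer in permutations(range(num_devices), 2)
        if not can_access_peer(device, peer)
    ]


# Stand-in topology for illustration: devices 0 and 1 are linked, device 2 is not.
linked = {(0, 1), (1, 0)}
missing = find_missing_p2p_pairs(3, lambda d, p: (d, p) in linked)
print(missing)
```

On a real system the query would come from the CUDA runtime, for example ``cudaDeviceCanAccessPeer`` in C or, assuming CuPy is installed, ``cupy.cuda.runtime.deviceCanAccessPeer``. An empty result means every pair reports P2P capability; access still has to be enabled on each device.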

=================
cuStateVec v1.0.0
=================

* Improve performance/functionality:
  
  * Gate application APIs are reoptimized:

    - `custatevecApplyMatrix` has lower API execution latency.
      Performance with small state vectors may improve when applying 1- to 4-qubit matrices in single precision and 1- to 5-qubit matrices in double precision.
    - `custatevecApplyGeneralizedPermutationMatrix` has lower API execution latency.
      Performance with small state vectors may improve for diagonal matrix cases.

* Resolve issues:

  * Multi-threading issues in `custatevecApplyMatrix`, `custatevecApplyGeneralizedPermutationMatrix`, and `custatevecComputeExpectationsOnPauliBasis` are fixed.
    All cuStateVec APIs in this version are thread-safe as long as each host thread has its own cuStateVec handle.

* Add new API:
  
  * Binding a user-provided, stream-ordered memory pool to the library (see the introduction for :ref:`workspace-label` and :ref:`cuStateVec memory management API` for details).
  * Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
    (see `custatevecBatchMeasureWithOffset`, `custatevecSamplerGetSquaredNorm`, `custatevecSamplerApplySubSVOffset`)
  * Optimized state vector element swap algorithm on single GPU (see `custatevecSwapIndexBits`)
  * Testing whether a given matrix is Hermitian or unitary (see `custatevecTestMatrixType`)
  * Setting a logger callback with user-provided data (see `custatevecLoggerSetCallbackData`)
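
The thread-safety condition above (one cuStateVec handle per host thread) can be met with thread-local storage. The following is a minimal sketch of that pattern, not an official recipe: the class ``PerThreadHandle`` is hypothetical, and the handle factory is passed in by the caller so that the example runs without the library installed.

```python
import threading


class PerThreadHandle:
    """Lazily create one handle per host thread using a caller-supplied factory."""

    def __init__(self, create_handle):
        self._create_handle = create_handle
        self._local = threading.local()

    def get(self):
        # Attributes on a threading.local() object are private to each thread.
        if not hasattr(self._local, "handle"):
            self._local.handle = self._create_handle()
        return self._local.handle


# Demonstration with a dummy factory; `object` stands in for handle creation.
handles = PerThreadHandle(object)
seen = {}  # thread name -> handle, kept alive so the objects stay distinct


def worker(name):
    first = handles.get()
    assert handles.get() is first  # the same thread always gets the same handle
    seen[name] = first


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

distinct = len({id(h) for h in seen.values()})
print(distinct)
```

With the real library the factory would wrap handle creation (``custatevecCreate`` in C), and each thread would also be responsible for releasing its handle with ``custatevecDestroy`` before exiting; the sketch omits that cleanup.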

*API breaking changes*:

* The sampler and accessor descriptors are made completely opaque, just like the library handle `custatevecHandle_t`.
  For both descriptors there is a corresponding destructor API.
  Also, they are now passed by value in various routines. Now the C and Python APIs are unified.

* Some APIs are renamed as follows:

  ====================================================== ========================================================================
  previous version (< 1.0.0)                               new version (= 1.0.0)
  ====================================================== ========================================================================
  custatevecApplyMatrix_bufferSize                       `custatevecApplyMatrixGetWorkspaceSize`
  custatevecApplyExp                                     `custatevecApplyPauliRotation`
  custatevecApplyGeneralizedPermutationMatrix_bufferSize `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize`
  custatevecExpectation_bufferSize                       `custatevecComputeExpectationGetWorkspaceSize`
  custatevecExpectation                                  `custatevecComputeExpectation`
  custatevecExpectationsOnPauliBasis                     `custatevecComputeExpectationsOnPauliBasis`
  custatevecSampler_create                               `custatevecSamplerCreate`
  custatevecSampler_preprocess                           `custatevecSamplerPreprocess`
  custatevecSampler_sample                               `custatevecSamplerSample`
  custatevecAccessor_create                              `custatevecAccessorCreate`
  custatevecAccessor_createReadOnly                      `custatevecAccessorCreateView`
  custatevecAccessor_setExtraWorkspace                   `custatevecAccessorSetExtraWorkspace`
  custatevecAccessor_set                                 `custatevecAccessorSet`
  custatevecAccessor_get                                 `custatevecAccessorGet`
  ====================================================== ========================================================================

* The arguments of the following APIs are reordered/renamed:
  
  * `custatevecApplyMatrix`
  * `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize`
  * `custatevecApplyGeneralizedPermutationMatrix`
  * `custatevecComputeExpectationsOnPauliBasis`
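
The renames in the table above are mechanical, so they can be applied to existing source text with a whole-word search and replace. The helper below is a minimal sketch under that assumption; the names ``RENAMES`` and ``migrate`` are hypothetical, and the mapping is copied from the table.

```python
import re

# Old name -> new name, taken from the rename table for v1.0.0.
RENAMES = {
    "custatevecApplyMatrix_bufferSize": "custatevecApplyMatrixGetWorkspaceSize",
    "custatevecApplyExp": "custatevecApplyPauliRotation",
    "custatevecApplyGeneralizedPermutationMatrix_bufferSize":
        "custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize",
    "custatevecExpectation_bufferSize": "custatevecComputeExpectationGetWorkspaceSize",
    "custatevecExpectation": "custatevecComputeExpectation",
    "custatevecExpectationsOnPauliBasis": "custatevecComputeExpectationsOnPauliBasis",
    "custatevecSampler_create": "custatevecSamplerCreate",
    "custatevecSampler_preprocess": "custatevecSamplerPreprocess",
    "custatevecSampler_sample": "custatevecSamplerSample",
    "custatevecAccessor_create": "custatevecAccessorCreate",
    "custatevecAccessor_createReadOnly": "custatevecAccessorCreateView",
    "custatevecAccessor_setExtraWorkspace": "custatevecAccessorSetExtraWorkspace",
    "custatevecAccessor_set": "custatevecAccessorSet",
    "custatevecAccessor_get": "custatevecAccessorGet",
}

# \b on both sides forces whole-identifier matches, so a short old name
# (custatevecExpectation) never rewrites the prefix of a longer one
# (custatevecExpectation_bufferSize).
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate(source):
    """Replace pre-1.0.0 API names in `source` with their 1.0.0 names."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


old = "custatevecExpectation_bufferSize(h); custatevecExpectation(h);"
print(migrate(old))
```

Renaming alone does not complete a migration: the reordered or renamed arguments and the now-opaque descriptors listed above still need to be updated by hand.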

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.x

*Limitation notes*:

* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if a wrong device pointer is passed to the functions. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check if a correct pointer is passed and the size is correctly specified.

=================
cuStateVec v0.1.1
=================

* Support for the NVIDIA cuQuantum Appliance (see :doc:`here <../appliance/index>`):

  * Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
  * Optimized state vector element swap algorithm for multiple GPUs
  * Note: the multi-GPU features & optimizations are currently available only in the cuQuantum Appliance

=================
cuStateVec v0.1.0
=================

* Add support for ``Linux ppc64le``
* Add new APIs:

  * Gate application for generalized permutation matrices
  * Expectation values of Pauli strings
  * Accessor to get/set state vector elements

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.4 or above
* *cuStateVec* requires NVIDIA HPC SDK 21.11 or above

*Limitation notes*:

* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if a wrong device pointer is passed to the functions. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check if a correct pointer is passed and the size is correctly specified.

=================
cuStateVec v0.0.1
=================

* Initial release
* Support ``Linux x86_64``, ``Linux Arm64``
* Support Volta and Ampere architectures (compute capability 7.0+)

*Compatibility notes*:

* *cuStateVec* requires CUDA 11.4 or above
* *cuStateVec* requires NVIDIA HPC SDK 21.7 or above

*Limitation notes*:

* This release is optimized for NVIDIA A100 and V100 GPUs.
* ``CUSTATEVEC_STATUS_INTERNAL_ERROR`` might be returned if a wrong device pointer is passed to the functions. If a function returns ``CUSTATEVEC_STATUS_INTERNAL_ERROR``, please check if a correct pointer is passed and the size is correctly specified.
* Further performance optimizations are planned for future releases.