*************
Release Notes
*************

=========================
cuQuantum Python v23.10.0
=========================

* Add new APIs and functionalities:

  * For low-level APIs, please refer to the release notes of :ref:`cuStateVec v1.5.0 <custatevec/release_notes:cuStateVec v1.5.0>`
    and :ref:`cuTensorNet v2.3.0 <cutensornet/release_notes:cuTensorNet v2.3.0>`.

  * The function :func:`cuquantum.contract` now works like a native PyTorch operator as far as autograd is concerned, if the input operands are PyTorch tensors. This is an *experimental* feature.
  * A new, *experimental* method :meth:`cuquantum.Network.gradients` is added for computing the gradients of the network with respect to the input operands.

    * If the gradients are complex-valued, the convention follows `that of PyTorch <https://pytorch.org/docs/stable/notes/autograd.html#autograd-for-complex-numbers>`_.

  * Added a new attribute :attr:`cuquantum.cutensornet.tensor.SVDMethod.discarded_weight_cutoff` to allow SVD truncation based on discarded weight.
  * The :class:`cuquantum.Network` constructor and its :meth:`~cuquantum.Network.reset_operands` method now accept an optional ``stream`` argument.

* Bugs fixed:

  * Fix potential data corruption when :meth:`~cuquantum.Network.reset_operands` is called with operands that do not outlive the contraction operation.
  * For the case of using CPU arrays (from NumPy/PyTorch) as input operands for contraction, the internal streams were not properly ordered.
  * The methods :meth:`~cuquantum.Network.autotune`, :meth:`~cuquantum.Network.contract` and the standalone function :func:`~cuquantum.contract` now accept a pointer address for the ``stream`` argument, as promised in the docs.
  * The attribute dtypes for :enum:`cuquantum.cutensornet.MarginalAttribute.OPT_NUM_HYPER_SAMPLES` and :enum:`cuquantum.cutensornet.SamplerAttribute.OPT_NUM_HYPER_SAMPLES` are fixed.

* Other changes:

  * If Python logging is enabled, cuTensorNet's run-time (instead of build-time) version is reported.
  * When passing PyTorch tensors to contraction APIs, the tensor flags ``.is_conj()`` and ``.requires_grad`` are now taken into account, unless a user explicitly overrides them with the ``qualifiers`` argument.
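
As a rough illustration of truncation based on discarded weight (the ``discarded_weight_cutoff`` attribute listed above), the plain-Python sketch below picks how many singular values to keep. It assumes that the discarded weight is the sum of squares of the dropped singular values divided by the sum of squares of all singular values. The helper name ``rank_for_discarded_weight`` is hypothetical and is not part of cuQuantum Python.

.. code-block:: python

    def rank_for_discarded_weight(singular_values, cutoff):
        """Return how many leading singular values to keep so that the
        normalized discarded weight does not exceed ``cutoff``.

        Assumes ``singular_values`` is sorted in descending order.
        """
        total = sum(s * s for s in singular_values)
        if total == 0.0:
            return 0
        discarded = 0.0
        rank = len(singular_values)
        # Drop values from the tail while the cumulative weight stays within the cutoff.
        while rank > 0:
            candidate = discarded + singular_values[rank - 1] ** 2
            if candidate / total > cutoff:
                break
            discarded = candidate
            rank -= 1
        return rank


    # Hypothetical spectrum: only the smallest value fits under a 1% cutoff.
    kept = rank_for_discarded_weight([2.0, 1.0, 0.5, 0.1], cutoff=0.01)

With this spectrum, a larger cutoff such as ``0.05`` would also drop the third value.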

=========================
cuQuantum Python v23.06.0
=========================

* Add new APIs and functionalities:

  * For low-level APIs, please refer to the release notes of :ref:`cuStateVec v1.4.0 <custatevec/release_notes:cuStateVec v1.4.0>`
    and :ref:`cuTensorNet v2.2.0 <cutensornet/release_notes:cuTensorNet v2.2.0>`.

    * Complex-valued gradients, as returned by the experimental API :func:`cuquantum.cutensornet.compute_gradients_backward`, differ by a complex conjugate from PyTorch's convention.

  * New attribute :attr:`cuquantum.cutensornet.tensor.SVDMethod.algorithm` that allows users to choose between various SVD algorithms including ``"gesvd"``, ``"gesvdj"``, ``"gesvdr"`` and ``"gesvdp"``.
    For ``"gesvdj"`` and ``"gesvdr"``, users may also provide customized settings (e.g., the tolerance for the ``"gesvdj"`` algorithm).
  * New attribute :attr:`cuquantum.cutensornet.tensor.SVDInfo.algorithm` that describes the SVD algorithm used in the SVD computation.
    For ``"gesvdj"`` and ``"gesvdp"``, users may also access execution information (e.g., the residual for the ``"gesvdj"`` algorithm).

* Bugs fixed:

  * Fix a bug for the auto blocking option for :func:`cuquantum.cutensornet.tensor.decompose` and :func:`cuquantum.cutensornet.experimental.contract_decompose`. 
  * Fix a bug for :class:`cuquantum.CircuitToEinsum` to account for the potential global phase when parsing :class:`qiskit.QuantumCircuit`.
  * Fix a bug for :class:`cuquantum.CircuitToEinsum` to parse custom gates given :class:`qiskit.QuantumCircuit`.

* Other changes:

  * Improved the `Jupyter notebook for MPS demo <https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/cutensornet/tn_algorithms/mps_algorithms.ipynb>`_ by including the density-matrix based MPS-MPO contraction algorithm.
  * Avoid using any Unicode whitespace characters as TN symbols in :class:`cuquantum.CircuitToEinsum`.
  * The path finding algorithm :class:`cuquantum.PathFinderOptions` now takes advantage of the new smart option to limit the pathfinder elapsed time (see `CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SMART_OPTION` for details). 
    This change also applies to public APIs including :func:`cuquantum.contract`, :func:`cuquantum.contract_path` and :class:`cuquantum.OptimizerOptions`.
  * When running the hyper-optimizer to compute the optimal contraction path with :func:`cuquantum.contract_path` and related APIs,
    which could be long-running depending on the problem size, it is now possible to abort via Ctrl-C.
  * Conda packages compatible with CUDA 12 are available on conda-forge. Users can specify the target CUDA version using the new ``cuda-version`` metapackage if needed. For example, ``conda install -c conda-forge cuquantum-python cuda-version=12.0`` or ``conda install -c conda-forge cuquantum-python cuda-version=11.8``. This support has been backported to cuQuantum Python 23.03.
  * Improve pip dependency management: When installing ``cuquantum-python`` or ``cuquantum-python-cu12`` via pip, we will attempt to infer the compatible CuPy wheel and install it (with caveats noted below in the 23.03 release). The exception is ``cuquantum-python-cu11``, for which we require users to explicitly pick from ``cupy-cuda110``, ``cupy-cuda111``, or ``cupy-cuda11x`` and install it, following CuPy's installation guide.

    * If installing the meta-package ``cuquantum-python`` with pip 23.1+, passing ``--no-cache-dir`` to pip is required.

*Compatibility notes*:

* cuQuantum Python now requires Python 3.9+
* cuQuantum Python now requires NumPy v1.21+
* cuQuantum Python now requires CuPy v10+

*Known issues*:

* Under single precision, when the input tensor/matrix has a low rank, ``"gesvdr"`` based tensor SVD may suffer from reduced accuracy.
* When the ``"gesvdp"`` algorithm is used for tensor SVD, users are responsible for checking :attr:`cuquantum.cutensornet.tensor.SVDInfo.gesvdp_err_sigma` to monitor convergence.
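
The change to the TN symbols noted above (no whitespace characters) can be pictured with a small, self-contained sketch. This is not the code used by :class:`cuquantum.CircuitToEinsum`. The generator name ``iter_mode_symbols``, the choice to emit ASCII letters first, and the additional filtering of non-printable characters are all assumptions made for illustration.

.. code-block:: python

    import itertools
    import string


    def iter_mode_symbols():
        """Yield single-character mode labels: ASCII letters first, then
        further Unicode code points, skipping whitespace and non-printable
        characters."""
        yield from string.ascii_letters
        for code_point in range(0x80, 0x110000):
            symbol = chr(code_point)
            if symbol.isspace() or not symbol.isprintable():
                continue
            yield symbol


    # Take the first 60 labels, enough for a network with 60 distinct modes.
    labels = list(itertools.islice(iter_mode_symbols(), 60))

Skipping whitespace keeps an einsum expression built from these labels unambiguous when it is printed or parsed.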

=========================
cuQuantum Python v23.03.0
=========================

* Add new APIs and functionalities:

  * For low-level APIs, please refer to the release notes of :ref:`cuStateVec v1.3.0 <custatevec/release_notes:cuStateVec v1.3.0>`
    and :ref:`cuTensorNet v2.1.0 <cutensornet/release_notes:cuTensorNet v2.1.0>`.
  * A new module :mod:`cuquantum.cutensornet.tensor` that supports tensor decomposition routines via :func:`cuquantum.cutensornet.tensor.decompose` and :class:`cuquantum.cutensornet.tensor.DecompositionOptions`. 
    The new module is also directly accessible via the ``cuquantum`` namespace. The following decomposition methods are supported:

    - QR decomposition via :class:`cuquantum.cutensornet.tensor.QRMethod`.
    - Exact and approximate singular value decomposition (SVD) via :class:`cuquantum.cutensornet.tensor.SVDMethod`. For approximate SVD, run-time information on truncation is stored in and accessible via :class:`cuquantum.cutensornet.tensor.SVDInfo`.
  
  * A new module :mod:`cuquantum.cutensornet.experimental` with experimental APIs, including
    :func:`cuquantum.cutensornet.experimental.contract_decompose`, 
    :class:`cuquantum.cutensornet.experimental.ContractDecomposeAlgorithm` and
    :class:`cuquantum.cutensornet.experimental.ContractDecomposeInfo`. 
    These experimental APIs can be used to perform compound contraction and decomposition operations.
    Note that all new experimental APIs may be subject to change in a future release.
    Kindly share your feedback with us on `NVIDIA/cuQuantum GitHub Discussions`_!

  * A new attribute :attr:`cuquantum.CircuitToEinsum.gates` is added to allow users to access gate operands from
    :class:`cuquantum.CircuitToEinsum`.

* API changes:

  * The ``fixed`` kwarg support in :meth:`cuquantum.CircuitToEinsum.state_vector` is removed. The same functionality can be achieved via the same ``fixed`` kwarg in :meth:`cuquantum.CircuitToEinsum.batched_amplitudes`.

* Bugs fixed:

  * The output mode labels were not lexicographically ordered when the Einstein summation expression was provided in implicit form (this is a regression from cuQuantum Python v0.1.0.1).
  * Fix the parallel contraction failure when using MPICH (`NVIDIA/cuQuantum#31 <https://github.com/NVIDIA/cuQuantum/issues/31>`_).

* Other changes:

  * cuQuantum Python now supports Python 3.11.
  * cuQuantum Python now supports CUDA 12.
  * A set of new wheels with suffix ``-cu12`` are released on PyPI.org for CUDA 12 users.

    - Example: ``pip install cuquantum-python-cu12 cupy-cuda12x`` for setting up a wheel-based environment compatible with CUDA 12
    - The existing ``cuquantum`` and ``cuquantum-python`` wheels (without the ``-cuXX`` suffix) are turned into automated installers
      that will attempt to detect the current CUDA environment and install the appropriate wheels. Please note that this automated
      detection may encounter conditions under which detection is unsuccessful, especially in a CPU-only environment (such as CI/CD).
      If detection fails we assume that
      the target environment is CUDA 11 and proceed. This assumption may be changed in a future release, and in such cases we
      recommend that users explicitly (manually) install the correct wheels.

  * For conda packages, currently CUDA 12 support is pending the NVIDIA-led community effort (`conda-forge/staged-recipes#21382 <https://github.com/conda-forge/staged-recipes/issues/21382>`_). Once conda-forge supports CUDA 12 we will make compatible conda packages available.
  * `CUDA Lazy Loading`_ is supported. This can significantly reduce memory footprint by deferring the loading of needed GPU kernels to the first call sites. This feature requires CUDA 11.8 (or above) and cuTENSOR 1.7.0 (or above). Please refer to the CUDA documentation for other requirements and details. Currently this feature requires users to opt in by setting the environment variable ``CUDA_MODULE_LOADING=LAZY``. In a future CUDA version, lazy loading may become the default.

    - If you're a wheel user, update your environment with ``pip install "cutensor-cuXX>=1.7"`` (``XX`` = 11 or 12).
    - If you're a conda user, update your environment with ``conda install -c conda-forge "cudatoolkit>=11.8" "cutensor>=1.7"`` (for CUDA 11).

  * Our support policy is clarified; see :ref:`Compatibility policy <commitment>`.

*Compatibility notes*:

* cuQuantum Python requires Python 3.8+

  - In the next release, Python 3.8 will be dropped to follow `NEP-29`_. (This refers to the *pre-built* wheels on PyPI.org and the Conda packages on conda-forge. If you have any needs for pre-built support, please reach out to us on GitHub. Alternatively, you may build from source, although we might not guarantee indefinite support for source compatibility.)

* cuQuantum Python requires NumPy v1.19+

  - In the next release, NumPy 1.19 & 1.20 will be dropped to follow `NEP-29`_.

* cuQuantum Python requires CuPy v9.5+

  - In the next release, CuPy v9 will be dropped to be consistent with `NEP-29`_.
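
For the implicit-form fix listed under "Bugs fixed" above, the expected ordering follows the NumPy convention: when an einsum expression has no ``->``, the output consists of the labels that appear exactly once, sorted by character. The sketch below reproduces that rule in plain Python. The function name ``implicit_output_modes`` is hypothetical, and ellipsis handling is deliberately left out.

.. code-block:: python

    from collections import Counter


    def implicit_output_modes(expression):
        """Return the output mode labels implied by an einsum expression
        written in implicit form (that is, without '->')."""
        if "->" in expression:
            raise ValueError("expression is already in explicit form")
        # Count every label across all operands, ignoring separators.
        counts = Counter(expression.replace(",", "").replace(" ", ""))
        # Labels seen exactly once survive; they are emitted in sorted order.
        return "".join(sorted(label for label, n in counts.items() if n == 1))


    # 'kj,ji' is equivalent to the explicit form 'kj,ji->ik'.
    output = implicit_output_modes("kj,ji")

Note that the output order depends only on the labels themselves, not on the order in which the operands list them.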

.. _CUDA Lazy Loading: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
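
Opting in to lazy loading from within a Python script can be sketched as follows. The sketch assumes that the variable is read when CUDA is initialized, so it is set before any GPU library is imported. The ``import cuquantum`` line is left commented out so that the snippet also runs on a machine without cuQuantum Python.

.. code-block:: python

    import os

    # Opt in to CUDA lazy loading unless the user has already chosen a mode.
    # This is placed before importing any library that initializes CUDA.
    os.environ.setdefault("CUDA_MODULE_LOADING", "LAZY")

    # import cuquantum  # uncomment where cuQuantum Python is installed

    loading_mode = os.environ["CUDA_MODULE_LOADING"]

Using ``setdefault`` rather than a plain assignment respects a value that was already exported in the shell.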

===========================
cuQuantum Python v22.11.0.1
===========================

This is a hot-fix release addressing a few issues in cuQuantum Python.

* Bugs fixed:

  * Fix performance degradation in :func:`cuquantum.contract` that could impact certain usage patterns.
  * Fix the ``.save_statevector()`` usage in the Jupyter notebook ``qiskit_basic.ipynb``.
  * Remove invalid code.

=========================
cuQuantum Python v22.11.0
=========================

* We are on `NVIDIA/cuQuantum GitHub Discussions`_! For any questions regarding (or to share any exciting work built upon) cuQuantum, please feel free to reach out to us on GitHub Discussions.

  * Bug reports should still go to `our GitHub issue tracker <https://github.com/NVIDIA/cuQuantum/issues>`_.

* Add new APIs and functionalities:

  * For cuTensorNet low-level APIs, please refer to the release notes of :ref:`cuTensorNet v2.0.0 <cutensornet/release_notes:cuTensorNet v2.0.0>`; no new cuStateVec API is added.
  * A new API, :meth:`cuquantum.CircuitToEinsum.batched_amplitudes`, to compute the amplitudes for a batch of qubits. This is equivalent to the ``fixed`` kwarg support in :meth:`cuquantum.CircuitToEinsum.state_vector`,
    which is deprecated and will be removed in a future release.
  * A new API, :meth:`cuquantum.CircuitToEinsum.expectation`, to support expectation value computation for Pauli strings.
  * A new helper API, :func:`cuquantum.cutensornet.get_mpi_comm_pointer`, to get the pointer to and size of the MPI communicator for the new low-level API :func:`cuquantum.cutensornet.distributed_reset_configuration` that enables distributed parallelism. This capability requires ``mpi4py``.
  * The :mod:`cuquantum` module now has a command line interface to return the include and library paths and the linker flags for the cuTENSOR and cuQuantum libraries: ``python -m cuquantum``. See :ref:`python cmdline support` for details.
  * Support for controlling non-blocking behavior via the new option :attr:`cuquantum.NetworkOptions.blocking`.

* API changes:

  * For cuTensorNet low-level APIs, please refer to the release notes of :ref:`cuTensorNet v2.0.0 <cutensornet/release_notes:cuTensorNet v2.0.0>`; cuStateVec low-level APIs remain unchanged.

    * Users can set the tensor qualifiers by using the dedicated NumPy dtype :data:`cuquantum.cutensornet.tensor_qualifiers_dtype`. See the Python sample (`python/samples/cutensornet/coarse/example21.py <https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/cutensornet/coarse/example21.py>`_) for details. For example, complex conjugation can be done on-the-fly, reducing memory pressure.
    * To get/set the contraction path or slicing configurations, users should use :func:`~cuquantum.cutensornet.contraction_optimizer_info_get_attribute_dtype` to get a NumPy custom dtype representing the path or slicing configuration object, in a manner consistent with the method used for all other attributes. Refer to the docstring for details. The (experimental) ``ContractionPath`` object is removed.

* Functionality/performance improvements:

  * Improved performance when contracting two tensors using :func:`cuquantum.contract` or related APIs.
  * Improved performance for reusing a :class:`~cuquantum.Network` object with :meth:`~cuquantum.Network.reset_operands`.
  * The lightcone construction in :class:`~cuquantum.CircuitToEinsum` is improved to further reduce the number of tensors in the network.
  * The build system now supports PEP-517 and standard ``pip`` command-line flags. The environment variable ``CUQUANTUM_IGNORE_SOLVER`` is no longer used. See :doc:`Build Requirements <running_examples>` for more information.

* Bugs fixed:

  * Fix a potential multi-device bug in the internal device context switch.
  * Fix a bug where invalid mode labels were used in :class:`cuquantum.CircuitToEinsum` when the input circuit becomes large.

* Other changes:

  * Provide one more distributed (MPI+NCCL) Python sample (``example4_mpi_nccl.py``) to show how to use cuTensorNet and create parallelism.
  * The test infrastructure will show tests that are not runnable as "deselected" instead of "skipped".
  * A new pip wheel is released on PyPI: ``pip install cuquantum-python-cu11``. Users can still install cuQuantum Python via ``pip install cuquantum-python``, as before. ``cuquantum-python`` now becomes a meta-wheel pointing to ``cuquantum-python-cu11``. This may change in a future release when a new CUDA version becomes available. Using wheels with the ``-cuXX`` suffix is encouraged.

*Compatibility notes*:

* cuQuantum Python now requires cuStateVec 1.1.0 or above.
* cuQuantum Python now requires cuTensorNet 2.0.0 or above.
* cuQuantum Python now requires cuTENSOR 1.6.1 or above.

.. _NVIDIA/cuQuantum GitHub Discussions: https://github.com/NVIDIA/cuQuantum/discussions

=========================
cuQuantum Python v22.07.1
=========================

* Bugs fixed:

  * The 22.07.0 ``cuquantum`` wheel had a wrong file layout. (The ``cuquantum`` 22.07.0.1 and 22.07.0.2 hot-fix wheels are not affected.)

=========================
cuQuantum Python v22.07.0
=========================

* Add new APIs and functionalities:

  * For low-level APIs, please refer to the release notes of :ref:`cuStateVec v1.1.0 <custatevec/release_notes:cuStateVec v1.1.0>`
    and :ref:`cuTensorNet v1.1.0 <cutensornet/release_notes:cuTensorNet v1.1.0>`.
  * New high-level API :class:`cuquantum.CircuitToEinsum` that supports conversion of :class:`qiskit.QuantumCircuit` and :class:`cirq.Circuit` to tensor network contraction:

    - Support state coefficient
    - Support bitstring amplitude
    - Support reduced density matrix
    - Backend support on NumPy, CuPy and PyTorch

  * Add a keyword-only argument ``slices`` to the :meth:`cuquantum.Network.contract` method to support contracting an arbitrary subset of the slices.
  * Add a new attribute ``intermediate_modes`` to the :class:`cuquantum.OptimizerInfo` object for retrieving the mode labels of all intermediate tensors.
  * Add a new attribute ``num_slices`` to the :class:`cuquantum.OptimizerInfo` object for querying the total number of slices.

* Functionality/performance improvements:

  * Improve the einsum expression parser.

* Bugs fixed:

  * An exception was mistakenly raised in :func:`cuquantum.einsum` when ``optimize`` is set to ``False``.
  * Missing f-specifier in the string representation of :class:`cuquantum.OptimizerInfo`.

* Other changes:

  * Drop the dependency on ``typing_extensions``.
  * Provide distributed (MPI-based) Python samples that show how easy it is to use cuTensorNet and create parallelism. ``mpi4py`` is required for running these samples.
  * Update the low-level, non-distributed sample ``tensornet_example.py`` by improving memory usage and switching to the new contraction API :func:`~cuquantum.cutensornet.contract_slices`.
  * Provide Jupyter notebooks to show how to convert a quantum circuit to a tensor network contraction.
  * Add a Python sample to illustrate the usage of the new multi-device bit-swapping :func:`~cuquantum.custatevec.multi_device_swap_index_bits` API.
  * Restructure the ``samples`` folder to separate cuStateVec and cuTensorNet samples.

*Compatibility notes*:

* cuQuantum Python now requires cuQuantum v22.07.
* cuQuantum Python now requires Python 3.8+.
* cuQuantum Python now requires NumPy v1.19+.
* cuQuantum Python supports Cirq v0.6.0+.
* cuQuantum Python supports Qiskit v0.24.0+.
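
The ``slices`` argument listed above selects a subset of slices to contract. The idea behind slicing can be sketched without any GPU library: fixing a contracted mode to one value at a time yields independent partial contractions whose sum equals the full result. The helper names ``partial_matmul`` and ``add`` are hypothetical, and the code is a conceptual illustration in plain Python rather than the cuTensorNet implementation.

.. code-block:: python

    def partial_matmul(a, b, slice_ids):
        """Contract 'ik,kj->ij' over only the values of k in ``slice_ids``."""
        rows, cols = len(a), len(b[0])
        out = [[0] * cols for _ in range(rows)]
        for k in slice_ids:  # each k is one slice of the contracted mode
            for i in range(rows):
                for j in range(cols):
                    out[i][j] += a[i][k] * b[k][j]
        return out


    def add(x, y):
        """Element-wise sum of two equally shaped nested lists."""
        return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


    a = [[1, 2, 3], [4, 5, 6]]
    b = [[1, 0], [0, 1], [2, 2]]

    full = partial_matmul(a, b, range(3))
    # Two workers each handle a disjoint subset of the three slices.
    part0 = partial_matmul(a, b, [0, 2])
    part1 = partial_matmul(a, b, [1])
    combined = add(part0, part1)

Because the partial results are independent, they can be computed on different devices or processes and summed afterwards.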

=========================
cuQuantum Python v22.05.0
=========================

* Bugs fixed:

  * Make ``typing_extensions`` a required dependency (`NVIDIA/cuQuantum#3 <https://github.com/NVIDIA/cuQuantum/issues/3>`_)
  * Fix issues in the test suite

* Other changes:

  * The Python sample (`python/samples/tensornet_example.py <https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/cutensornet/tensornet_example.py>`_) is updated to include a correctness check

=========================
cuQuantum Python v22.03.0
=========================

* Stable release:

  * Starting with this release, cuQuantum Python switches to the CalVer versioning scheme, following cuQuantum SDK
  * ``pip`` wheels are released on PyPI: ``pip install cuquantum-python``

* Functionality/performance improvements:

  * High-level tensor network APIs are now fully NumPy compliant:

    - Support generalized einsum expressions
    - Support ellipsis
    - Support broadcasting

* Add new APIs and functionalities:

  * For low-level APIs, please refer to the release notes of :ref:`cuStateVec v1.0.0 <custatevec/release_notes:cuStateVec v1.0.0>`
    and :ref:`cuTensorNet v1.0.0 <cutensornet/release_notes:cuTensorNet v1.0.0>`.
  * The high-level APIs support an EMM-like memory plugin interface (see :ref:`high-level memory management`).

* API changes:

  * For low-level APIs, please refer to the release notes of :ref:`cuStateVec v1.0.0 <custatevec/release_notes:cuStateVec v1.0.0>`
    and :ref:`cuTensorNet v1.0.0 <cutensornet/release_notes:cuTensorNet v1.0.0>`.
  * No API-breaking changes for the high-level APIs.

*Compatibility notes*:

* cuQuantum Python requires cuQuantum v22.03
* cuQuantum Python requires Python 3.7+

  - In the next release, Python 3.7 will be dropped to follow `NEP-29`_.

* cuQuantum Python requires NumPy v1.17+

  - In the next release, NumPy 1.17 & 1.18 will be dropped to follow `NEP-29`_.

* cuQuantum Python requires CuPy v9.5+
* cuQuantum Python supports PyTorch v1.10+

.. _NEP-29: https://numpy.org/neps/nep-0029-deprecation_policy.html

*Known issues*:

* If you install cuQuantum Python from PyPI (``pip install cuquantum-python``), make sure you also install ``typing_extensions`` (via ``pip`` or ``conda``). This only affects the wheel installation and will be fixed in the next release. (`NVIDIA/cuQuantum#3 <https://github.com/NVIDIA/cuQuantum/issues/3>`_)

=========================
cuQuantum Python v0.1.0.1
=========================

* Patch release:

  - Add a ``__version__`` string

=========================
cuQuantum Python v0.1.0.0
=========================

* Initial release (beta 2)

*Compatibility notes*:

* cuQuantum Python requires cuQuantum v0.1.0
* cuQuantum Python requires NumPy v1.17+
* cuQuantum Python requires CuPy v9.5+
* cuQuantum Python supports PyTorch v1.10+

*Limitation notes*:

* In certain environments, if PyTorch is installed, ``import cuquantum`` could fail (with a segmentation fault). This is currently under investigation; a temporary workaround is to import ``torch`` before importing ``cuquantum``.