*************
Release Notes
*************

===================
cuDensityMat v0.3.0
===================

*New features*:

* Support extreme eigen-spectrum computation for arbitrary non-batched Hermitian operators.
* Support integration with JAX XLA jitting and vector-jacobian product (VJP) computation.

*Bugs fixed*:

* Fixed a functional bug in multi-node multi-GPU execution with more than 4 processes where `cudensitymatOperatorPrepareAction` could run indefinitely for certain mixed quantum state calculations.
* Fixed a functional bug in the multi-GPU multi-node expectation value computation that could fail with ``CUDENSITYMAT_INTERNAL_ERROR`` for some workspace buffer sizes.
* Fixed the internal check that the number of parallel processes is a power of two.

*Compatibility notes*:

* *cuDensityMat* requires cuTENSOR 2.3.1 or above.
* *cuDensityMat* requires cuTensorNet 2.9.0 or above.

===================
cuDensityMat v0.2.0
===================

*New features*:

* Support :ref:`vector-jacobian product (VJP) computation` (backward differentiation) for non-batched single-GPU execution with dense operators only (dense elementary tensor operators and dense matrix operators).
* Support new backward-differentiation gradient callbacks for VJP computation.
* Support single-mode multidiagonal elementary tensor operators with an arbitrary mode extent and up to 256 non-zero diagonals.
* Support Volta, Turing, Ampere, Ada, Hopper, and Blackwell NVIDIA GPU architectures (compute capability 7.0+).

*Compatibility notes*:

* *cuDensityMat* requires cuTENSOR 2.2.0 or above.
* *cuDensityMat* now requires cuTensorNet 2.8.0 or above.
* Removed the `cudensitymatOperatorTermAppendGeneralProduct` C API function.

*Known issues*:

* The multi-GPU multi-node expectation value computation may fail with ``CUDENSITYMAT_INTERNAL_ERROR`` for some workspace buffer sizes. The workaround is to set the workspace buffer size to exactly match the value returned by the PrepareExpectation call, or to use, on each rank, the maximum of the workspace buffer sizes across all MPI ranks (a sketch of this workaround follows the v0.1.0 notes below).

*Bugs fixed*:

* Fixed an illegal memory access caused by multidiagonal elementary tensor operators of mode dimension 2.

===================
cuDensityMat v0.1.0
===================

* Support full matrix operators (operator matrices defined in the full composite space).
* Support batched operators (both elementary tensor operators and full matrix operators).
* Support both CPU-side and GPU-side tensor/scalar callbacks.
* Support Volta, Turing, Ampere, Ada, Hopper, and Blackwell NVIDIA GPU architectures (compute capability 7.0+).

*Compatibility notes*:

* *cuDensityMat* requires CUDA 12 or above.
* *cuDensityMat* requires cuTENSOR 2.2.0 or above.
* *cuDensityMat* supports NVIDIA HPC SDK 21.7 or above.
* Tensor and scalar callback function signatures have changed to support operator batching.

*Known issues*:

* Activating *cuDensityMat* logging by setting the environment variable ``CUDENSITYMAT_LOG_LEVEL`` while using GPU-side tensor/scalar callbacks will result in a crash.
* Multidiagonal elementary operators of mode dimension 2 can cause an illegal memory access. As a workaround, please use dense elementary operators for any elementary operator with mode dimension 2. Multidiagonal elementary operators with mode dimensions larger than 2 are not affected by this issue.
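*Workaround sketch (for the v0.2.0 known issue above)*:

The following is a minimal, hedged illustration of the second workaround for the v0.2.0 expectation value issue: each rank prepares the expectation value computation, queries the scratch buffer size it needs, the ranks agree on the maximum via ``MPI_Allreduce``, and every rank then attaches a buffer of that common size. The helper name ``attachMaxWorkspace`` is hypothetical, and the cuDensityMat call names, enumerators, and argument orders shown here reflect one reading of the C API; verify them against ``cudensitymat.h`` for your release and add proper error checking in real code.

.. code-block:: cpp

    // Hedged sketch of the v0.2.0 workaround: make all MPI ranks attach the
    // same (maximum) expectation workspace buffer size. Assumes the library
    // handle, expectation object, quantum state, and workspace descriptor were
    // created earlier, MPI and CUDA are initialized, and error checks are omitted.
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <cudensitymat.h>   // assumed header name for the cuDensityMat C API
    #include <cstdint>

    // Hypothetical helper; not part of the cuDensityMat API.
    void attachMaxWorkspace(cudensitymatHandle_t handle,
                            cudensitymatExpectation_t expectation,
                            cudensitymatState_t state,
                            cudensitymatWorkspaceDescriptor_t workspaceDescr,
                            std::size_t freeDeviceMemLimit,
                            cudaStream_t stream)
    {
        // Prepare the expectation value computation on this rank
        // (call name and argument order assumed from the C API reference).
        cudensitymatExpectationPrepare(handle, expectation, state,
                                       CUDENSITYMAT_COMPUTE_64F,
                                       freeDeviceMemLimit, workspaceDescr, stream);

        // Query the scratch buffer size this rank requires.
        std::size_t requiredSize = 0;
        cudensitymatWorkspaceGetMemorySize(handle, workspaceDescr,
                                           CUDENSITYMAT_MEMSPACE_DEVICE,
                                           CUDENSITYMAT_WORKSPACE_SCRATCH,
                                           &requiredSize);

        // Agree on the maximum required size across all MPI ranks.
        std::uint64_t maxSize = static_cast<std::uint64_t>(requiredSize);
        MPI_Allreduce(MPI_IN_PLACE, &maxSize, 1, MPI_UINT64_T, MPI_MAX, MPI_COMM_WORLD);

        // Allocate exactly that many bytes on the GPU and attach the buffer
        // as the scratch workspace, so every rank uses the same size.
        void* scratchBuffer = nullptr;
        cudaMalloc(&scratchBuffer, static_cast<std::size_t>(maxSize));
        cudensitymatWorkspaceSetMemory(handle, workspaceDescr,
                                       CUDENSITYMAT_MEMSPACE_DEVICE,
                                       CUDENSITYMAT_WORKSPACE_SCRATCH,
                                       scratchBuffer,
                                       static_cast<std::size_t>(maxSize));
        // The expectation value can now be computed with the library's
        // expectation Compute call on the same stream.
    }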
===================
cuDensityMat v0.0.5
===================

* Initial release.
* Single-GPU and multi-GPU/multi-node capabilities (requires MPI).
* Support ``Linux x86_64`` and ``Linux Arm64`` targets.
* Support Volta, Turing, Ampere, Ada, and Hopper NVIDIA GPU architectures (compute capability 7.0+).

*Compatibility notes*:

* *cuDensityMat* requires CUDA 11.4 or above.
* *cuDensityMat* requires cuTENSOR 2.0.2 or above.
* *cuDensityMat* supports NVIDIA HPC SDK 21.7 or above.