################################################################################
Release Notes
################################################################################

================================================================================
cuSPARSELt v0.0.1
================================================================================

*New Features*:

* Initial release
* Support `Linux x86_64` and `SM 8.0`
* Provide the following mixed-precision computation kernels:

    * `FP16` inputs/output, `FP32` Tensor Core accumulate
    * `BFLOAT16` inputs/output, `FP32` Tensor Core accumulate
    * `INT8` inputs/output, `INT32` Tensor Core compute
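
The kernels listed above pair narrow input types with a wider accumulator. As a rough scalar illustration of why, the sketch below accumulates `INT8` products in an `INT32` variable. It is plain C, not cuSPARSELt code: the function name `dot_i8_acc_i32` is hypothetical, and the loop only models the arithmetic, not how the Tensor Core kernels are implemented.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reference helper (not part of cuSPARSELt).
     * Dot product of two INT8 vectors with an INT32 accumulator.
     * A single int8 * int8 product fits in 16 bits, but the running
     * sum needs more range, so the accumulator is wider than the inputs. */
    static int32_t dot_i8_acc_i32(const int8_t *a, const int8_t *b, size_t n)
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n; ++i) {
            acc += (int32_t)a[i] * (int32_t)b[i];
        }
        return acc;
    }

With four products of 127 x 127 the sum is 64516, which already exceeds the signed 16-bit range, so the wider accumulator matters even for very short vectors.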

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 11.0 or above

----

================================================================================
cuSPARSELt v0.1.0
================================================================================

*New Features*:

* Added support for `Windows x86-64` and `Linux Arm64` platforms
* Introduced `SM 8.6` compatibility
* Added new kernels:

    - `FP32` inputs/output, `TF32` Tensor Core compute
    - `TF32` inputs/output, `TF32` Tensor Core compute

* Improved performance for `SM 8.0` kernels (up to 90% of SOL, the theoretical "speed of light" peak throughput)
* New APIs for compression and pruning decoupled from `cusparseLtMatmulPlan_t`

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 11.2 or above
* `cusparseLtMatDescriptor_t` must be destroyed with the `cusparseLtMatDescriptorDestroy()` function
* Both *static* and *shared* libraries must be linked with the `nvrtc` library
* On Linux systems, both *static* and *shared* libraries must be linked with the `dl` library

*Resolved issues*:

* `CUSPARSELT_MATMUL_SEARCH_ITERATIONS` is now handled correctly

----

================================================================================
cuSPARSELt v0.2.0
================================================================================

*New Features*:

* Added support for *activation functions* and *bias vector*:

    - ReLU + upper bound and threshold setting for all kernels
    - GeLU for `INT8` input/output, `INT32` Tensor Core compute kernels

* Added support for *Batched Sparse GEMM*:

    - Single sparse matrix / Multiple dense matrices (*Broadcast*)
    - Multiple sparse and dense matrices
    - Batched bias vector
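
The two batched modes listed above can be pictured with a dense reference loop. The sketch below is an illustration only, not the cuSPARSELt API: the function name `batched_gemm_ref`, the row-major layout, and the convention that a batch stride of 0 means "reuse the same matrix for every batch" are assumptions of this example, and it ignores structured sparsity entirely.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical dense reference (not part of cuSPARSELt).
     * Computes C[b] = A[b] * B[b] for b in [0, batch), row-major,
     * with A of size m x k, B of size k x n and C of size m x n.
     * Strides are in elements. In this sketch, stride_a == 0 means
     * every batch reads the same A (broadcast). */
    static void batched_gemm_ref(const float *A, size_t stride_a,
                                 const float *B, size_t stride_b,
                                 float *C, size_t stride_c,
                                 size_t batch, size_t m, size_t n, size_t k)
    {
        for (size_t b = 0; b < batch; ++b) {
            const float *Ab = A + b * stride_a;
            const float *Bb = B + b * stride_b;
            float *Cb = C + b * stride_c;
            for (size_t i = 0; i < m; ++i) {
                for (size_t j = 0; j < n; ++j) {
                    float acc = 0.0f;
                    for (size_t p = 0; p < k; ++p) {
                        acc += Ab[i * k + p] * Bb[p * n + j];
                    }
                    Cb[i * n + j] = acc;
                }
            }
        }
    }

Calling it with `stride_a = 0` and `batch = 2` multiplies one `A` by two different `B` matrices, which corresponds to the *Broadcast* case. A non-zero `stride_a` gives the case with multiple matrices on both sides.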

*Compatibility notes*:

* *cuSPARSELt* does not require the `nvrtc` library anymore
* Support for *Ubuntu 16.04* (gcc-5) is now deprecated and will be
  removed in a future release

----

================================================================================
cuSPARSELt v0.3.0
================================================================================

*New Features*:

* Added support for *vectors of alpha and beta scalars* (per-channel scaling)

* Added support for *GeLU scaling*

* Added support for *Split-K Mode*

* Full support for logging functionality and NVTX ranges
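
For readers unfamiliar with the term, Split-K generally means partitioning the shared (K) dimension of a matrix product into slices, computing a partial result per slice, and reducing the partial results at the end. The sketch below shows the idea for a single output element in plain C. The function name `dot_split_k` and the contiguous slicing are assumptions of this example; it does not describe how cuSPARSELt schedules its kernels.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical reference helper (not part of cuSPARSELt).
     * Computes sum(a[p] * b[p]) for p in [0, k) by cutting the K range
     * into `splits` contiguous slices, accumulating one partial sum per
     * slice, and then reducing the partial sums. */
    static float dot_split_k(const float *a, const float *b,
                             size_t k, size_t splits)
    {
        if (splits == 0) {
            splits = 1; /* treat 0 as "no splitting" */
        }
        size_t chunk = (k + splits - 1) / splits; /* ceil(k / splits) */
        float result = 0.0f;
        for (size_t s = 0; s < splits; ++s) {
            size_t begin = s * chunk;
            size_t end = (begin + chunk < k) ? begin + chunk : k;
            float partial = 0.0f; /* independent work item for this slice */
            for (size_t p = begin; p < end; ++p) {
                partial += a[p] * b[p];
            }
            result += partial; /* final reduction over slices */
        }
        return result;
    }

Because each slice is independent, the slices can run in parallel, which typically helps when K is large compared with the output dimensions. Floating-point results may differ slightly from the unsplit sum because the order of additions changes.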

*API Changes*:

* `cusparseLtMatmulGetWorkspace()` API to get workspace size needed by `cusparseLtMatmul()`

*Resolved issues*:

* Fixed documentation issue regarding structured matrix size constraints

----

================================================================================
cuSPARSELt v0.4.0
================================================================================

*New Features*:

* Introduced `SM 8.9` compatibility
* The initialization time of cuSPARSELt descriptors has been significantly improved
* `cusparseLtMatmulSearch()` efficiency has been improved
* Removed all internal memory allocations
* Added a new kernel for supporting the following data type combination: `INT8` input, `INT32` Tensor Core compute, `FP16` output
* Added `cusparseLtGetVersion()` and `cusparseLtGetProperty()` functions to retrieve the library version

*API Changes*:

* `cusparseLtSpMMACompressedSize()`, `cusparseLtSpMMACompress()`, `cusparseLtSpMMACompressedSize2()`, and `cusparseLtSpMMACompress2()` have a new parameter that avoids internal memory allocations and supports a user-provided device memory buffer for the compression

*Compatibility notes*:

* *cuSPARSELt* requires CUDA Driver 470.xx (CUDA 11.4) or above
* cuSPARSELt now uses the static version of the `cudart` library
* Support for *Ubuntu 16.04* (gcc-5) has been removed