################################################################################
Release Notes
################################################################################

================================================================================
cuSPARSELt v0.0.1
================================================================================

*New Features*:

* Initial release

* Support `Linux x86_64` and `SM 8.0`

* Provide the following mixed-precision computation kernels:

  * `FP16` inputs/output, `FP32` Tensor Core accumulate

  * `BFLOAT16` inputs/output, `FP32` Tensor Core accumulate

  * `INT8` inputs/output, `INT32` Tensor Core compute

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 11.0 or above

----

================================================================================
cuSPARSELt v0.1.0
================================================================================

*New Features*:

* Added support for `Windows x86-64` and `Linux Arm64` platforms

* Introduced `SM 8.6` compatibility

* Added new kernels:

  - `FP32` inputs/output, `TF32` Tensor Core compute

  - `TF32` inputs/output, `TF32` Tensor Core compute

* Better performance for `SM 8.0` kernels (up to 90% SOL)

* New APIs for compression and pruning decoupled from `cusparseLtMatmulPlan_t`

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 11.2 or above

* `cusparseLtMatDescriptor_t` must be destroyed with the `cusparseLtMatDescriptorDestroy` function

* Both *static* and *shared* libraries must be linked with the `nvrtc` library

* On Linux systems, both *static* and *shared* libraries must be linked with the `dl` library

*Resolved issues*:

* `CUSPARSELT_MATMUL_SEARCH_ITERATIONS` is now handled correctly

----

================================================================================
cuSPARSELt v0.2.0
================================================================================

*New Features*:

* Added support for *activation functions* and *bias vector*:

  - ReLU + upper bound and threshold setting for all kernels

  - GeLU for `INT8` input/output, `INT32` Tensor Core compute kernels

* Added support for *Batched Sparse GEMM*:

  - Single sparse matrix / Multiple dense matrices (*Broadcast*)

  - Multiple sparse and dense matrices

  - Batched bias vector

*Compatibility notes*:

* *cuSPARSELt* does not require the `nvrtc` library anymore

* Support for *Ubuntu 16.04* (gcc-5) is now deprecated and will be removed in future releases

----

================================================================================
cuSPARSELt v0.3.0
================================================================================

*New Features*:

* Added support for *vectors of alpha and beta scalars* (per-channel scaling)

* Added support for *GeLU scaling*

* Added support for *Split-K Mode*

* Full support for logging functionalities and NVTX ranges

*API Changes*:

* `cusparseLtMatmulGetWorkspace()` API to get workspace size needed by `cusparseLtMatmul()`

*Resolved issues*:

* Fixed documentation issue regarding structured matrix size constraints

----

================================================================================
cuSPARSELt v0.4.0
================================================================================

*New Features*:

* Introduced `SM 8.9` compatibility

* The initialization time of cuSPARSELt descriptors has been significantly improved

* `cusparseLtMatmulSearch()` efficiency has been improved

* Removed any internal memory allocations

* Added a new kernel for supporting the following data type combination: `INT8` input, `INT32` Tensor Core compute, `FP16` output

* Added `cusparseLtGetVersion()` and `cusparseLtGetProperty()` functions to retrieve the library version

*API Changes*:

* `cusparseLtSpMMACompressedSize()`, `cusparseLtSpMMACompress()`, `cusparseLtSpMMACompressedSize2()`, `cusparseLtSpMMACompress2()` have a new parameter to avoid internal memory allocations and support a user-provided device memory buffer for the compression

*Compatibility notes*:

* *cuSPARSELt* requires CUDA Driver 470.xx (CUDA 11.4) or above

* cuSPARSELt now uses the static version of the `cudart` library

* Support for *Ubuntu 16.04* (gcc-5) has been removed
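
The CUDA requirements listed in the compatibility notes above can be condensed into a small lookup. The following is a minimal sketch and not part of cuSPARSELt: the names `MIN_CUDA_BY_RELEASE`, `parse_version`, `min_cuda_for`, and `is_supported` are hypothetical. It assumes that releases with no stated CUDA requirement (v0.2.0, v0.3.0) inherit the most recent earlier one, and that the v0.4.0 driver requirement can be read as CUDA 11.4.

```python
# Minimum CUDA version stated in the "Compatibility notes" of each release.
# Assumption: releases that state no requirement (v0.2.0, v0.3.0) inherit
# the most recent earlier one.
MIN_CUDA_BY_RELEASE = {
    (0, 0, 1): (11, 0),
    (0, 1, 0): (11, 2),
    (0, 4, 0): (11, 4),  # stated as "CUDA Driver 470.xx (CUDA 11.4)"
}


def parse_version(text):
    """Turn 'v0.3.0' or '11.2' into a tuple of ints, e.g. (0, 3, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def min_cuda_for(release):
    """Return the minimum CUDA version (major, minor) for a release string."""
    wanted = parse_version(release)
    # Keep every stated requirement introduced at or before this release.
    candidates = [r for r in MIN_CUDA_BY_RELEASE if r <= wanted]
    if not candidates:
        raise ValueError(f"no compatibility note covers release {release}")
    return MIN_CUDA_BY_RELEASE[max(candidates)]


def is_supported(release, cuda):
    """Check whether an installed CUDA version meets the stated minimum."""
    return parse_version(cuda) >= min_cuda_for(release)
```

For example, `min_cuda_for("v0.3.0")` falls back to the v0.1.0 requirement, and `is_supported("v0.4.0", "11.2")` is false because v0.4.0 raises the minimum.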