################################################################################
Release Notes
################################################################################

================================================================================
cuSPARSELt v0.6.0
================================================================================

*New Features*:

* Introduced Hopper support (`SM 9.0`).
* Added new kernels for the following data type combinations for the `SM 9.0` architecture:

  * `FP16` input/output, `FP16` Tensor Core compute
  * `E4M3` input/output, `FP32` Tensor Core compute; the data type of matrix C can be either `FP16` or `BF16`
  * `E4M3` input, `FP16` output, `FP32` Tensor Core compute
  * `E4M3` input, `BF16` output, `FP32` Tensor Core compute
  * `E4M3` input, `FP32` output, `FP32` Tensor Core compute
  * `E5M2` input/output, `FP32` Tensor Core compute; the data type of matrix C can be either `FP16` or `BF16`
  * `E5M2` input, `FP16` output, `FP32` Tensor Core compute
  * `E5M2` input, `BF16` output, `FP32` Tensor Core compute
  * `E5M2` input, `FP32` output, `FP32` Tensor Core compute

*Driver Requirements*:

* *cuSPARSELt* requires CUDA Driver `r535 TRD7`, `r550 TRD1`, or higher.

*API Changes*:

* The following APIs are deprecated: `cusparseLtSpMMAPrune2()`, `cusparseLtSpMMAPruneCheck2()`, `cusparseLtSpMMACompressedSize2()`, `cusparseLtSpMMACompress2()`.

*Dependencies*:

* *cuSPARSELt* now requires linking against the CUDA driver library (`libcuda.so` on Linux, `cuda.lib` on Windows).

*Known Issues*:

* `cusparseLtSpMMACompressedSize()` and `cusparseLtSpMMACompressedSize2()` allocate slightly more memory than needed. This issue will be addressed in the next release.
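Because v0.6.0 links against the CUDA driver library directly, applications now need `-lcuda` in addition to the cuSPARSELt library itself. The following build line is an illustrative sketch only: `my_app.c` is a placeholder source file, and the flags assume a default Linux CUDA toolkit installation with cuSPARSELt on the linker search path.

```shell
# Illustrative build line for cuSPARSELt v0.6.0 on Linux.
# "my_app.c" is a placeholder; adjust paths for non-default installs.
nvcc my_app.c -o my_app -lcusparseLt -lcuda
```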
----

================================================================================
cuSPARSELt v0.5.2
================================================================================

*New Features*:

* Added a new kernel for the following data type combination: `INT8` inputs, `BF16` output, `INT32` Tensor Core accumulate
* Symbols are obfuscated in the static library

*Compatibility notes*:

* Added support for *RHEL 7* and *CentOS 7*
* Split-K is enabled in `cusparseLtMatmulSearch()` across a broader range of problem dimensions
* The `CUSPARSE_COMPUTE_16F`, `CUSPARSE_COMPUTE_TF32`, and `CUSPARSE_COMPUTE_TF32_FAST` enumerators have been removed from the `cusparseComputeType` enumeration and replaced with `CUSPARSE_COMPUTE_32F` to better express the accuracy of the computation at the Tensor Core level

----

================================================================================
cuSPARSELt v0.5.0
================================================================================

*New Features*:

* Added a new kernel for the following data type combination: `INT8` inputs, `INT32` output, `INT32` Tensor Core accumulate

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 12.0 or above and a compatible driver (see the CUDA Driver Release Notes)
* `cusparseLtMatmulAlgSelectionInit()` does not ensure the same ordering of algorithm IDs (`alg`) as in v0.4.0
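Since v0.5.0 ties the library to CUDA 12.0 and a compatible driver, a program can verify its environment at startup. The sketch below is illustrative, not canonical: it assumes a system with the CUDA toolkit and cuSPARSELt installed, and it assumes `cusparseLtGetProperty()` (added in v0.4.0) mirrors the `cusparseGetProperty()` convention of taking a `libraryPropertyType` enumerator (`MAJOR_VERSION`, `MINOR_VERSION`, `PATCH_LEVEL` from `library_types.h`); error handling is abbreviated.

```c
// Hedged sketch: check driver/runtime versions against the v0.5.0
// requirement (CUDA 12.0+), then report the cuSPARSELt version.
#include <cuda_runtime_api.h>
#include <cusparseLt.h>
#include <stdio.h>

int main(void) {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    /* e.g. 12000 for CUDA 12.0 */
    cudaRuntimeGetVersion(&runtimeVersion);

    if (runtimeVersion < 12000) {
        fprintf(stderr, "cuSPARSELt v0.5.0 requires CUDA 12.0 or above\n");
        return 1;
    }

    /* Assumed to mirror cusparseGetProperty(); see lead-in above. */
    int major = 0, minor = 0, patch = 0;
    cusparseLtGetProperty(MAJOR_VERSION, &major);
    cusparseLtGetProperty(MINOR_VERSION, &minor);
    cusparseLtGetProperty(PATCH_LEVEL,  &patch);
    printf("driver %d, runtime %d, cuSPARSELt %d.%d.%d\n",
           driverVersion, runtimeVersion, major, minor, patch);
    return 0;
}
```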
----

================================================================================
cuSPARSELt v0.4.0
================================================================================

*New Features*:

* Introduced `SM 8.9` compatibility
* The initialization time of cuSPARSELt descriptors has been significantly improved
* `cusparseLtMatmulSearch()` efficiency has been improved
* Removed all internal memory allocations
* Added a new kernel for the following data type combination: `INT8` input, `INT32` Tensor Core compute, `FP16` output
* Added the `cusparseLtGetVersion()` and `cusparseLtGetProperty()` functions to retrieve the library version

*API Changes*:

* `cusparseLtSpMMACompressedSize()`, `cusparseLtSpMMACompress()`, `cusparseLtSpMMACompressedSize2()`, and `cusparseLtSpMMACompress2()` have a new parameter that accepts a user-provided device memory buffer for the compression, avoiding internal memory allocations

*Compatibility notes*:

* *cuSPARSELt* requires CUDA Driver 470.xx (CUDA 11.4) or above
* cuSPARSELt now uses the static version of the `cudart` library
* Support for *Ubuntu 16.04* (gcc-5) has been removed

----

================================================================================
cuSPARSELt v0.3.0
================================================================================

*New Features*:

* Added support for *vectors of alpha and beta scalars* (per-channel scaling)
* Added support for *GeLU scaling*
* Added support for *Split-K Mode*
* Full support for logging functionality and NVTX ranges

*API Changes*:

* Added the `cusparseLtMatmulGetWorkspace()` API to retrieve the workspace size needed by `cusparseLtMatmul()`

*Resolved issues*:

* Fixed a documentation issue regarding structured matrix size constraints

----

================================================================================
cuSPARSELt v0.2.0
================================================================================

*New Features*:

* Added support for *activation functions* and *bias vector*:

  - ReLU, with upper bound and threshold settings, for all kernels
  - GeLU for `INT8` input/output, `INT32` Tensor Core compute kernels

* Added support for *Batched Sparse GEMM*:

  - Single sparse matrix / multiple dense matrices (*Broadcast*)
  - Multiple sparse and dense matrices
  - Batched bias vector

*Compatibility notes*:

* *cuSPARSELt* no longer requires the `nvrtc` library
* Support for *Ubuntu 16.04* (gcc-5) is now deprecated and will be removed in a future release

----

================================================================================
cuSPARSELt v0.1.0
================================================================================

*New Features*:

* Added support for the `Windows x86-64` and `Linux Arm64` platforms
* Introduced `SM 8.6` compatibility
* Added new kernels:

  - `FP32` inputs/output, `TF32` Tensor Core compute
  - `TF32` inputs/output, `TF32` Tensor Core compute

* Better performance for `SM 8.0` kernels (up to 90% SOL)
* New APIs for compression and pruning, decoupled from `cusparseLtMatmulPlan_t`

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 11.2 or above
* `cusparseLtMatDescriptor_t` must be destroyed with the `cusparseLtMatDescriptorDestroy()` function
* Both the *static* and *shared* libraries must be linked with the `nvrtc` library
* On Linux systems, both the *static* and *shared* libraries must be linked with the `dl` library

*Resolved issues*:

* `CUSPARSELT_MATMUL_SEARCH_ITERATIONS` is now handled correctly

----

================================================================================
cuSPARSELt v0.0.1
================================================================================

*New Features*:

* Initial release
* Supports `Linux x86_64` and `SM 8.0`
* Provides the following mixed-precision computation kernels:

  * `FP16` inputs/output, `FP32` Tensor Core accumulate
  * `BF16` inputs/output, `FP32` Tensor Core accumulate
  * `INT8` inputs/output, `INT32` Tensor Core compute

*Compatibility notes*:

* *cuSPARSELt* requires CUDA 11.0 or above