Release Notes

cuSPARSELt v0.0.1

New Features:

Initial release
Support Linux x86_64 and SM 8.0
Provide the following mixed-precision computation kernels:
- FP16 inputs/output, FP32 Tensor Core accumulate
- BFLOAT16 inputs/output, FP32 Tensor Core accumulate
- INT8 inputs/output, INT32 Tensor Core compute
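The INT8 entry above pairs narrow inputs with a wider accumulator. The following is a minimal CPU sketch of that idea only, not the library's kernel: the function name `dot_i8_i32` is hypothetical, and the code is a plain dense dot product with no sparsity or Tensor Core involvement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two INT8 vectors with an INT32
 * accumulator. Each product is widened to INT32 before it is added,
 * so partial sums do not wrap at the INT8 or INT16 range. */
static int32_t dot_i8_i32(const int8_t *a, const int8_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

A single product of two INT8 values can reach 16129 (127 * 127), and four such products already sum to 64516, which is why the accumulator has to be wider than the inputs.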

Compatibility notes:

New Features:

Added support for Windows x86_64 and Linux Arm64 platforms
Introduced SM 8.6 compatibility
Added new kernels:
- FP32 inputs/output, TF32 Tensor Core compute
- TF32 inputs/output, TF32 Tensor Core compute
Improved performance for SM 8.0 kernels (up to 90% of speed-of-light, SOL)
Added new compression and pruning APIs that are decoupled from cusparseLtMatmulPlan_t
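For the TF32 kernels listed above, it may help to see what the reduced precision means for an FP32 value. TF32 keeps the FP32 sign bit and 8-bit exponent but only 10 explicit mantissa bits. The sketch below emulates that on the CPU by truncation. The helper name `tf32_truncate` is hypothetical, and truncation is an assumption made for simplicity: the rounding applied by the hardware or the library may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep the sign, the 8 exponent bits and the top
 * 10 mantissa bits of an FP32 value, clearing the low 13 mantissa bits.
 * This truncates toward zero; real TF32 conversion may round instead. */
static float tf32_truncate(float x) {
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    bits &= 0xFFFFE000u;              /* 23 - 10 = 13 low mantissa bits cleared */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Under this sketch, 1 + 1/1024 survives unchanged because it needs only 10 mantissa bits, while 1 + 1/2048 collapses to 1.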

Compatibility notes:

cuSPARSELt requires CUDA 11.2 or above
cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy() function
Both static and shared libraries must be linked with the nvrtc library
On Linux systems, both static and shared libraries must be linked with the dl library

Resolved issues:

New Features:

Added support for activation functions and bias vector:
- ReLU + upper bound and threshold setting for all kernels
- GeLU for INT8 input/output, INT32 Tensor Core compute kernels
Added support for Batched Sparse GEMM:
- Single sparse matrix / Multiple dense matrices (Broadcast)
- Multiple sparse and dense matrices
- Batched bias vector
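The batched modes above can be pictured with a small CPU reference loop. This is a sketch under stated assumptions, not the library's implementation: the function `batched_gemm_ref` and its parameters are hypothetical, all matrices are stored dense and row-major, a batch stride of 0 stands in for the broadcast case, and the bias is assumed to hold one value per output row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: C[b] = A[b] * B[b] + bias[b] for each batch b.
 * A is m x k, B is k x n, C is m x n, all dense and row-major.
 * A stride of 0 reuses the same operand for every batch entry (broadcast).
 * bias may be NULL; otherwise it holds m values per batch entry (assumed). */
static void batched_gemm_ref(size_t batch, size_t m, size_t n, size_t k,
                             const float *A, size_t strideA,
                             const float *B, size_t strideB,
                             const float *bias, size_t strideBias,
                             float *C, size_t strideC) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t b = 0; b < batch; ++b) {
        const float *Ab = A + b * strideA;
        const float *Bb = B + b * strideB;
        float *Cb = C + b * strideC;
        for (size_t i = 0; i < m; ++i) {
            for (size_t j = 0; j < n; ++j) {
                /* Start from the bias of row i, or 0 when no bias is given. */
                float acc = (bias != NULL) ? bias[b * strideBias + i] : 0.0f;
                for (size_t p = 0; p < k; ++p) {
                    acc += Ab[i * k + p] * Bb[p * n + j];
                }
                Cb[i * n + j] = acc;
            }
        }
    }
}
```

Passing `strideA = 0` with a nonzero `strideB` models the "single sparse matrix / multiple dense matrices" case, while nonzero strides for both operands model the fully batched case. A nonzero `strideBias` gives each batch entry its own bias vector.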

Compatibility notes:

cuSPARSELt no longer requires the nvrtc library
Support for Ubuntu 16.04 (gcc-5) is now deprecated and will be removed in a future release

New Features:

API Changes:

Added the cusparseLtMatmulGetWorkspace() API to get the workspace size needed by cusparseLtMatmul()

Resolved issues:

New Features:

Introduced SM 8.9 compatibility
The initialization time of cuSPARSELt descriptors has been significantly improved
cusparseLtMatmulSearch() efficiency has been improved
Removed all internal memory allocations
Added a new kernel for supporting the following data type combination: INT8 input, INT32 Tensor Core compute, FP16 output
Added cusparseLtGetVersion() and cusparseLtGetProperty() functions to retrieve the library version

API Changes:

cusparseLtSpMMACompressedSize(), cusparseLtSpMMACompress(), cusparseLtSpMMACompressedSize2(), and cusparseLtSpMMACompress2() now take a new parameter that avoids internal memory allocations and lets the caller provide a device memory buffer for the compression

Compatibility notes: