Release Notes
cuSPARSELt v0.0.1

- Initial release
- Supports Linux x86_64 and SM 8.0
- Provides the following mixed-precision computation kernels:
  - FP16 inputs/output, FP32 Tensor Core accumulate
  - BFLOAT16 inputs/output, FP32 Tensor Core accumulate
  - INT8 inputs/output, INT32 Tensor Core compute
- Compatibility notes:
  - cuSPARSELt requires CUDA 11.0 or above
cuSPARSELt v0.1.0

- Added support for Windows x86-64 and Linux Arm64 platforms
- Introduced SM 8.6 compatibility
- Added new kernels:
  - FP32 inputs/output, TF32 Tensor Core compute
  - TF32 inputs/output, TF32 Tensor Core compute
- Better performance for SM 8.0 kernels (up to 90% SOL)
- New APIs for compression and pruning, decoupled from cusparseLtMatmulPlan_t
- Compatibility notes:
  - cuSPARSELt requires CUDA 11.2 or above
  - cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy function
  - Both static and shared libraries must be linked with the nvrtc library
  - On Linux systems, both static and shared libraries must also be linked with the dl library
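The new descriptor lifecycle above can be sketched as follows. This is a hedged illustration, not a verbatim sample from the library docs: matrix sizes and the FP16/row-major choices are illustrative, exact function signatures may differ between cuSPARSELt versions, and error checking is omitted for brevity.

```cuda
#include <cusparseLt.h>

void descriptor_lifecycle_sketch(void) {
    cusparseLtHandle_t handle;
    cusparseLtInit(&handle);

    // 2:4 structured (sparse) operand descriptor; illustrative values
    cusparseLtMatDescriptor_t matA;
    cusparseLtStructuredDescriptorInit(&handle, &matA,
                                       1024, 1024, 1024, /* rows, cols, ld */
                                       16,               /* alignment      */
                                       CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                       CUSPARSELT_SPARSITY_50_PERCENT);

    // ... pruning, compression, and matmul would go here; since v0.1.0
    // the prune/compress entry points no longer require a full
    // cusparseLtMatmulPlan_t ...

    // As of v0.1.0, the descriptor must be destroyed explicitly:
    cusparseLtMatDescriptorDestroy(&matA);
    cusparseLtDestroy(&handle);
}
```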
- Resolved issues:
  - CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly
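For reference, the search-iterations attribute is set on the algorithm-selection descriptor. A minimal hedged sketch, assuming `handle` and `alg_sel` (a `cusparseLtMatmulAlgSelection_t`) were initialized earlier; the iteration count is illustrative and the exact signature may vary by version:

```cuda
// Number of profiling iterations used by the algorithm search
// (illustrative value; handled correctly as of v0.1.0)
int search_iters = 5;
cusparseLtMatmulAlgSetAttribute(&handle, &alg_sel,
                                CUSPARSELT_MATMUL_SEARCH_ITERATIONS,
                                &search_iters, sizeof(search_iters));
```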
cuSPARSELt v0.2.0

- Added support for activation functions and bias vector:
  - ReLU, with upper-bound and threshold settings, for all kernels
  - GeLU for INT8 input/output, INT32 Tensor Core compute kernels
- Added support for Batched Sparse GEMM:
  - Single sparse matrix / multiple dense matrices (broadcast)
  - Multiple sparse and dense matrices
  - Batched bias vector
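The activation and batching features above are exposed as descriptor attributes. The fragment below is a hedged sketch of how they might be set: attribute names follow the v0.2.0 additions, but exact enum names, data types, and signatures should be checked against the API reference for the version in use. It assumes `handle`, a matmul descriptor `matmul`, and a sparse-matrix descriptor `matA` already exist; error checking is omitted.

```cuda
// --- Activation: ReLU with upper bound and threshold (illustrative values)
int   relu_on  = 1;
float relu_ub  = 6.0f;   // clip outputs above this value
float relu_thr = 0.0f;   // zero outputs below this threshold
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
    CUSPARSELT_MATMUL_ACTIVATION_RELU, &relu_on, sizeof(relu_on));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
    CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND, &relu_ub, sizeof(relu_ub));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
    CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD, &relu_thr, sizeof(relu_thr));

// --- Batched Sparse GEMM: batch count and stride are per-matrix attributes.
// A batch stride of 0 on the sparse operand broadcasts that single sparse
// matrix across all batches (the "single sparse / multiple dense" case).
int     num_batches = 8;      // illustrative batch count
int64_t strideA     = 0;      // broadcast one sparse A across the batch
cusparseLtMatDescSetAttribute(&handle, &matA,
    CUSPARSELT_MAT_NUM_BATCHES, &num_batches, sizeof(num_batches));
cusparseLtMatDescSetAttribute(&handle, &matA,
    CUSPARSELT_MAT_BATCH_STRIDE, &strideA, sizeof(strideA));
```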
- Compatibility notes:
  - cuSPARSELt no longer requires the nvrtc library
  - Support for Ubuntu 16.04 (gcc-5) is now deprecated and will be removed in a future release