Release Notes

cuSPARSELt v0.0.1

Initial release
Added support for Linux x86_64 and SM 8.0
Added the following mixed-precision computation kernels:
- FP16 inputs/output, FP32 Tensor Core accumulate
- BFLOAT16 inputs/output, FP32 Tensor Core accumulate
- INT8 inputs/output, INT32 Tensor Core compute
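The kernels listed above pair narrow input types with a wider accumulator. As a rough illustration of why that matters, the following plain-C sketch computes an INT8 dot product with an INT32 accumulator on the CPU. It does not use cuSPARSELt or Tensor Cores, and the function name `dot_int8_acc_int32` is hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of cuSPARSELt: dot product of two INT8
 * vectors accumulated in INT32.
 * Each product has magnitude at most 128 * 128 = 16384, so the int32_t
 * accumulator cannot overflow for vectors of up to roughly 131 thousand
 * elements. An int8_t accumulator could overflow after a single product. */
int32_t dot_int8_acc_int32(const int8_t *a, const int8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen both operands before multiplying so the product is INT32. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

For instance, four products of 127 * 127 sum to 64516, which is far outside the INT8 range but fits comfortably in INT32.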

Compatibility notes:

cuSPARSELt requires CUDA 11.0 or above
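An application could guard against an unsupported toolkit with a small version check. The sketch below assumes the CUDA convention of encoding a version as `1000 * major + 10 * minor` (for example, 11000 for CUDA 11.0). The helper names are hypothetical and are not part of cuSPARSELt or the CUDA runtime.

```c
#include <stdbool.h>

/* Hypothetical helpers: decode a CUDA version integer encoded as
 * 1000 * major + 10 * minor. */
int cuda_version_major(int version) { return version / 1000; }
int cuda_version_minor(int version) { return (version % 1000) / 10; }

/* Returns true if `version` is at least major.minor. */
bool cuda_version_at_least(int version, int major, int minor)
{
    return version >= major * 1000 + minor * 10;
}
```

In a real program the version value would typically come from the `CUDART_VERSION` macro at compile time or from `cudaRuntimeGetVersion` at run time, and the result of `cuda_version_at_least(version, 11, 0)` would decide whether to proceed.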

cuSPARSELt v0.1.0

Added support for Windows x86_64 and Linux Arm64 platforms
Introduced SM 8.6 compatibility
Added new kernels:
- FP32 inputs/output, TF32 Tensor Core compute
- TF32 inputs/output, TF32 Tensor Core compute
Improved performance of SM 8.0 kernels (up to 90% of speed-of-light, SOL)
Added new compression and pruning APIs that are decoupled from cusparseLtMatmulPlan_t

Compatibility notes:

cuSPARSELt requires CUDA 11.2 or above
cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy function
Both static and shared libraries must be linked with the nvrtc library
On Linux systems, both static and shared libraries must be linked with the dl library

Resolved issues:

CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly