Release Notes

cuSPARSELt v0.0.1

  • Initial release

  • Support for Linux x86_64 and GPUs with compute capability SM 8.0

  • Provide the following mixed-precision computation kernels:

    • FP16 inputs/output, FP32 Tensor Core accumulate

    • BFLOAT16 inputs/output, FP32 Tensor Core accumulate

    • INT8 inputs/output, INT32 Tensor Core compute
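The INT8 entry above pairs 8-bit inputs and outputs with a wider 32-bit accumulator. The following is a minimal CPU-side sketch of that numeric pattern in plain C, not the library's kernel code. The helper names `dot_i8_acc_i32` and `saturate_to_i8` are hypothetical, and clamping the result back to INT8 is an assumed policy chosen for illustration only.

```c
#include <stdint.h>

/* Dot product of two INT8 vectors using an INT32 accumulator.
   Each product of two int8 values fits in 16 bits, but the running sum
   quickly exceeds the int8 range, which is why a wider accumulator is used. */
static int32_t dot_i8_acc_i32(const int8_t *a, const int8_t *b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Clamp a 32-bit result into the INT8 range.
   Assumption: saturation is used here only to illustrate narrowing the
   accumulator to an INT8 output; it is not taken from the library. */
static int8_t saturate_to_i8(int32_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}
```

For example, four products of 100 × 100 accumulate to 40000, which is representable in the INT32 accumulator but not in INT8.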

Compatibility notes:

  • cuSPARSELt requires CUDA 11.0 or above

cuSPARSELt v0.1.0

  • Added support for Windows x86_64 and Linux Arm64 platforms

  • Introduced SM 8.6 compatibility

  • Added new kernels:

    • FP32 inputs/output, TF32 Tensor Core compute

    • TF32 inputs/output, TF32 Tensor Core compute

  • Improved performance for SM 8.0 kernels (up to 90% of speed-of-light, SOL)

  • New APIs for compression and pruning decoupled from cusparseLtMatmulPlan_t
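The TF32 kernels listed above compute at reduced mantissa precision: TF32 keeps the 8-bit exponent of FP32 but only 10 explicit mantissa bits. The sketch below shows in plain C what that precision loss looks like for a single FP32 value. The function name `fp32_to_tf32_trunc` is hypothetical, and truncation is an assumption made for simplicity; the rounding actually applied by the hardware or the library may differ.

```c
#include <stdint.h>
#include <string.h>

/* Reduce an FP32 value to TF32 precision by clearing the low 13 of the
   23 mantissa bits, leaving sign, 8 exponent bits and 10 mantissa bits.
   Assumption: plain truncation, chosen for illustration only. */
static float fp32_to_tf32_trunc(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float's bit pattern */
    bits &= 0xFFFFE000u;              /* zero mantissa bits 0..12 */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Under this sketch, 1 + 2^-10 survives the conversion unchanged, while 1 + 2^-11 collapses to 1.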

Compatibility notes:

  • cuSPARSELt requires CUDA 11.2 or above

  • cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy function

  • Both static and shared libraries must be linked with the nvrtc library

  • On Linux systems, both static and shared libraries must be linked with the dl library

Resolved issues:

  • CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly