Release Notes

cuSPARSELt v0.0.1

New Features:

  • Initial release

  • Support Linux x86_64 and SM 8.0

  • Provide the following mixed-precision computation kernels:

    • FP16 inputs/output, FP32 Tensor Core accumulate

    • BFLOAT16 inputs/output, FP32 Tensor Core accumulate

    • INT8 inputs/output, INT32 Tensor Core compute
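Note on mixed precision: the kernels above keep inputs and outputs in a narrow type while accumulating in a wider one. The following is a minimal CPU sketch of why that matters, not cuSPARSELt code. The helper names (to_fp16, to_fp32, dot) are hypothetical, and rounding is emulated with Python's struct module rather than with Tensor Cores.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of FP16 inputs; the accumulator is rounded by round_acc."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + to_fp16(x) * to_fp16(y))
    return acc


n = 4096
a = [1.0] * n
b = [1.0] * n

# FP16 accumulator: stops growing once the spacing between
# representable values exceeds the size of each addend.
fp16_acc = dot(a, b, to_fp16)

# FP32 accumulator, result converted to FP16 only at the end.
fp32_acc = to_fp16(dot(a, b, to_fp32))

print(fp16_acc, fp32_acc)
```

With an FP16 accumulator the sum stalls at 2048, because 2049 is not representable in half precision and rounds back to 2048. The FP32 accumulator reaches the exact result, 4096, which is still representable in the FP16 output.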

Compatibility notes:

  • cuSPARSELt requires CUDA 11.0 or above


cuSPARSELt v0.1.0

New Features:

  • Added support for Windows x86_64 and Linux Arm64 platforms

  • Introduced SM 8.6 compatibility

  • Added new kernels:

    • FP32 inputs/output, TF32 Tensor Core compute

    • TF32 inputs/output, TF32 Tensor Core compute

  • Improved performance for SM 8.0 kernels (up to 90% of speed-of-light, SOL)

  • Added new compression and pruning APIs that are decoupled from cusparseLtMatmulPlan_t

Compatibility notes:

  • cuSPARSELt requires CUDA 11.2 or above

  • cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy() function

  • Both static and shared libraries must be linked with the nvrtc library

  • On Linux systems, both static and shared libraries must be linked with the dl library

Resolved issues:

  • CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly


cuSPARSELt v0.2.0

New Features:

  • Added support for activation functions and bias vector:

    • ReLU with upper bound and threshold settings for all kernels

    • GeLU for INT8 input/output, INT32 Tensor Core compute kernels
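Note on the ReLU settings: the sketch below shows one plausible reading of a ReLU with a threshold and an upper bound, applied after a bias is added. It is an assumption made for illustration, not the library's documented definition. The function names (bounded_relu, apply_epilogue) are hypothetical, and the bias is taken to be a single value per output row.

```python
def bounded_relu(x, threshold=0.0, upper_bound=float("inf")):
    """Assumed semantics: values at or below threshold become 0,
    values above upper_bound are clipped to upper_bound."""
    if x <= threshold:
        return 0.0
    return min(x, upper_bound)


def apply_epilogue(row, bias, threshold=0.0, upper_bound=float("inf")):
    """Add the row's bias, then apply the bounded ReLU element-wise."""
    return [bounded_relu(v + bias, threshold, upper_bound) for v in row]


# One output row of a matrix product, before bias and activation.
row = [-2.0, 0.5, 3.0, 9.0]
out = apply_epilogue(row, bias=1.0, threshold=2.0, upper_bound=6.0)
print(out)  # [0.0, 0.0, 4.0, 6.0]
```

With the default arguments the function reduces to the ordinary ReLU, max(x, 0).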

  • Added support for Batched Sparse GEMM:

    • Single sparse matrix / Multiple dense matrices (Broadcast)

    • Multiple sparse and dense matrices

    • Batched bias vector
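Note on the broadcast mode: the following reference sketch shows what "single sparse matrix / multiple dense matrices" means arithmetically. It runs on the CPU with plain lists and does not use the cuSPARSELt API. The function names (matmul, batched_matmul) are hypothetical.

```python
def matmul(a, b):
    """Plain dense product of a (m x k) and b (k x n)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def batched_matmul(a_batch, b_batch):
    """Multiply matrices pairwise; a single A is broadcast over all B."""
    if len(a_batch) == 1:
        a_batch = a_batch * len(b_batch)  # reuse the same A for every B
    if len(a_batch) != len(b_batch):
        raise ValueError("batch sizes do not match")
    return [matmul(a, b) for a, b in zip(a_batch, b_batch)]


# Two nonzeros in each group of four values (2:4 structured sparsity).
sparse_a = [[1, 0, 0, 2],
            [0, 3, 4, 0]]

dense_bs = [
    [[1], [1], [1], [1]],
    [[2], [0], [1], [3]],
]

result = batched_matmul([sparse_a], dense_bs)
print(result)  # [[[3], [7]], [[8], [4]]]
```

Passing a list with one sparse matrix per dense matrix gives the "multiple sparse and dense matrices" case with the same function.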

Compatibility notes:

  • cuSPARSELt no longer requires the nvrtc library

  • Support for Ubuntu 16.04 (gcc-5) is deprecated and will be removed in a future release


cuSPARSELt v0.3.0

New Features:

  • Added support for vectors of alpha and beta scalars (per-channel scaling)

  • Added support for GeLU scaling

  • Added support for Split-K Mode

  • Added full support for logging functionality and NVTX ranges
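Note on Split-K: the general technique partitions the shared K dimension of the product, computes one partial product per slice, and sums the partial results. The sketch below illustrates that idea on the CPU. It is not the library's implementation, and the function names (matmul_slice, split_k_matmul) are hypothetical.

```python
def matmul_slice(a, b, k_begin, k_end):
    """Partial product of a (m x k) and b (k x n) over k in [k_begin, k_end)."""
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_begin, k_end))
             for j in range(cols)]
            for row in a]


def split_k_matmul(a, b, num_splits):
    """Split the K dimension into num_splits slices and reduce the partials."""
    k = len(b)
    bounds = [k * s // num_splits for s in range(num_splits + 1)]
    # Each partial product is independent and could run in parallel.
    partials = [matmul_slice(a, b, bounds[s], bounds[s + 1])
                for s in range(num_splits)]
    rows, cols = len(a), len(b[0])
    # Reduction step: element-wise sum of the partial results.
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]


a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
b = [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1]]

full = split_k_matmul(a, b, 1)   # no split
split = split_k_matmul(a, b, 2)  # K = 4 split into two slices of 2
print(full, split)
```

Both calls produce the same matrix. The split form exposes more independent work when K is large relative to the output size, at the cost of the extra reduction step.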

API Changes:

  • Added the cusparseLtMatmulGetWorkspace() API to query the workspace size needed by cusparseLtMatmul()

Resolved issues:

  • Fixed a documentation issue regarding structured matrix size constraints