Release Notes
cuSPARSELt v0.0.1

- Initial release
- Supports Linux x86_64 and SM 8.0
- Provides the following mixed-precision computation kernels:
  - FP16 inputs/output, FP32 Tensor Core accumulate
  - BFLOAT16 inputs/output, FP32 Tensor Core accumulate
  - INT8 inputs/output, INT32 Tensor Core compute
- Compatibility notes:
  - cuSPARSELt requires CUDA 11.0 or above
cuSPARSELt v0.1.0

- Added support for Windows x86-64 and Linux Arm64 platforms
- Introduced SM 8.6 compatibility
- Added new kernels:
  - FP32 inputs/output, TF32 Tensor Core compute
  - TF32 inputs/output, TF32 Tensor Core compute
- Better performance for SM 8.0 kernels (up to 90% SOL)
- New APIs for compression and pruning, decoupled from cusparseLtMatmulPlan_t
- Compatibility notes:
  - cuSPARSELt requires CUDA 11.2 or above
  - cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy function
  - Both static and shared libraries must be linked with the nvrtc library
  - On Linux systems, both static and shared libraries must also be linked with the dl library
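The new descriptor lifecycle above can be sketched as follows. This is a hedged illustration, not a verbatim sample from the library docs: matrix sizes and the FP16/row-major choices are illustrative, exact function signatures may differ between cuSPARSELt versions, and error checking is omitted for brevity.

```cuda
#include <cusparseLt.h>

void descriptor_lifecycle_sketch(void) {
    cusparseLtHandle_t handle;
    cusparseLtInit(&handle);

    // 2:4 structured (sparse) operand descriptor; illustrative values
    cusparseLtMatDescriptor_t matA;
    cusparseLtStructuredDescriptorInit(&handle, &matA,
                                       1024, 1024, 1024, /* rows, cols, ld */
                                       16,               /* alignment      */
                                       CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                       CUSPARSELT_SPARSITY_50_PERCENT);

    // ... pruning, compression, and matmul would go here; since v0.1.0
    // the prune/compress entry points no longer require a full
    // cusparseLtMatmulPlan_t ...

    // As of v0.1.0, the descriptor must be destroyed explicitly:
    cusparseLtMatDescriptorDestroy(&matA);
    cusparseLtDestroy(&handle);
}
```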
- Resolved issues:
  - CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly
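For reference, the search-iterations attribute is set on the algorithm-selection descriptor. A minimal hedged sketch, assuming `handle` and `alg_sel` (a `cusparseLtMatmulAlgSelection_t`) were initialized earlier; the iteration count is illustrative and the exact signature may vary by version:

```cuda
// Number of profiling iterations used by the algorithm search
// (illustrative value; handled correctly as of v0.1.0)
int search_iters = 5;
cusparseLtMatmulAlgSetAttribute(&handle, &alg_sel,
                                CUSPARSELT_MATMUL_SEARCH_ITERATIONS,
                                &search_iters, sizeof(search_iters));
```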
cuSPARSELt v0.2.0

- Added support for activation functions and bias vector:
  - ReLU, with upper-bound and threshold settings, for all kernels
  - GeLU for INT8 input/output, INT32 Tensor Core compute kernels
- Added support for Batched Sparse GEMM:
  - Single sparse matrix / multiple dense matrices (broadcast)
  - Multiple sparse and dense matrices
  - Batched bias vector
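The activation and batching features above are exposed as descriptor attributes. The fragment below is a hedged sketch of how they might be set: attribute names follow the v0.2.0 additions, but exact enum names, data types, and signatures should be checked against the API reference for the version in use. It assumes `handle`, a matmul descriptor `matmul`, and a sparse-matrix descriptor `matA` already exist; error checking is omitted.

```cuda
// --- Activation: ReLU with upper bound and threshold (illustrative values)
int   relu_on  = 1;
float relu_ub  = 6.0f;   // clip outputs above this value
float relu_thr = 0.0f;   // zero outputs below this threshold
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
    CUSPARSELT_MATMUL_ACTIVATION_RELU, &relu_on, sizeof(relu_on));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
    CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND, &relu_ub, sizeof(relu_ub));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
    CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD, &relu_thr, sizeof(relu_thr));

// --- Batched Sparse GEMM: batch count and stride are per-matrix attributes.
// A batch stride of 0 on the sparse operand broadcasts that single sparse
// matrix across all batches (the "single sparse / multiple dense" case).
int     num_batches = 8;      // illustrative batch count
int64_t strideA     = 0;      // broadcast one sparse A across the batch
cusparseLtMatDescSetAttribute(&handle, &matA,
    CUSPARSELT_MAT_NUM_BATCHES, &num_batches, sizeof(num_batches));
cusparseLtMatDescSetAttribute(&handle, &matA,
    CUSPARSELT_MAT_BATCH_STRIDE, &strideA, sizeof(strideA));
```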
- Compatibility notes:
  - cuSPARSELt no longer requires the nvrtc library
  - Support for Ubuntu 16.04 (gcc-5) is now deprecated and will be removed in a future release