Release Notes
cuSPARSELt v0.0.1
New Features:
Initial release
Support for Linux x86_64 and SM 8.0
Provide the following mixed-precision computation kernels:
FP16 inputs/output, FP32 Tensor Core accumulate
BFLOAT16 inputs/output, FP32 Tensor Core accumulate
INT8 inputs/output, INT32 Tensor Core compute
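The kernels listed above pair narrow inputs with a wider accumulator. The following is a minimal CPU sketch in plain Python of why that matters for the INT8 case. It is not cuSPARSELt code: the helper names (`wrap_int8`, `dot_int8_accumulate`, `dot_int32_accumulate`) are hypothetical, and the 8-bit accumulator exists only for contrast.

```python
def wrap_int8(x):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (x + 128) % 256 - 128


def dot_int8_accumulate(a, b):
    """Dot product whose running sum is (hypothetically) kept in 8 bits."""
    acc = 0
    for x, y in zip(a, b):
        acc = wrap_int8(acc + x * y)  # the sum wraps around on overflow
    return acc


def dot_int32_accumulate(a, b):
    """Dot product of INT8 values with a 32-bit accumulator."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
        # Guard that the running sum still fits in a signed 32-bit integer.
        assert -2**31 <= acc < 2**31
    return acc


a = [100, 100, 100, 100]  # every value fits in INT8
b = [2, 2, 2, 2]
narrow = dot_int8_accumulate(a, b)
wide = dot_int32_accumulate(a, b)
print(wide, narrow)  # the wide accumulator holds the exact sum, 800
```

How the library converts the INT32 result back to an INT8 output is not described in these notes, so the sketch stops at the accumulated value.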
Compatibility notes:
cuSPARSELt requires CUDA 11.0 or above
cuSPARSELt v0.1.0
New Features:
Added support for Windows x86-64 and Linux Arm64 platforms
Introduced SM 8.6 compatibility
Added new kernels:
FP32 inputs/output, TF32 Tensor Core compute
TF32 inputs/output, TF32 Tensor Core compute
Better performance for SM 8.0 kernels (up to 90% SOL)
New APIs for compression and pruning decoupled from cusparseLtMatmulPlan_t
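As background for the pruning and compression APIs mentioned above, the sketch below models the two steps separately on the CPU. It assumes the 2:4 structured-sparsity pattern (at most two nonzeros in every group of four consecutive values). The function names and the `(values, positions)` layout are hypothetical; the library's actual compressed format is opaque and is not reproduced here.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties go to the lower index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned):
    """Store two values per group plus their positions inside the group."""
    if len(pruned) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(pruned), 4):
        group = pruned[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group is not 2:4 sparse")
        # Pad with zero-valued slots so every group stores exactly two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        values.extend(group[i] for i in kept)
        positions.extend(kept)
    return values, positions


row = [0.1, -2.0, 0.5, 3.0, 1.0, 0.0, 0.0, -0.2]
pruned = prune_2_4(row)
values, positions = compress_2_4(pruned)
```

Keeping the two steps as independent functions mirrors the decoupling described in the notes: a matrix that is already 2:4 sparse can be compressed without being pruned first.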
Compatibility notes:
cuSPARSELt requires CUDA 11.2 or above
cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy function
Both static and shared libraries must be linked with the nvrtc library
On Linux systems, both static and shared libraries must be linked with the dl library
Resolved issues:
CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly
cuSPARSELt v0.2.0
New Features:
Added support for activation functions and bias vector:
ReLU + upper bound and threshold setting for all kernels
GeLU for INT8 input/output, INT32 Tensor Core compute kernels
Added support for Batched Sparse GEMM:
Single sparse matrix / Multiple dense matrices (Broadcast)
Multiple sparse and dense matrices
Batched bias vector
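The features above can be pictured with a small CPU reference model in plain Python. It is a sketch under stated assumptions, not the library's implementation: the bias is assumed to hold one value per output row, the ReLU threshold is assumed to zero every value at or below it, the upper bound is assumed to clip larger values, and all function names are hypothetical. Sparsity of the shared matrix is not modelled.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]


def relu_bounded(x, threshold=0.0, upper_bound=float("inf")):
    """Assumed semantics: zero at or below threshold, clip above upper_bound."""
    if x <= threshold:
        return 0.0
    return min(x, upper_bound)


def batched_matmul_broadcast(a, b_batch, bias, threshold, upper_bound):
    """Multiply one shared matrix `a` with every dense matrix in `b_batch`."""
    outputs = []
    for b in b_batch:
        d = matmul(a, b)
        # Epilogue: add the per-row bias, then apply the bounded ReLU.
        outputs.append([[relu_bounded(v + bias[i], threshold, upper_bound)
                         for v in row]
                        for i, row in enumerate(d)])
    return outputs


a = [[1, 0], [0, 2]]                      # shared (broadcast) matrix
b_batch = [[[1, 2], [3, 4]],              # batch entry 0
           [[-1, -2], [-3, -4]]]          # batch entry 1
bias = [0.5, -1.0]
result = batched_matmul_broadcast(a, b_batch, bias,
                                  threshold=0.0, upper_bound=6.0)
```

The "multiple sparse and dense matrices" mode would differ only in that each batch entry brings its own left-hand matrix, and a batched bias would supply one bias vector per entry.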
Compatibility notes:
cuSPARSELt does not require the nvrtc library anymore
Support for Ubuntu 16.04 (gcc-5) is now deprecated and it will be removed in future releases
cuSPARSELt v0.3.0
New Features:
Added support for vectors of alpha and beta scalars (per-channel scaling)
Added support for GeLU scaling
Added support for Split-K Mode
Full support for logging functionalities and NVTX ranges
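Two of the features above, Split-K and per-channel scaling, lend themselves to a short CPU illustration in plain Python. This is a sketch only: it assumes that a "channel" corresponds to a row of the output matrix, and the function names are hypothetical rather than part of the cuSPARSELt API.

```python
def matmul_split_k(a, b, num_splits):
    """Compute a @ b by splitting the inner (K) dimension into slices."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    rows, inner, cols = len(a), len(b), len(b[0])
    chunk = -(-inner // num_splits)  # ceiling division
    partials = []
    for start in range(0, inner, chunk):
        stop = min(start + chunk, inner)
        # Each slice produces an independent partial product.
        partials.append([[sum(a[i][k] * b[k][j] for k in range(start, stop))
                          for j in range(cols)]
                         for i in range(rows)])
    # Reduction step: sum the partial products element-wise.
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]


def scale_per_channel(ab, c, alpha, beta):
    """D[i][j] = alpha[i] * (A @ B)[i][j] + beta[i] * C[i][j]."""
    return [[alpha[i] * ab[i][j] + beta[i] * c[i][j]
             for j in range(len(ab[0]))]
            for i in range(len(ab))]


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
c = [[1, 1], [1, 1]]
ab = matmul_split_k(a, b, num_splits=2)
d = scale_per_channel(ab, c, alpha=[1, 0.5], beta=[0, 2])
```

Splitting along K exposes extra parallelism when the output matrix is small relative to the inner dimension, at the cost of storing the partial products and running a final reduction.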
API Changes:
cusparseLtMatmulGetWorkspace() API to get workspace size needed by cusparseLtMatmul()
Resolved issues:
Fixed documentation issue regarding structured matrix size constraints