Release Notes
cuSPARSELt v0.0.1
New Features:
Initial release
Support Linux x86_64 and SM 8.0
Provide the following mixed-precision computation kernels:
FP16 inputs/output, FP32 Tensor Core accumulate
BFLOAT16 inputs/output, FP32 Tensor Core accumulate
INT8 inputs/output, INT32 Tensor Core compute
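To illustrate why the kernels above pair narrow inputs with a wider accumulator, here is a minimal CPU-side sketch in plain Python. It is not cuSPARSELt code: the helper names `dot_int8_inputs` and `wrap_int8` are hypothetical, and the wrap-around model of a narrow accumulator is an assumption chosen only to show the overflow problem.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range (models a too-narrow accumulator)."""
    return ((value + 128) % 256) - 128


def dot_int8_inputs(a, b):
    """Dot product of INT8 inputs accumulated in a 32-bit range (hypothetical helper)."""
    for v in list(a) + list(b):
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError("inputs must fit in INT8")
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # a single product can already exceed the INT8 range
    if not INT32_MIN <= acc <= INT32_MAX:
        raise OverflowError("sum does not fit in INT32")
    return acc


a = [127, 127, 127, 127]
b = [127, 127, 127, 127]
wide = dot_int8_inputs(a, b)   # accumulated in the wider range
narrow = wrap_int8(wide)       # what an 8-bit accumulator would retain
```

Comparing `wide` and `narrow` shows that a sum of only four products is already far outside the INT8 range, which is the motivation for accumulating in a wider type.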
Compatibility notes:
cuSPARSELt requires CUDA 11.0 or above
cuSPARSELt v0.1.0
New Features:
Added support for Windows x86-64 and Linux Arm64 platforms
Introduced SM 8.6 compatibility
Added new kernels:
FP32 inputs/output, TF32 Tensor Core compute
TF32 inputs/output, TF32 Tensor Core compute
Better performance for SM 8.0 kernels (up to 90% SOL)
New APIs for compression and pruning decoupled from cusparseLtMatmulPlan_t
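For the TF32 compute mode listed among the new kernels, the sketch below approximates what happens to an FP32 value when only TF32's 10 explicit mantissa bits are kept. It is a rough illustration in plain Python, not library code: the helper name `to_tf32_truncate` is hypothetical, and it truncates the low mantissa bits, whereas actual hardware may apply a different rounding mode.

```python
import struct


def to_tf32_truncate(x: float) -> float:
    """Keep sign, 8 exponent bits and the top 10 mantissa bits of an FP32 value.

    Assumption: truncation is used for simplicity; hardware rounding may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the 13 lowest of the 23 FP32 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


kept = to_tf32_truncate(1.0 + 2.0 ** -10)   # representable with 10 mantissa bits
lost = to_tf32_truncate(1.0 + 2.0 ** -11)   # needs an 11th mantissa bit
```

Under this model, increments smaller than 2^-10 relative to the leading bit are dropped from the operand, while the exponent range stays the same as FP32.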
Compatibility notes:
cuSPARSELt requires CUDA 11.2 or above
cusparseLtMatDescriptor_t must be destroyed with the cusparseLtMatDescriptorDestroy function
Both static and shared libraries must be linked with the nvrtc library
On Linux systems, both static and shared libraries must be linked with the dl library
Resolved issues:
CUSPARSELT_MATMUL_SEARCH_ITERATIONS is now handled correctly
cuSPARSELt v0.2.0
New Features:
Added support for activation functions and bias vector:
ReLU + upper bound and threshold setting for all kernels
GeLU for INT8 input/output, INT32 Tensor Core compute kernels
Added support for Batched Sparse GEMM:
Single sparse matrix / Multiple dense matrices (Broadcast)
Multiple sparse and dense matrices
Batched bias vector
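The broadcast batching mode and the batched bias vector listed above can be pictured with a small CPU reference in plain Python. This is only a sketch of the arithmetic, not the cuSPARSELt API: the function names are hypothetical, the matrix `a` is stored densely with zeros standing in for pruned entries, and applying one bias value per output row is an assumption made for the example.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_bias(d, bias):
    """Add bias[i] to every element of output row i (assumed bias layout)."""
    return [[v + bias[i] for v in row] for i, row in enumerate(d)]


def batched_matmul_broadcast(a, b_batch, bias_batch=None):
    """Multiply one shared matrix `a` with each dense matrix in `b_batch`."""
    outputs = []
    for idx, b in enumerate(b_batch):
        d = matmul(a, b)                      # the same `a` is reused for every batch entry
        if bias_batch is not None:
            d = add_bias(d, bias_batch[idx])  # one bias vector per batch entry
        outputs.append(d)
    return outputs


a = [[1, 0, 2, 0],
     [0, 3, 0, 4]]
b_batch = [[[1], [1], [1], [1]],
           [[1], [2], [3], [4]]]
bias_batch = [[10, 20], [0, 1]]
result = batched_matmul_broadcast(a, b_batch, bias_batch)
```

The "Multiple sparse and dense matrices" mode would differ only in that each batch entry brings its own first operand instead of reusing `a`.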
Compatibility notes:
cuSPARSELt does not require the nvrtc library anymore
Support for Ubuntu 16.04 (gcc-5) is now deprecated and it will be removed in future releases
cuSPARSELt v0.3.0
New Features:
Added support for vectors of alpha and beta scalars (per-channel scaling)
Added support for GeLU scaling
Added support for Split-K Mode
Full support for logging functionalities and NVTX ranges
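As background for the Split-K mode listed above, the general technique divides the shared (K) dimension of the product into slices, computes a partial product per slice, and sums the partials. The plain-Python sketch below shows that idea on the CPU. The function names and the `num_splits` parameter are hypothetical, and nothing here reflects how cuSPARSELt schedules or reduces the slices internally.

```python
def matmul_k_range(a, b, k_start, k_end):
    """Partial product using only indices k_start..k_end-1 of the shared dimension."""
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_start, k_end)) for j in range(cols)]
            for row in a]


def split_k_matmul(a, b, num_splits):
    """Compute a @ b by summing partial products over slices of K."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    k = len(b)
    rows, cols = len(a), len(b[0])
    chunk = -(-k // num_splits)  # ceiling division: slice length along K
    result = [[0] * cols for _ in range(rows)]
    for start in range(0, k, chunk):
        partial = matmul_k_range(a, b, start, min(start + chunk, k))
        for i in range(rows):
            for j in range(cols):
                result[i][j] += partial[i][j]  # reduction step across slices
    return result


a = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1]]
b = [[1, 0], [2, 0], [3, 1], [4, 1], [5, 1]]
full = matmul_k_range(a, b, 0, len(b))
split = split_k_matmul(a, b, num_splits=2)
```

On a GPU the slices can be processed concurrently, which tends to help when K is large relative to the output size; the price is the extra reduction step.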
API Changes:
cusparseLtMatmulGetWorkspace() API to get workspace size needed by cusparseLtMatmul()
Resolved issues:
Fixed documentation issue regarding structured matrix size constraints
cuSPARSELt v0.4.0
New Features:
Introduced SM 8.9 compatibility
The initialization time of cuSPARSELt descriptors has been significantly improved
cusparseLtMatmulSearch() efficiency has been improved
Removed any internal memory allocations
Added a new kernel for supporting the following data type combination:
INT8 input, INT32 Tensor Core compute, FP16 output
Added cusparseLtGetVersion() and cusparseLtGetProperty() functions to retrieve the library version
API Changes:
cusparseLtSpMMACompressedSize(), cusparseLtSpMMACompress(), cusparseLtSpMMACompressedSize2(), cusparseLtSpMMACompress2() have a new parameter to avoid internal memory allocations and support a user-provided device memory buffer for the compression
Compatibility notes:
cuSPARSELt requires CUDA Driver 470.xx (CUDA 11.4) or above
cuSPARSELt now uses the static version of the cudart library
The support for Ubuntu 16.04 (gcc-5) has been removed