Release Notes¶
cuTENSOR v1.3.1¶
Improved tensor contraction performance model (i.e., algo CUTENSOR_ALGO_DEFAULT)
Improved performance for tensor contraction that have an overall large contracted dimension (i.e., a parallel reduction was added)
cuTENSOR v1.3.0¶
Support up to 40-dimensional tensors
Support 64-bit strides
Up to 2x performance improvement across library
Support for BF16 Element-wise operations
Compatibility notes:
Not binary compatible with previous versions, due to added int64 stride support.
Binaries provided for CUDA 10.2/11.0/11.x (x>0) for x86_64 and OpenPower
Binaries provided for CUDA 11.0/11.x (x>0) for ARM64
Resolved issues:
Fixed bug with mixed real-complex contraction and strided data
cuTENSOR v1.2.2¶
Improved performance for Element-wise operations
Compatibility notes:
Binaries provided for CUDA 10.1/10.2/11.x for x86_64 and OpenPower
Binaries provided for CUDA 11.x for ARM64
cuTENSOR v1.2.1¶
Added examples to https://github.com/NVIDIA/CUDALibrarySamples
Compatibility notes:
Requires a sufficiently recent (GCC 5 or higher) libstdc++ when linking statically
Binaries provided for CUDA 10.1/10.2/11.0/11.1 for x86_64 and OpenPower
Binaries provided for CUDA 11.0/11.1 for ARM64
cuTENSOR v1.2.0¶
Support for cache plans and autotuning
Support BF16 for Elementwise and Reduction
Compatibility notes:
Binaries provided for CUDA 10.1/10.2/11.0 for x86_64 and OpenPower
Binaries provided for CUDA 11.0 for ARM64
cuTENSOR v1.1.0¶
Support for CUDA 11.0
Added support for
Windows 10 x86_64andLinux ARM64platformsAdded support for
SM 8.0Support third generation Tensor Cores
Improved performance
Compatibility notes:
Binaries provided for CUDA 10.1/10.2/11.0 for x86_64 and OpenPower
Binaries provided for CUDA 11.0 for ARM64
cuTENSOR v1.0.1¶
Added support for
SM 6.0
cuTENSOR v1.0.0¶
Initial release
Support for
SM 7.0Support mixed-precision operations
Support device-side alpha and beta
Support C != D
Compatibility notes:
cuTENSOR requires CUDA 10.1/10.2 for for x86_64 and OpenPower