cuTENSOR: A High-Performance CUDA Library For Tensor Primitives
Welcome to the cuTENSOR library documentation.
cuTENSOR is a high-performance CUDA library for tensor primitives.
Download: https://developer.nvidia.com/cutensor/downloads
Key Features
Extensive mixed-precision support:
FP64 inputs with FP32 compute.
FP32 inputs with FP16, BF16, or TF32 compute.
Complex-times-real operations.
Conjugate (without transpose) support.
Support for up to 40-dimensional tensors.
Arbitrary data layouts.
Trivially serializable data structures.
Main computational routines:
Element-wise tensor operations:
Support for various activation functions.
Arbitrary tensor permutations.
Conversion between different data types.
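To make the element-wise features above concrete, here is a minimal CPU sketch of what a permutation with arbitrary layouts and a data type conversion computes. It is not the cuTENSOR API: `permute_reference` is a hypothetical helper written only for illustration, and the `alpha` scaling factor and the convention of describing each tensor by per-mode extents and strides (counted in elements) are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (NOT part of cuTENSOR).
// Computes B[i0, i1, ...] = alpha * A[i0, i1, ...] where each mode k has
// stride stride_a[k] in A and stride_b[k] in B. Giving the same mode a
// different stride in A and B is what expresses the permutation.
template <typename TIn, typename TOut>
void permute_reference(const TIn* a,
                       const std::vector<std::size_t>& extent,
                       const std::vector<std::size_t>& stride_a,
                       TOut* b,
                       const std::vector<std::size_t>& stride_b,
                       TOut alpha)
{
    assert(extent.size() == stride_a.size());
    assert(extent.size() == stride_b.size());

    // Total number of elements to visit.
    std::size_t total = 1;
    for (std::size_t e : extent) total *= e;

    // Multi-index over all modes, advanced like an odometer.
    std::vector<std::size_t> idx(extent.size(), 0);
    for (std::size_t n = 0; n < total; ++n) {
        std::size_t off_a = 0, off_b = 0;
        for (std::size_t m = 0; m < extent.size(); ++m) {
            off_a += idx[m] * stride_a[m];
            off_b += idx[m] * stride_b[m];
        }
        // Convert the input element to the output type, then scale.
        b[off_b] = alpha * static_cast<TOut>(a[off_a]);

        // Advance the multi-index, first mode fastest.
        for (std::size_t m = 0; m < extent.size(); ++m) {
            if (++idx[m] < extent[m]) break;
            idx[m] = 0;
        }
    }
}
```

For example, a 2x3 `float` tensor stored with strides `{1, 2}` can be written as its 3x2 `double` transpose by passing output strides `{3, 1}`: the first mode of A then becomes the slow mode of B. A GPU library would perform the same mapping in parallel; this loop only documents the intended result.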
The documentation consists of three main components:
A User Guide that introduces important basics of cuTENSOR including details on notation and accuracy.
A Getting Started guide that steps through a simple tensor contraction example.
An API Reference that provides a comprehensive overview of all library routines, constants, and data types.
Support
Supported SM Architectures:
SM 6.0, SM 7.0, SM 8.0
Supported OSs:
RHEL 7/8, openSUSE 15, SLES 15, Ubuntu 20.04/18.04/16.04, Windows 10
Supported CPU Architectures:
x86_64, ARM64, OpenPOWER
Prerequisites
Dependencies:
cudart, cutensor.h headers