cuTENSOR: A High-Performance CUDA Library For Tensor Primitives
Welcome to the cuTENSOR library documentation.
cuTENSOR is a high-performance CUDA library for tensor primitives; its key features are:
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 12-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
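To illustrate what arbitrary data layouts and arbitrary tensor permutations mean in practice, the following plain-Python sketch computes B = alpha * A with the modes of A reordered. Each tensor is described as a flat buffer plus one stride per mode. This is only a CPU reference of the semantics, assuming non-negative strides; `permute_strided` and its parameters are hypothetical names and are not part of the cuTENSOR API, which performs such operations on the GPU.

```python
from itertools import product
from typing import List, Sequence


def permute_strided(extent_a: Sequence[int], a: Sequence[float],
                    stride_a: Sequence[int], stride_b: Sequence[int],
                    perm: Sequence[int], alpha: float = 1.0) -> List[float]:
    """Hypothetical CPU reference (not the cuTENSOR API) for B = alpha * permute(A).

    Layout is carried entirely by per-mode strides, so column-major,
    row-major, or padded layouts are all described the same way.
    perm[k] names the mode of A that becomes mode k of B.
    Assumes non-negative strides.
    """
    nmodes = len(extent_a)
    # Buffer just large enough to hold the highest offset B can reach.
    size_b = 1 + sum((extent_a[perm[k]] - 1) * stride_b[k] for k in range(nmodes))
    b = [0.0] * max(size_b, 0)
    # Visit every multi-index (i_0, ..., i_{n-1}) of A.
    for idx in product(*(range(e) for e in extent_a)):
        off_a = sum(i * s for i, s in zip(idx, stride_a))
        # Mode k of B is indexed by mode perm[k] of A.
        off_b = sum(idx[perm[k]] * stride_b[k] for k in range(nmodes))
        # float() stands in for a conversion between data types.
        b[off_b] = alpha * float(a[off_a])
    return b


# The same 2x3 tensor A(i, j) = i + 2*j stored in two different layouts.
a_col = [0, 1, 2, 3, 4, 5]  # column-major: strides (1, 2)
a_row = [0, 2, 4, 1, 3, 5]  # row-major:    strides (3, 1)
# Transpose into a 3x2 column-major B (strides (1, 3)), scaled by 2.
b1 = permute_strided([2, 3], a_col, [1, 2], [1, 3], [1, 0], alpha=2.0)
b2 = permute_strided([2, 3], a_row, [3, 1], [1, 3], [1, 0], alpha=2.0)
print(b1)        # [0.0, 4.0, 8.0, 2.0, 6.0, 10.0]
print(b1 == b2)  # True
```

Because the layout lives only in the strides, the row-major and column-major copies of A yield the same B; only `stride_a` differs between the two calls.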
The documentation consists of three main components:
- A User Guide that introduces important basics of cuTENSOR, including details on notation and accuracy.
- A Getting Started guide that steps through a simple tensor contraction example.
- An API Reference that provides a comprehensive overview of all library routines, constants, and data types.
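For readers new to the term, a tensor contraction generalizes matrix multiplication: modes shared by the two inputs but absent from the output are summed over. The plain-Python sketch below shows the semantics in the common C = alpha * A * B + beta * C form. It is a minimal CPU reference that assumes dense tensors stored as dictionaries keyed by multi-index and single-character mode labels; `contract` and its parameters are hypothetical and do not reflect cuTENSOR's C API.

```python
from itertools import product
from typing import Dict, Tuple

Tensor = Dict[Tuple[int, ...], float]  # dense tensor as {multi-index: value}


def contract(alpha: float, a: Tensor, modes_a: str,
             b: Tensor, modes_b: str,
             beta: float, c: Tensor, modes_c: str,
             extent: Dict[str, int]) -> Tensor:
    """Hypothetical reference: C = alpha * sum(A * B) + beta * C.

    A mode that appears in both A and B but not in C is summed over;
    every other mode of A or B must appear in C (assumption of this sketch).
    """
    summed = [m for m in modes_a if m in modes_b and m not in modes_c]
    out: Tensor = {}
    for free in product(*(range(extent[m]) for m in modes_c)):
        val = dict(zip(modes_c, free))  # current value of each mode label
        acc = 0.0
        for s in product(*(range(extent[m]) for m in summed)):
            val.update(zip(summed, s))
            acc += a[tuple(val[m] for m in modes_a)] * b[tuple(val[m] for m in modes_b)]
        out[free] = alpha * acc + beta * c.get(free, 0.0)
    return out


# Matrix multiplication is the special case C_{mn} = sum_k A_{mk} B_{kn}.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
B = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}
C = contract(1.0, A, "mk", B, "kn", 0.0, {}, "mn", {"m": 2, "k": 2, "n": 2})
print(C)  # {(0, 0): 19.0, (0, 1): 22.0, (1, 0): 43.0, (1, 1): 50.0}
```

Higher-order contractions use the same function with longer mode strings; only the labels decide which modes are summed and which survive into C.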