cuTENSOR: A High-Performance CUDA Library For Tensor Primitives
Welcome to the cuTENSOR library documentation.
cuTENSOR is a high-performance CUDA library for tensor primitives; its key features are:
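- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Conjugate (without transpose) support.
- Support for up to 12-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations.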
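To make the "arbitrary data layouts" and "transpose-free contractions" bullets concrete, here is a minimal CPU reference sketch in plain C of what a single-mode contraction computes. It does not call cuTENSOR at all: the helpers `offset2` and `contract_mkn` are hypothetical names introduced for illustration, and the only idea borrowed from the library is describing each tensor by its extents plus per-mode strides. The alpha/beta scaling follows the general form D = alpha * A * B + beta * C; for brevity this sketch updates C in place rather than writing a separate D. It is meant for checking results on small inputs, not for performance.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the cuTENSOR API): linear offset, in
 * elements, of entry (i0, i1) of a 2-D tensor whose layout is described
 * by one stride per mode. */
static int64_t offset2(int64_t i0, int64_t i1, const int64_t stride[2])
{
    return i0 * stride[0] + i1 * stride[1];
}

/* Hypothetical naive reference for the contraction
 *     C[m,n] = alpha * sum_k A[m,k] * B[k,n] + beta * C[m,n]
 * Each operand carries its own strides, so row-major, column-major, or
 * "transposed" operands are handled purely by indexing; no operand is
 * copied or explicitly transposed beforehand. */
static void contract_mkn(int64_t M, int64_t N, int64_t K,
                         float alpha,
                         const float *A, const int64_t strideA[2],
                         const float *B, const int64_t strideB[2],
                         float beta,
                         float *C, const int64_t strideC[2])
{
    for (int64_t m = 0; m < M; ++m) {
        for (int64_t n = 0; n < N; ++n) {
            /* Sum over the contracted mode k. */
            float acc = 0.0f;
            for (int64_t k = 0; k < K; ++k) {
                acc += A[offset2(m, k, strideA)] * B[offset2(k, n, strideB)];
            }
            /* Scale and accumulate into the existing output element. */
            float *c = &C[offset2(m, n, strideC)];
            *c = alpha * acc + beta * *c;
        }
    }
}
```

For a 2 × 3 operand, passing strides {3, 1} reads the buffer as row-major, while {1, 2} reads the same kind of buffer as column-major. The layout choice lives entirely in the stride description rather than in a data copy, which is the sense in which a stride-based contraction needs no explicit transpose.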
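Linking cuTENSOR statically requires a sufficiently recent libstdc++ (GCC 5 or higher).

The documentation consists of three main components (the User Guide, Getting Started, and the API Reference), along with the Plan Cache (beta) section and the Software License Agreement: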
- User Guide
- Getting Started
- Plan Cache (beta)
- API Reference
- Software License Agreement