cuTENSOR: A High-Performance CUDA Library For Tensor Primitives¶
Welcome to the cuTENSOR library documentation.
cuTENSOR is a high-performance CUDA library for tensor primitives.
Download: https://developer.nvidia.com/cutensor/downloads
Key Features¶
Extensive mixed-precision support:
FP64 inputs with FP32 compute.
FP32 inputs with FP16, BF16, TF32, or 3XTF32 compute.
Complex-times-real operations.
Conjugate (without transpose) support.
Just-in-time (JIT) compilation.
Support for up to 64-dimensional tensors.
Arbitrary data layouts.
Main computational routines:
Element-wise tensor operations:
Support for various activation functions.
Arbitrary tensor permutations.
Conversion between different data types.
Support for padding output tensors.
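To make "FP32 inputs with TF32 compute" concrete, the sketch below emulates it for a dot product on the CPU. TF32 keeps FP32's sign bit and 8-bit exponent but only 10 mantissa bits, so each FP32 input is rounded to that precision before multiplying, while products and the running sum stay in FP32. This is an illustrative approximation, not cuTENSOR code: the helper names `round_to_tf32` and `dot_tf32`, the round-to-nearest-ties-away rounding, and the sequential accumulation order are assumptions, and GPU hardware may accumulate in a different order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an FP32 value to TF32 precision
 * (1 sign, 8 exponent, 10 mantissa bits), keeping it stored in a float.
 * Assumption for this sketch: round to nearest, ties away from zero;
 * NaN/Inf inputs are not handled. */
static float round_to_tf32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float's bits safely */
    bits += 0x1000u;                 /* add half a TF32 ulp (bit 12) to the magnitude */
    bits &= 0xFFFFE000u;             /* clear the 13 mantissa bits TF32 drops */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical emulation of "FP32 inputs, TF32 compute" for a dot product:
 * inputs are rounded to TF32; products and the sum are kept in FP32. */
static float dot_tf32(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        /* 11-bit x 11-bit significands fit in FP32's 24 bits, so the
           product itself is exact (barring overflow/underflow). */
        acc += round_to_tf32(a[i]) * round_to_tf32(b[i]);
    return acc;
}
```

With this rounding, an input such as `1.0f + 0x1p-12f` becomes exactly `1.0f`, which is the kind of precision loss that TF32 compute trades for speed.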
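The element-wise features listed above (arbitrary permutations, data-type conversion, activation functions) together with arbitrary data layouts can be pictured with a plain-C reference loop. This is a minimal sketch of the semantics, not the cuTENSOR API: the function name `permute_relu_f32_to_f64`, the choice of ReLU as the activation, and describing each tensor by per-mode extents, element strides, and a mode permutation are assumptions made for illustration. It computes B = relu(alpha * A) with A's modes reordered into B's mode order, converting float inputs to double outputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MODES 64 /* matches the stated limit of 64-dimensional tensors */

/* Hypothetical activation used for illustration: ReLU. */
static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/*
 * Hypothetical CPU reference (not a cuTENSOR function):
 *   B[i_perm[0], ..., i_perm[rank-1]] = (double) relu(alpha * A[i_0, ..., i_{rank-1}])
 * extent[k]  : extent of A's mode k
 * strideA[k] : element stride of A's mode k (any layout, not only packed)
 * perm[j]    : index of the A mode that becomes B's mode j
 * strideB[j] : element stride of B's mode j
 */
void permute_relu_f32_to_f64(int rank, const int64_t *extent,
                             const float *A, const int64_t *strideA,
                             double *B, const int *perm, const int64_t *strideB,
                             float alpha)
{
    assert(rank >= 0 && rank <= MAX_MODES);
    int64_t idx[MAX_MODES] = {0}; /* current multi-index over A's modes */

    for (int k = 0; k < rank; ++k)
        if (extent[k] == 0) return; /* empty tensor: nothing to write */

    for (;;) {
        int64_t offA = 0, offB = 0;
        for (int k = 0; k < rank; ++k) offA += idx[k] * strideA[k];
        for (int j = 0; j < rank; ++j) offB += idx[perm[j]] * strideB[j];
        B[offB] = (double)relu(alpha * A[offA]); /* scale, activate, convert */

        /* Advance the multi-index like an odometer (mode 0 varies fastest). */
        int k = 0;
        while (k < rank && ++idx[k] == extent[k]) { idx[k] = 0; ++k; }
        if (k == rank) break;
    }
}
```

Because every tensor carries its own strides, the same loop covers column-major, row-major, and strided views; a 2-D call with `perm = {1, 0}` is simply a scaled, activated transpose.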
The documentation consists of three main components:
A User Guide that introduces important basics of cuTENSOR including details on notation and accuracy.
A Getting Started guide that steps through a simple tensor contraction example.
An API Reference that provides a comprehensive overview of all library routines, constants, and data types.
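A tensor contraction generalizes matrix multiplication: modes that appear in both inputs but not in the output are summed over, and the remaining modes index the output. The loop below is a minimal reference sketch of one illustrative contraction, C[m,u,n,v] = alpha * sum over h,k of A[m,h,k,n] * B[u,k,v,h] + beta * C[m,u,n,v]. The mode names, the packed column-major layout (first listed mode fastest), and the function name `contract_ref` are assumptions for illustration; it does not use or reproduce the cuTENSOR API and makes no attempt at performance.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reference contraction (not a cuTENSOR function):
 *   C[m,u,n,v] = alpha * sum_{h,k} A[m,h,k,n] * B[u,k,v,h] + beta * C[m,u,n,v]
 * All tensors are packed column-major: the first listed mode is contiguous.
 */
void contract_ref(size_t M, size_t U, size_t N, size_t V, size_t H, size_t K,
                  float alpha, const float *A, const float *B,
                  float beta, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t v = 0; v < V; ++v)
        for (size_t n = 0; n < N; ++n)
            for (size_t u = 0; u < U; ++u)
                for (size_t m = 0; m < M; ++m) {
                    float acc = 0.0f;
                    /* h and k appear in A and B but not in C: sum them out. */
                    for (size_t h = 0; h < H; ++h)
                        for (size_t k = 0; k < K; ++k) {
                            float a = A[m + M * (h + H * (k + K * n))];
                            float b = B[u + U * (k + K * (v + V * h))];
                            acc += a * b;
                        }
                    size_t c = m + M * (u + U * (n + N * v));
                    C[c] = alpha * acc + beta * C[c];
                }
}
```

Setting every extent except H and M to 1 reduces this to a matrix-vector product, which is a quick way to sanity-check the index arithmetic.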
Support¶
Operating System | CPU Architectures
---|---
Prerequisites¶
Dependencies: cudart, cutensor.h headers