cuTENSOR: A High-Performance CUDA Library For Tensor Primitives#
Welcome to the cuTENSOR library documentation.
cuTENSOR is a high-performance CUDA library for tensor primitives.
Download: https://developer.nvidia.com/cutensor/downloads
Key Features#
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, TF32, or 3XTF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Just-in-time (JIT) compilation.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Main computational routines:
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
    - Support for padding output tensors.
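To make "arbitrary data layouts" and "arbitrary tensor permutations" concrete, here is a minimal CPU-side sketch in plain Python. It does not call cuTENSOR and is not how the library is implemented. The helper names `packed_strides` and `permute`, the mode labels, and the choice of a packed layout in which the first mode varies fastest are all assumptions made for illustration only.

```python
from itertools import product


def packed_strides(extents):
    """Strides of a densely packed layout where the first mode varies fastest.

    Hypothetical helper for this sketch; not part of any library API.
    """
    strides, step = [], 1
    for extent in extents:
        strides.append(step)
        step *= extent
    return strides


def permute(src, src_modes, extent, dst_modes):
    """Return a new flat buffer holding src with its modes reordered.

    src        flat list holding the source tensor
    src_modes  mode labels of the source, e.g. ("a", "b")
    extent     dict mapping each mode label to its size
    dst_modes  the same labels in the desired output order
    """
    src_strides = dict(zip(src_modes, packed_strides([extent[m] for m in src_modes])))
    dst_strides = dict(zip(dst_modes, packed_strides([extent[m] for m in dst_modes])))
    dst = [None] * len(src)
    # Visit every multi-index once. The same coordinates address both
    # buffers; only the strides (the layout) differ.
    for index in product(*(range(extent[m]) for m in src_modes)):
        coord = dict(zip(src_modes, index))
        src_off = sum(coord[m] * src_strides[m] for m in src_modes)
        dst_off = sum(coord[m] * dst_strides[m] for m in dst_modes)
        dst[dst_off] = src[src_off]
    return dst


extent = {"a": 2, "b": 3}
src = [0, 1, 2, 3, 4, 5]  # modes ("a", "b"), with "a" varying fastest
dst = permute(src, ("a", "b"), extent, ("b", "a"))
print(dst)  # [0, 2, 4, 1, 3, 5]
```

The point of the sketch is that a permutation is fully described by mode labels, extents, and strides, so no intermediate transposed copy is needed to express it. Applying `permute` again with the mode orders swapped recovers the original buffer.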
The documentation consists of three main components:
A User Guide that introduces important basics of cuTENSOR including details on notation and accuracy.
A Getting Started guide that steps through a simple tensor contraction example.
An API Reference that provides a comprehensive overview of all library routines, constants, and data types.
Support#
| Operating System | CPU Architectures |
|---|---|
Prerequisites#
Dependencies: cudart, cutensor.h headers
Contents#
- Release Notes
  - cuTENSOR v2.2.0
  - cuTENSOR v2.1.0
  - cuTENSOR v2.0.2
  - cuTENSOR v2.0.0
  - cuTENSOR v1.7.0
  - cuTENSOR v1.6.2
  - cuTENSOR v1.6.1
  - cuTENSOR v1.6.0
  - cuTENSOR v1.5.0
  - cuTENSOR v1.4.0
  - cuTENSOR v1.3.3
  - cuTENSOR v1.3.2
  - cuTENSOR v1.3.1
  - cuTENSOR v1.3.0
  - cuTENSOR v1.2.2
  - cuTENSOR v1.2.1
  - cuTENSOR v1.2.0
  - cuTENSOR v1.1.0
  - cuTENSOR v1.0.1
  - cuTENSOR v1.0.0
- User Guide
- Getting Started
- Transition to cuTENSOR 2.x
- Just In Time (JIT) Compilation
- Plan Cache
- Multi-GPU support - cuTENSORMg
- API Reference
- API Reference - cuTENSORMg
- Software License Agreement
- Third Party License Agreements