cuTENSOR: A High-Performance CUDA Library For Tensor Primitives

Welcome to the cuTENSOR library documentation.

cuTENSOR is a high-performance CUDA library for tensor primitives.

Download: https://developer.nvidia.com/cutensor/downloads

Key Features

- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, TF32, or 3XTF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Just-in-time (JIT) compilation.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
    - Support for padding output tensors.

The documentation consists of three main components:

- A User Guide that introduces the basics of cuTENSOR, including details on notation and accuracy.
- A Getting Started guide that steps through a simple tensor contraction example.
- An API Reference that provides a comprehensive overview of all library routines, constants, and data types.

Support

| Operating System | CPU Architectures |
|---|---|
| `RHEL 8`, `openSUSE 15`, `SLES 15`, `Ubuntu 24.04/22.04/20.04` | `x86_64`, `SBSA` |
| `Windows 10` | `x86_64` |

Supported CUDA Toolkits: 11.0, 11.8, 12.x
Supported SM Architectures: SM 7.0, SM 7.5, SM 8.0, SM 8.9, SM 9.0, SM 10.0, SM 12.0
Deprecated OSs:

Prerequisites

Dependencies: `cudart` (the CUDA runtime) and the cuTENSOR headers (`cutensor.h`).
