cuTENSOR: A High-Performance CUDA Library For Tensor Primitives

Welcome to the cuTENSOR library documentation.

cuTENSOR is a high-performance CUDA library for tensor primitives.

Download: https://developer.nvidia.com/cutensor/downloads

Key Features

Extensive mixed-precision support:

FP64 inputs with FP32 compute.

FP32 inputs with FP16, BF16, or TF32 compute.

Complex-times-real operations.

Conjugate (without transpose) support.

Support for up to 40-dimensional tensors.

Arbitrary data layouts.

Trivially serializable data structures.

Main computational routines:

Direct (i.e., transpose-free) tensor contractions.

Tensor reductions (including partial reductions).

Element-wise tensor operations:

Support for various activation functions.

Arbitrary tensor permutations.

Conversion between different data types.
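To make two of the features above concrete, the following is a minimal CPU reference sketch in plain Python. It does not call cuTENSOR and says nothing about how the library is implemented. The helper names (`to_fp32`, `contract`), the dictionary-based tensor storage, and the `cast` parameter are all hypothetical and exist only for illustration. The sketch shows (1) a contraction expressed through mode labels, so operands stored in any mode order are consumed directly without a separate transpose step, and (2) one possible reading of "FP64 inputs with FP32 compute", assuming the inputs and every intermediate result are rounded to FP32.

```python
import struct
from itertools import product


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def contract(a, modes_a, b, modes_b, modes_c, extent, cast=float):
    """Reference contraction C[modes_c] = sum of A[modes_a] * B[modes_b].

    Hypothetical helper, not a cuTENSOR API.
    a, b    : dicts mapping index tuples to numbers
    modes_* : strings of one-letter mode labels, e.g. "km"
    extent  : dict mapping each mode label to its size
    cast    : rounding applied to inputs and intermediates
              (float keeps FP64, to_fp32 emulates FP32 compute)
    Assumes every mode of A and B either appears in C or is shared
    by A and B (and is therefore summed over).
    """
    contracted = [m for m in modes_a if m in modes_b and m not in modes_c]
    c = {}
    for c_idx in product(*(range(extent[m]) for m in modes_c)):
        bound = dict(zip(modes_c, c_idx))
        acc = 0.0
        for k_idx in product(*(range(extent[m]) for m in contracted)):
            bound.update(zip(contracted, k_idx))
            x = cast(a[tuple(bound[m] for m in modes_a)])
            y = cast(b[tuple(bound[m] for m in modes_b)])
            acc = cast(acc + cast(x * y))
        c[c_idx] = acc
    return c


# C[m,n] = sum_k A[k,m] * B[n,k]: neither operand is stored in the
# "matrix multiply" order, yet no transposed copy is created.
extent = {"m": 2, "n": 2, "k": 2}
a = {(0, 0): 1, (1, 0): 2, (0, 1): 3, (1, 1): 4}  # indexed as (k, m)
b = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}  # indexed as (n, k)
c = contract(a, "km", b, "nk", "mn", extent)

# Dot product of FP64 inputs, once in FP64 and once with emulated FP32 compute.
x = {(0,): 1.0, (1,): 2.0 ** -30}
y = {(0,): 1.0, (1,): 1.0}
exact = contract(x, "k", y, "k", "", {"k": 2})
approx = contract(x, "k", y, "k", "", {"k": 2}, cast=to_fp32)
```

In the second example the term 2^-30 is below FP32 resolution near 1.0, so the emulated FP32 result loses it while the FP64 result keeps it. This is the kind of accuracy trade-off a lower-precision compute type implies.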

The documentation consists of three main components:

A User Guide that introduces important basics of cuTENSOR including details on notation and accuracy.

A Getting Started guide that steps through a simple tensor contraction example.

An API Reference that provides a comprehensive overview of all library routines, constants, and data types.

Support

Supported SM Architectures: SM 6.0, SM 7.0, SM 8.0

Supported OSs: RHEL 7/8, openSUSE 15, SLES 15, Ubuntu 20.04/18.04/16.04, Windows 10

Supported CPU Architectures: x86_64, ARM64, OpenPOWER

Prerequisites

Dependencies: cudart, cutensor.h headers
