cuTENSOR: A High-Performance CUDA Library For Tensor Primitives

Welcome to the cuTENSOR library documentation.

cuTENSOR is a high-performance CUDA library for tensor primitives.

Download: https://developer.nvidia.com/cutensor/downloads

Key Features

  • Extensive mixed-precision support:

    • FP64 inputs with FP32 compute.

    • FP32 inputs with FP16, BF16, TF32, or 3XTF32 compute.

    • Complex-times-real operations.

    • Conjugate (without transpose) support.

  • Just-in-time (JIT) compilation.

  • Support for up to 64-dimensional tensors.

  • Arbitrary data layouts.

  • Trivially serializable data structures.

  • Main computational routines:

The documentation consists of three main components:

  • A User Guide that introduces important basics of cuTENSOR including details on notation and accuracy.

  • A Getting Started guide that steps through a simple tensor contraction example.

  • An API Reference that provides a comprehensive overview of all library routines, constants, and data types.
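For orientation, the kind of tensor contraction that the Getting Started guide walks through can be pictured with a plain reference loop. The sketch below is not cuTENSOR code and does not use its API. The function name `contract`, the single-character mode labels, and the dictionary-based tensor storage are all assumptions chosen for illustration. It sums over every mode shared by A and B that does not appear in C, which for two matrices reduces to ordinary matrix multiplication.

```python
from itertools import product


def contract(a, b, a_modes, b_modes, c_modes, extents):
    """Reference contraction: C[c_modes] = sum over contracted modes of A * B.

    Hypothetical helper for illustration only (not part of cuTENSOR).
    Tensors are dicts mapping index tuples to values. Modes are strings of
    single-character labels such as "mk". Every mode of A and B must either
    appear in C or be shared by A and B.
    """
    # Modes shared by A and B but absent from C are summed over.
    contracted = [m for m in a_modes if m in b_modes and m not in c_modes]
    c = {}
    for free in product(*(range(extents[m]) for m in c_modes)):
        idx = dict(zip(c_modes, free))
        total = 0.0
        for inner in product(*(range(extents[m]) for m in contracted)):
            idx.update(zip(contracted, inner))
            a_idx = tuple(idx[m] for m in a_modes)
            b_idx = tuple(idx[m] for m in b_modes)
            total += a[a_idx] * b[b_idx]
        c[free] = total
    return c


# Matrix multiplication written as a contraction over mode "k".
a = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
b = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}
c = contract(a, b, "mk", "kn", "mn", {"m": 2, "k": 2, "n": 2})
```

A library such as cuTENSOR performs the same computation on the GPU without this element-by-element loop. The sketch only fixes the meaning of the operation.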

Support

Operating System                                          CPU Architectures

RHEL 8, openSUSE 15, SLES 15, Ubuntu 22.04/20.04/18.04    x86_64, SBSA

RHEL 8, Ubuntu 22.04/20.04/18.04                          OpenPOWER

Windows 10                                                x86_64

  • Supported CUDA Toolkits: 11.0, 11.8, 12.x

  • Supported SM Architectures: SM 6.0, SM 7.0, SM 7.5, SM 8.0, SM 8.9, SM 9.0

  • Deprecated OSs: Ubuntu 16.04, RHEL 7
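The support list above can be turned into a simple pre-flight check. The following is a minimal sketch under stated assumptions: the helper names `is_listed_sm` and `is_listed_toolkit` are hypothetical, the values are copied from the list above, and "12.x" is read as "any 12 minor release". The check tests exact membership only. Whether an unlisted architecture works through binary compatibility is not stated on this page, so a negative result here means "not listed" rather than "unsupported".

```python
# SM architectures copied from the support list (major, minor).
LISTED_SM = {(6, 0), (7, 0), (7, 5), (8, 0), (8, 9), (9, 0)}

# CUDA Toolkit versions listed explicitly; 12.x is handled separately.
LISTED_TOOLKITS = {"11.0", "11.8"}


def is_listed_sm(major, minor):
    """Return True if the compute capability appears in the support list."""
    return (major, minor) in LISTED_SM


def is_listed_toolkit(version):
    """Return True for "11.0", "11.8", or any "12.<minor>" version string."""
    major, _, _ = version.partition(".")
    return major == "12" or version in LISTED_TOOLKITS
```

For example, `is_listed_sm(8, 0)` is true, while `is_listed_toolkit("11.4")` is false because only 11.0 and 11.8 are listed for the 11 series.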

Prerequisites

  • Dependencies: cudart, cutensor.h headers
