cuTensorNet: A High-Performance Library for Tensor Network Computations
Welcome to the cuTensorNet library documentation!
NVIDIA cuTensorNet is a high-performance library for tensor network computations and a component of the NVIDIA cuQuantum SDK. The functionality of cuTensorNet is described in Overview, and an installation and usage guide is provided in Getting Started.
Key Features
- Based on NVIDIA’s high-performance tensor algebra library: cuTENSOR 
- Provides APIs for:
  - Creating a tensor or tensor network object
  - Finding a cost-optimal tensor network contraction path for any given tensor network
  - Finding a low-overhead slicing for the tensor network contraction to meet specified memory constraints
  - Tuning the tensor network contraction path finder configuration for better performance
  - Performing tensor network contraction plan generation, auto-tuning, and its subsequent execution
  - Gradually constructing a tensor network state (e.g., a quantum circuit state), followed by computing its properties, including arbitrary slices of amplitudes, expectation values, marginal distributions (reduced density matrices), as well as performing direct sampling
  - Performing backward propagation to compute gradients of the output tensor w.r.t. user-specified input tensors
  - Performing tensor decomposition using QR or SVD
  - Applying a quantum gate operand to a pair of connected (contracted) tensors
  - Enabling automatic distributed parallelization in the contraction path finder and executor
  - Enabling custom memory management
  - Logging
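To make the path-finding and slicing features above more concrete, here is a minimal sketch in plain Python. It does not use the cuTensorNet API. It only illustrates, under a simplified cost model, why the contraction order matters and how slicing a mode can shrink the largest intermediate tensor. The function name `contract_path_stats`, the mode labels, and the extents are all hypothetical, and the model assumes every mode appears in at most two tensors.

```python
from math import prod


def contract_path_stats(tensors, extents, path, sliced=()):
    """Return (total_cost, largest_result_size) for a pairwise contraction path.

    Illustrative model only (an assumption, not the cuTensorNet cost model):
    - the cost of one pairwise contraction is the product of the extents of
      all distinct modes of the two operands;
    - each step removes the two operands and appends their result;
    - a sliced mode is fixed to extent 1 inside each slice, and the
      per-slice cost is multiplied by the number of slices.
    """
    ext = {m: (1 if m in sliced else e) for m, e in extents.items()}
    n_slices = prod(extents[m] for m in sliced)  # 1 when nothing is sliced
    work = [tuple(t) for t in tensors]
    cost = 0
    largest = 0  # largest pairwise result, including the final output
    for i, j in path:
        a, b = work[i], work[j]
        cost += prod(ext[m] for m in set(a) | set(b))
        shared = set(a) & set(b)  # modes summed over in this step
        out = tuple(m for m in a + b if m not in shared)
        largest = max(largest, prod(ext[m] for m in out))
        for idx in sorted((i, j), reverse=True):
            del work[idx]
        work.append(out)
    return cost * n_slices, largest


# A chain of three matrices: A[i,j] * B[j,k] * C[k,l]
tensors = [("i", "j"), ("j", "k"), ("k", "l")]
extents = {"i": 2, "j": 1000, "k": 2, "l": 1000}

good = contract_path_stats(tensors, extents, [(0, 1), (0, 1)])  # (A*B) first
bad = contract_path_stats(tensors, extents, [(1, 2), (0, 1)])   # (B*C) first
bad_sliced = contract_path_stats(tensors, extents, [(1, 2), (0, 1)], sliced=("l",))

print(good)        # (8000, 2000)
print(bad)         # (4000000, 1000000)
print(bad_sliced)  # (4000000, 1000)
```

In this toy model, contracting A with B first is 500 times cheaper than starting with B and C. Slicing the open mode `l` leaves the total cost of the second path unchanged while reducing the largest per-slice result from 1,000,000 elements to 1,000, which is the kind of trade-off a slicing search looks for when a memory limit is specified.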
 
Support
- Supported GPU Architectures: Volta, Turing, Ampere, Ada, Hopper, Blackwell
- Supported OS: Linux
- Supported CPU Architectures: x86_64, ARM64
Prerequisites
- One of the supported CUDA Toolkits and a compatible driver is required. The minimum required Linux driver version is >= 450.80.02 or >= 525.60.13, depending on the CUDA Toolkit. Please refer to the CUDA Toolkit Release Notes for details.
Contents
- Release Notes
- Overview
- Examples
  - Compiling code
  - Code example (serial)
  - Code example (automatic slice-based distributed parallelization)
  - Code example (manual slice-based distributed parallelization)
  - Code example (tensorQR)
  - Code example (tensorSVD)
  - Code example (GateSplit)
  - Code example (MPS factorization)
  - Code example (intermediate tensor reuse)
  - Code example (gradients computation)
  - Code example (amplitudes slice)
  - Code example (expectation value)
  - Code example (marginal distribution)
  - Code example (tensor network sampling)
  - Code example (MPS amplitudes slice using simple update)
  - Code example (MPS expectation value)
  - Code example (MPS marginal distribution)
  - Code example (MPS sampling)
  - Code example (MPS sampling QFT)
  - Code example (MPS sampling MPO)
  - Useful tips
 
- API Reference
- Acknowledgements