cuDensityMat: A High-Performance Library for Analog Quantum Dynamics Computations

Welcome to the cuDensityMat library documentation!

NVIDIA cuDensityMat, a component of the NVIDIA cuQuantum SDK, is a high-performance library for accelerating analog quantum dynamics solvers. The Overview describes the functionality of cuDensityMat, and Getting Started provides the installation and usage guide.

Key Features

  • Provides APIs for:

    • Defining arbitrary, possibly batched pure/mixed quantum states in arbitrary tensor-product spaces (any number and dimensions of the quantum degrees of freedom).

    • Defining arbitrary, possibly batched quantum many-body operators and superoperators in the form of a sum of tensor products of elementary tensor operators or full matrix operators. Each elementary tensor operator acts on one or more specific quantum degrees of freedom, whereas each full matrix operator acts on all quantum degrees of freedom. Full matrix operators, elementary tensor operators, and their associated scalar coefficients may depend on time as well as on user-provided real parameters.

    • Computing the action of a many-body operator/superoperator on a quantum state.

    • Composing and computing the right-hand side of the desired quantum dynamics master equation (ODE).

    • Computing properties of quantum states such as expectation values.
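To make the operator structure and the master-equation right-hand side in the list above more concrete, here is a hedged sketch in standard textbook notation. The symbols (c_k, A_j^{(k)}, H, L_m, gamma_m) are notation assumed for illustration and are not identifiers from the library. The first line writes a many-body operator as a sum of products of elementary operators with coefficients that depend on time t and real parameters p. The second line shows the Lindblad equation (with hbar = 1) as one common example of a master equation whose right-hand side could be composed from such operators.

```latex
% Illustrative notation only; not taken from the cuDensityMat documentation.
\begin{align}
  O(t, \mathbf{p}) &= \sum_{k} c_k(t, \mathbf{p}) \prod_{j} A^{(k)}_{j}(t, \mathbf{p}), \\
  \frac{d\rho}{dt} &= -i \left[ H(t), \rho \right]
    + \sum_{m} \gamma_m \left( L_m \rho L_m^{\dagger}
    - \tfrac{1}{2} \left\{ L_m^{\dagger} L_m, \rho \right\} \right).
\end{align}
```

Here each A^{(k)}_j acts on its own subset of the quantum degrees of freedom (and as the identity elsewhere), and the expression on the right of the second line is the quantity an ODE integrator would evaluate at each time step.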

Support

  • Supported GPU Architectures: Volta, Turing, Ampere, Ada, Hopper, Blackwell

  • Supported OS: Linux

  • Supported CPU Architectures: x86_64, ARM64

Prerequisites

Known Issues and Limitations

  • The cudensitymatOperatorTermAppendGeneralProduct() C API function accepts the operatorModeStrides argument, but this argument currently cannot be used to pass strides that differ from the default generalized columnwise storage layout. You can simply pass NULL for the strides of each elementary tensor operator.

  • The tensor callback feature currently requires CUDA stream synchronization between successive operator action compute calls.

  • Currently the dimension of the sparse tri-diagonal elementary tensor operator matrices may not exceed 256.

  • While cudensitymatComputeType_t is exposed as a parameter in some APIs, currently the actual compute type is solely inferred from the data type.

  • The number of parallel processes (equivalently, the number of GPUs) in parallel runs must be a power of two, with one GPU per parallel process.
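Two of the limitations above lend themselves to a small pre-flight check in application code. The following is a minimal sketch in plain C. Both helpers, is_power_of_two() and column_major_strides(), are hypothetical names introduced here and are not part of the cuDensityMat API. The first checks the power-of-two constraint on the process count. The second computes the strides of a dense tensor under the assumption that "generalized columnwise" means the first mode varies fastest, which is the layout the library assumes when NULL strides are passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a cuDensityMat function):
 * returns true if n is a positive power of two. */
bool is_power_of_two(int64_t n)
{
    /* A positive power of two has exactly one bit set,
     * so clearing its lowest set bit yields zero. */
    return n > 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper (not a cuDensityMat function):
 * fills strides[] for a column-major layout, where mode 0 varies
 * fastest. Assumption: this matches the default "generalized
 * columnwise" layout mentioned above. */
void column_major_strides(int32_t num_modes,
                          const int64_t *extents,
                          int64_t *strides)
{
    assert(num_modes >= 0);
    assert(num_modes == 0 || (extents != NULL && strides != NULL));

    int64_t stride = 1;
    for (int32_t i = 0; i < num_modes; ++i) {
        strides[i] = stride;   /* elements skipped per step in mode i */
        stride *= extents[i];  /* next mode steps over this whole mode */
    }
}
```

An application might call is_power_of_two() on the process count before initializing a parallel run, and use column_major_strides() only to confirm that its own buffers already follow the default layout, since non-default strides are not accepted.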