Installation#

Obtaining cuEST#

C API#

The cuEST C API can be downloaded from https://developer.nvidia.com/cuest-downloads.

cuEST supports a variety of Linux distributions, including Ubuntu, RHEL, SUSE, Debian, and SLES, and both x86_64 and ARM64 architectures. Detailed download and installation instructions are provided on the download page.

Both local and network installers are available. In most cases, the network installer will provide a streamlined user experience. With either installation method, the shared library is installed to /usr/lib64 and the header files are installed to /usr/include.

Note

When working in Docker environments, users may find it helpful to use NVIDIA’s pre-built containers (https://hub.docker.com/r/nvidia/cuda).

Many users may find it convenient to download a complete tarball containing the cuEST headers and library. See the C API from tarball section below for more details.

Python API#

cuEST for Python can be obtained from several sources:

Important

We strongly recommend use of CTK 13 or later. cuEST with CTK 12 support is provided solely for those customers who are unable to upgrade. These customers may observe decreased performance relative to CTK 13, depending on the device.

Detailed instructions for installing with pip are located below.

Installing cuEST#

Prerequisites#

The cuEST library is compatible with CUDA Toolkits (CTKs) in the 12.x and 13.x series. While the library has been tested as far back as CUDA Toolkit 12.0 and driver version 535, we strongly recommend using more recent versions for optimal performance. Significant performance gains on modern compute capabilities are available only through the CTK 13 builds.

Warning

If using a CUDA Toolkit from the 13 series, the version must be 13.0 update 2 (13.0.2) or newer.
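The version rule above can be expressed as a small shell check. This is only a sketch: `ctk_supported` is a hypothetical helper name, it relies on GNU `sort -V` for version ordering, and how you obtain the installed CTK version string is left to you.

```shell
# Hypothetical helper: succeeds if the given CTK version string falls in a
# range the text above describes as compatible (any 12.x release, or
# 13.0.2 and newer within the 13 series).
ctk_supported() {
    case "${1%%.*}" in
        12) return 0 ;;
        13)
            # GNU sort -V orders version strings; if 13.0.2 sorts first
            # (or ties), the given version is at least 13.0.2.
            [ "$(printf '%s\n' 13.0.2 "$1" | sort -V | head -n 1)" = "13.0.2" ]
            ;;
        *) return 1 ;;
    esac
}
```

For example, `ctk_supported 13.0.2 && echo "supported"` prints `supported`, while `ctk_supported 13.0` returns a non-zero status.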

C API from tarball#

The tarball contains two subdirectories, cuda12 and cuda13, corresponding to CTK 12 and CTK 13 compatibility, respectively. Select the one that matches your system’s configuration, keeping in mind that cuda13 should be used for optimal performance.

When compiling code that uses cuEST, add the include directory to the compiler’s include path and include the cuest.h file in any file that uses cuEST functionality. Provide the lib directory to the linker and link either the static or the dynamic libcuest library. If linking dynamically, also append this lib directory to the LD_LIBRARY_PATH variable so that the library can be found at runtime.

The CUDA cuBLAS and cuSOLVER libraries must be linked by programs that link the cuEST library. The path to these libraries should also be added to LD_LIBRARY_PATH before running programs that call cuEST.
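As a rough illustration of the compile, link, and runtime steps described above, the following sketch assumes the cuda13 subdirectory was extracted to `$HOME/cuest/cuda13` and that the CTK libraries live in `/usr/local/cuda/lib64`. Both paths, the source file `my_app.c`, and the function name `build_my_app` are hypothetical, and your toolchain may need additional flags.

```shell
# Assumed locations; override these to match your system.
CUEST_ROOT="${CUEST_ROOT:-$HOME/cuest/cuda13}"         # extracted cuda13 subdirectory (hypothetical path)
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"  # CTK library directory (typical default)

# Compile and link a hypothetical my_app.c against cuEST, cuBLAS, and cuSOLVER.
# Defined as a function so that nothing is built until it is called.
build_my_app() {
    gcc my_app.c -o my_app \
        -I"$CUEST_ROOT/include" \
        -L"$CUEST_ROOT/lib" -lcuest \
        -L"$CUDA_LIB_DIR" -lcublas -lcusolver
}

# For dynamic linking, append both library directories to LD_LIBRARY_PATH
# so that they can be found at runtime.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUEST_ROOT/lib:$CUDA_LIB_DIR"
```

Calling `build_my_app` from the directory containing `my_app.c` then runs the compile-and-link step.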

Note

The CTK 13 version should be used where possible, to extract maximum performance.

Python API from PyPI#

The cuEST Python API is a lightweight set of bindings around the C API. These bindings are designed to mirror the C API very closely, and the user is responsible for acquiring and freeing resources explicitly. PEP 8-conforming Pythonic wrappers that simplify the calling syntax will be available in a future release, but the current bindings provide access to all cuEST functionality. The Python API is available for CTK 12 and CTK 13 via pip:

pip install nvidia-cuest-cu12

or:

pip install nvidia-cuest-cu13

The Python bindings can be installed directly into the interpreter’s environment; however, we recommend using a virtual environment such as venv, Conda, Mamba, or virtualenv. Python versions 3.11, 3.12, and 3.13 are currently supported. When running cuEST from Python, LD_LIBRARY_PATH should include the CTK library path (usually /usr/local/cuda/lib64) so that the CUDA dependencies can be resolved at runtime.
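One possible way to combine these recommendations is sketched below, using the standard library venv module and the CTK 13 package. The function name `setup_cuest_venv` and the directory name `.venv-cuest` are hypothetical, and the CTK library path is the usual default mentioned above rather than a guaranteed location.

```shell
# Hypothetical helper: create a virtual environment, activate it, and install
# the CTK 13 build of the bindings. Nothing is installed until it is called.
setup_cuest_venv() {
    venv_dir="${1:-.venv-cuest}"   # hypothetical default directory name
    python3 -m venv "$venv_dir" &&
        . "$venv_dir/bin/activate" &&
        pip install nvidia-cuest-cu13
}

# Make the CTK libraries resolvable at runtime (typical default path assumed).
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_LIB_DIR"
```

Running `setup_cuest_venv` in the current shell leaves the new environment activated. For CTK 12, substitute `nvidia-cuest-cu12` in the install line.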

Note

The CTK 13 version should be used where possible, to extract maximum performance.

Tested Configurations#

cuEST has been tested in the following environments:

CUDA: 12.0, 12.9, 13.0.2, 13.1

GPU model: A100, H100, H200, B200, RTX Pro 6000

Python: 3.11, 3.12, 3.13

CPU architecture: x86_64, SBSA (aarch64)

Compute capabilities: 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
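To compare a machine against the compute capabilities listed above, a check along the following lines may help. It is a sketch only: `cc_is_tested` is a hypothetical helper, and the query assumes an `nvidia-smi` recent enough to support the `compute_cap` field. A capability missing from the list has simply not been listed as tested; the check says nothing about whether cuEST runs on it.

```shell
# Compute capabilities listed as tested in this section.
TESTED_CCS="8.0 8.6 8.9 9.0 10.0 12.0"

# Hypothetical helper: succeeds if the given compute capability is in the list.
cc_is_tested() {
    for tested in $TESTED_CCS; do
        [ "$tested" = "$1" ] && return 0
    done
    return 1
}

# Report each visible GPU; skipped entirely when nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cc; do
        cc="${cc# }"   # drop the space that follows the CSV comma
        if cc_is_tested "$cc"; then
            echo "$name (compute capability $cc): in the tested list"
        else
            echo "$name (compute capability $cc): not in the tested list"
        fi
    done
fi
```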