Getting Started

Installing cuQuantum

From conda-forge

If you don’t already have conda installed, we recommend installing it. Installation is straightforward and can be done by following our section on best practices with conda.

cuQuantum

Using conda, you can install cuQuantum with the following command:

conda install -c conda-forge cuquantum

Note

To enable automatic MPI parallelism for cuTensorNet, install cuquantum with Open MPI from conda-forge: conda install -c conda-forge cuquantum openmpi. For more information, please refer to cuTensorNet’s MPI Installation Notes.

Warning

The mpich package on conda-forge is not CUDA-aware. For a workaround, see our notes on passing an external package to conda.

cuQuantum Python

To install cuQuantum Python, use this command:

conda install -c conda-forge cuquantum-python

Specifying CUDA Version

If you need to specify the version of CUDA, use the cuda-version package.

For cuquantum:

conda install -c conda-forge cuquantum cuda-version=12

For cuquantum-python:

conda install -c conda-forge cuquantum-python cuda-version=12

The conda solver will install all required dependencies for you.
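
If you would like a quick sanity check afterwards (a minimal sketch, assuming the cuquantum-python package was installed into the active environment), import the Python bindings and print the package version:

# Minimal sanity check: confirm the cuQuantum Python bindings import cleanly.
import cuquantum

print(cuquantum.__version__)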

Individual Components

cuStateVec

To install only cuStateVec from conda, use this command:

conda install -c conda-forge custatevec

cuTensorNet

To install only cuTensorNet from conda, use this command:

conda install -c conda-forge cutensornet

MPI Installation Notes

cuTensorNet

cuTensorNet natively supports MPI. MPI-related features are encapsulated in a separate library, libcutensornet_distributed_interface_mpi.so, which is distributed alongside cuTensorNet. When you install cuTensorNet with Open MPI, conda sets the environment variable CUTENSORNET_COMM_LIB, which tells libcutensornet.so which distributed-interface library to use. If this variable is not set while MPI support is enabled, cuTensorNet will error at runtime.
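
To confirm this from Python, a minimal sketch (assuming the conda installation described above) is to check that the variable points at an existing library file:

# Minimal sketch: verify CUTENSORNET_COMM_LIB points at an existing library file.
import os

lib = os.environ.get("CUTENSORNET_COMM_LIB")
if lib and os.path.isfile(lib):
    print(f"distributed interface library found: {lib}")
else:
    print("CUTENSORNET_COMM_LIB is unset or does not point to an existing file")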

Note

The mpich package on conda-forge is not CUDA-aware. To work around this, you can provide your system’s MPI implementation to conda as an external package. To do this, use conda install -c conda-forge cutensornet "mpich=*=external_*". As long as the system-provided MPI implementation is discoverable by conda, it should function as a regular conda package. For more information, see conda-forge’s documentation.

Warning

If you have enabled cuTensorNet’s distributed interface, you must set this environment variable. If you do not, an internal MPI initialization attempt within cuTensorNet will result in errors.

Best Practices with conda

To install conda under ${conda_install_prefix}, use the following command:

conda_install_prefix="${HOME}" && \
operating_system="$(uname)" && \
architecture="$(uname -m)" && \
miniforge_source="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge-pypy3-${operating_system}-${architecture}.sh" && \
curl -L ${miniforge_source} -o ${conda_install_prefix}/install_miniforge.sh && \
bash ${conda_install_prefix}/install_miniforge.sh -b -p ${conda_install_prefix}/conda && \
rm ${conda_install_prefix}/install_miniforge.sh && \
${conda_install_prefix}/conda/bin/conda init && \
source ~/.bashrc

For more information, have a look at Miniforge’s GitHub repository. Additionally, you may want to update conda’s package solver to use mamba:

conda config --set solver libmamba

To confirm the solver is installed and configured:

conda config --show | grep -i "solver" ; conda list | grep -i "mamba"
...
solver: libmamba
...
conda-libmamba-solver   23.3.0     pyhd8ed1ab_0   conda-forge
libmamba                1.4.2        hcea66bb_0   conda-forge
libmambapy              1.4.2    py39habbe595_0   conda-forge
mamba                   1.4.2    py39hcbb6966_0   conda-forge

Note

If you are using Miniforge, conda-forge is a default channel, and you do not have to specify -c conda-forge in conda subcommands.

Setting CUQUANTUM_ROOT

Words of Caution

With conda, packages are installed in the current ${CONDA_PREFIX}. If your use of cuQuantum involves building libraries or packages that use cuTensorNet or cuStateVec, you may be required to provide or define CUQUANTUM_ROOT. One way to set this is to use the ${CONDA_PREFIX} as follows:

export CUQUANTUM_ROOT=${CONDA_PREFIX}

This works, and is safe to use, as long as you stay in the conda environment where cuQuantum is installed; if you change conda environments while this variable is set, you may see unexpected behavior.

Warning

Setting environment variables referring to specific conda environments is not recommended. Using CUQUANTUM_ROOT=${CONDA_PREFIX} should be done with caution. For similar reasons, it is not advisable to use ${CONDA_PREFIX} in your LD_LIBRARY_PATH when using conda-build or similar. For more detail, see this section of conda-build’s documentation.

Our Recommendation

When setting CUQUANTUM_ROOT, you should use conda env config vars set to create an environment variable attached to the conda environment where cuQuantum is installed:

conda env config vars set CUQUANTUM_ROOT=${CONDA_PREFIX}

Confirm that the environment variable is set correctly by reactivating the conda environment:

conda activate my_conda_env && \
echo ${CUQUANTUM_ROOT}

From PyPI

cuQuantum and cuQuantum Python are available on PyPI in the form of meta-packages. Upon installation, the CUDA version is detected and the appropriate binaries are fetched. The PyPI package for cuQuantum is hosted under the cuquantum project. The PyPI package for cuQuantum Python is hosted under the cuquantum-python project.

Note

The argument --no-cache-dir is required for pip 23.1+. It forces pip to execute the CUDA version detection logic.

cuQuantum

pip install -v --no-cache-dir cuquantum

cuQuantum Python

pip install -v --no-cache-dir cuquantum-python
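
Once installed, you can exercise the high-level cuQuantum Python API with a small tensor-network contraction. This is a minimal sketch; it assumes CuPy and a visible GPU (NumPy arrays are also accepted):

# Minimal sketch: an einsum-style contraction executed through cuTensorNet.
import cupy as cp
from cuquantum import contract

a = cp.random.rand(4, 8)
b = cp.random.rand(8, 4)
c = contract("ij,jk->ik", a, b)  # matrix product expressed as a tensor contraction
print(c.shape)  # (4, 4)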

From NVIDIA DevZone

Using Archive

The cuQuantum archive can be downloaded from NVIDIA’s developer website at https://developer.nvidia.com/cuQuantum-downloads. Please note that cuTensorNet depends on cuTENSOR; documentation and download instructions are available on the NVIDIA cuTENSOR developer website.

The cuQuantum archive name takes the following form:

cuquantum-linux-${ARCH}-${CUQUANTUM_VERSION}.${BUILD_NUMBER}_cuda${CUDA_VERSION}-archive.tar.xz

For example, to download the x86_64 archive with CUDA version 12, use:

wget https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-23.10.0.6_cuda12-archive.tar.xz

To expand the archive, use:

tar -xvf cuquantum-linux-x86_64-23.10.0.6_cuda12-archive.tar.xz

And finally, update your .bashrc (or similar) so that CUQUANTUM_ROOT is defined:

export CUQUANTUM_ROOT=/path/to/cuquantum-linux-x86_64-23.10.0.6_cuda12-archive

Note

To enable native MPI support for cuTensorNet, please see the associated MPI installation notes.

MPI Installation Notes

To enable native MPI support for cuTensorNet when using the archive from the NVIDIA DevZone, follow this procedure:

  1. Navigate to the directory containing cutensornet_distributed_interface_mpi.c (the distributed_interfaces directory of the archive).

  2. Run the activation script to compile the library providing MPI support.

  3. Set the environment variable called CUTENSORNET_COMM_LIB.

Run the Activation Script

The script, activate_mpi.sh, will compile cuTensorNet’s distributed interface by calling gcc:

cat activate_mpi.sh
...
gcc -shared -std=c99 -fPIC \
  -I${CUDA_PATH}/include -I../include -I${MPI_PATH}/include \
  cutensornet_distributed_interface_mpi.c \
  -L${MPI_PATH}/lib64 -L${MPI_PATH}/lib -lmpi \
  -o libcutensornet_distributed_interface_mpi.so
export CUTENSORNET_COMM_LIB=${PWD}/libcutensornet_distributed_interface_mpi.so

The compilation command requires the following variables to be defined:

  1. CUDA_PATH … the path to your CUDA installation (for example, /usr/local/cuda)

  2. MPI_PATH … the path to your MPI installation

We expect to find mpi.h under ${MPI_PATH}/include. This activation script also exports a definition for CUTENSORNET_COMM_LIB=${CUQUANTUM_ROOT}/distributed_interfaces/libcutensornet_distributed_interface_mpi.so. You should add this export to your .bashrc or similar.
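
Before running the script, a minimal sketch (in Python, with cuda_runtime.h and mpi.h as the assumed header names) to check that these paths look right:

# Minimal sketch: confirm CUDA_PATH and MPI_PATH contain the headers the build needs.
import os

for var, header in (("CUDA_PATH", "cuda_runtime.h"), ("MPI_PATH", "mpi.h")):
    root = os.environ.get(var, "")
    found = os.path.isfile(os.path.join(root, "include", header))
    print(f"{var}={root!r}: {header} {'found' if found else 'NOT found'}")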

Note

Adding environment variables to your .bashrc should be done with care. If you plan to use the archive alongside a conda environment, review our best practices section.

Using System Package Managers

To use your system’s package manager to install cuQuantum, use NVIDIA’s selection tool for cuQuantum.

[image: ../_images/nvidia-devzone-cuquantum-selector.png]

As an illustrative example, the commands for deb (network) installation under Ubuntu 22.04 x86_64 are outlined in the following sections. We also describe how update-alternatives can be used to manage your installation. See this list for quick navigation:

  1. Keyring installation and update.

  2. Generic cuQuantum installation.

  3. Installation with specific CUDA version.

  4. Installation management with update-alternatives.

Keyring Installation

The commands to download the keyring, install it, and update the packaging tool are as follows:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg --install cuda-keyring_1.1-1_all.deb
sudo apt-get update

Generic Installation

To install cuquantum, use this command:

sudo apt-get --yes install cuquantum

Note

A generic installation of cuQuantum will download and configure cuQuantum for all currently supported CUDA major versions. For details on managing your installation of cuQuantum, review our section on using update-alternatives.

Specifying CUDA Version

The command to install cuquantum with a specific version of CUDA looks like this:

sudo apt-get --yes install cuquantum-cuda-XY

Where XY indicates the major version of CUDA. For example, the following command installs cuquantum with support for CUDA major version 12:

sudo apt-get --yes install cuquantum-cuda-12

Valid major versions of CUDA:

CUDA 11: sudo apt-get --yes install cuquantum-cuda-11
CUDA 12: sudo apt-get --yes install cuquantum-cuda-12

Using update-alternatives

update-alternatives can be used to view and update the provider for /usr/src/libcuquantum/distributed_interfaces. To view the system’s installation of cuQuantum, you can use this command:

sudo update-alternatives --display cuquantum
cuquantum - auto mode
  link best version is /usr/src/libcuquantum/12/distributed_interfaces
  link currently points to /usr/src/libcuquantum/12/distributed_interfaces
  link cuquantum is /usr/src/libcuquantum/distributed_interfaces
  slave custatevec.h is /usr/include/custatevec.h
  slave cutensornet is /usr/include/cutensornet
  slave cutensornet.h is /usr/include/cutensornet.h
  slave libcustatevec.so is /usr/lib/x86_64-linux-gnu/libcustatevec.so
  slave libcustatevec.so.1 is /usr/lib/x86_64-linux-gnu/libcustatevec.so.1
  slave libcustatevec_static.a is /usr/lib/x86_64-linux-gnu/libcustatevec_static.a
  slave libcutensornet.so is /usr/lib/x86_64-linux-gnu/libcutensornet.so
  slave libcutensornet.so.2 is /usr/lib/x86_64-linux-gnu/libcutensornet.so.2
  slave libcutensornet_static.a is /usr/lib/x86_64-linux-gnu/libcutensornet_static.a
/usr/src/libcuquantum/11/distributed_interfaces - priority 110
  slave custatevec.h: /usr/include/libcuquantum/11/custatevec.h
  slave cutensornet: /usr/include/libcuquantum/11/cutensornet
  slave cutensornet.h: /usr/include/libcuquantum/11/cutensornet.h
  slave libcustatevec.so: /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcustatevec.so
  slave libcustatevec.so.1: /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcustatevec.so.1
  slave libcustatevec_static.a: /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcustatevec_static.a
  slave libcutensornet.so: /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcutensornet.so
  slave libcutensornet.so.2: /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcutensornet.so.2
  slave libcutensornet_static.a: /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcutensornet_static.a
/usr/src/libcuquantum/12/distributed_interfaces - priority 120
  slave custatevec.h: /usr/include/libcuquantum/12/custatevec.h
  slave cutensornet: /usr/include/libcuquantum/12/cutensornet
  slave cutensornet.h: /usr/include/libcuquantum/12/cutensornet.h
  slave libcustatevec.so: /usr/lib/x86_64-linux-gnu/libcuquantum/12/libcustatevec.so
  slave libcustatevec.so.1: /usr/lib/x86_64-linux-gnu/libcuquantum/12/libcustatevec.so.1
  slave libcustatevec_static.a: /usr/lib/x86_64-linux-gnu/libcuquantum/12/libcustatevec_static.a
  slave libcutensornet.so: /usr/lib/x86_64-linux-gnu/libcuquantum/12/libcutensornet.so
  slave libcutensornet.so.2: /usr/lib/x86_64-linux-gnu/libcuquantum/12/libcutensornet.so.2
  slave libcutensornet_static.a: /usr/lib/x86_64-linux-gnu/libcuquantum/12/libcutensornet_static.a

To manage your installation of cuQuantum with update-alternatives, use this command to select the provider for /usr/src/libcuquantum/distributed_interfaces:

sudo update-alternatives --config cuquantum
...
There are 2 choices for the alternative cuquantum (providing /usr/src/libcuquantum/distributed_interfaces).

  Selection    Path                                             Priority   Status
------------------------------------------------------------
* 0            /usr/src/libcuquantum/12/distributed_interfaces   120       auto mode
  1            /usr/src/libcuquantum/11/distributed_interfaces   110       manual mode
  2            /usr/src/libcuquantum/12/distributed_interfaces   120       manual mode

Press <enter> to keep the current choice[*], or type selection number: ...

You must specify a Selection number corresponding to the major version of CUDA you wish to use.

Installing cuQuantum with Frameworks

CUDA Quantum

cuQuantum provides the default simulator backend for CUDA Quantum on systems where NVIDIA GPUs are available. For more information, see CUDA Quantum’s documentation. To install CUDA Quantum, use the following command:

pip install cuda-quantum

If you plan to use CUDA Quantum with a conda environment, please follow the instructions on CUDA Quantum’s PyPI project page, here: https://pypi.org/project/cuda-quantum.

Qiskit

cuQuantum is distributed with qiskit-aer-gpu as an available backend. Different Qiskit APIs enable or rely on cuQuantum functionality in different ways. For example, when using the AerSimulator, you can leverage cuStateVec with the keyword argument cuStateVec_enable=True. For GPU-accelerated tensor-network simulation, cuTensorNet is the default.
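
Here is a minimal sketch (assuming recent qiskit and qiskit-aer-gpu packages and a visible GPU) of enabling cuStateVec on the AerSimulator statevector backend:

# Minimal sketch: run a small circuit on the GPU statevector backend with cuStateVec.
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.measure_all()

sim = AerSimulator(method="statevector", device="GPU", cuStateVec_enable=True)
print(sim.run(transpile(qc, sim), shots=1024).result().get_counts())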

conda-forge

To install a GPU-enabled version of Qiskit, use the following command:

conda install -c conda-forge qiskit-aer

Note

To install with a specific version of CUDA, use the cuda-version package:

conda install -c conda-forge qiskit-aer cuda-version=XY

where XY is the CUDA major version.

CUDA 11: conda install -c conda-forge qiskit-aer cuda-version=11
CUDA 12: conda install -c conda-forge qiskit-aer cuda-version=12

PyPI

To install a GPU-enabled version of Qiskit, use the following command:

pip install -v --no-cache-dir qiskit-aer-gpu

Cirq

To use cuQuantum with Cirq, you can either compile qsim or install the qsimcirq package from conda-forge. We’ll assume you’ve installed cuQuantum with conda. Once installed, you can use the cuStateVec backend by setting gpu_mode=1 in the QSimOptions object passed to the various qsimcirq simulator components.
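
Here is a minimal sketch (assuming cirq and a GPU-enabled qsimcirq are installed; use_gpu=True is the upstream qsimcirq flag that accompanies gpu_mode and is an assumption on our part):

# Minimal sketch: simulate a small circuit with the cuStateVec backend (gpu_mode=1).
import cirq
import qsimcirq

q0, q1 = cirq.LineQubit.range(2)
circuit = cirq.Circuit(cirq.H(q0), cirq.CNOT(q0, q1), cirq.measure(q0, q1))

options = qsimcirq.QSimOptions(use_gpu=True, gpu_mode=1)
simulator = qsimcirq.QSimSimulator(qsim_options=options)
print(simulator.run(circuit, repetitions=1024))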

With the cuQuantum Appliance

QSimOptions has a different API signature in the cuQuantum Appliance:

class QSimOptions:
    ...
    gpu_mode: Union[int, Sequence[int]] = (0,)
    gpu_network: int = 0
    ...

These options are described in the cuQuantum Appliance section for Cirq. Here’s an excerpt from that section:

gpu_mode

The GPU simulator backend to use. If 1, the simulator backend will use cuStateVec. If n, an integer greater than 1, the simulator will use the multi-GPU backend with the first n devices. If a sequence of integers, the simulator will use the multi-GPU backend with devices whose ordinals match the values in the list. Default is to use the multi-GPU backend with device 0.

gpu_network

Topology of inter-GPU data transfer network. This option is effective when multi-GPU support is enabled. Supported network topologies are switch network and full mesh network. If 0 is specified, network topology is automatically detected. If 1 or 2 is specified, switch or full mesh network is selected, respectively. Switch network is aiming at supporting GPU data transfer network in DGX A100 and DGX-2 in which all GPUs are connected to NVSwitch via NVLink. GPUs connected via PCIe switches are also considered as the switch network. Full mesh network is aiming at supporting GPU data transfer networks seen in DGX Station A100/V100 in which all devices are directly connected via NVLink.
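
Putting the two options together, here is an illustrative sketch for the cuQuantum Appliance only (it assumes at least two visible GPUs; the extended gpu_mode/gpu_network signature shown above is specific to the Appliance):

# Illustrative sketch (cuQuantum Appliance): multi-GPU backend on devices 0 and 1.
import cirq
import qsimcirq

qubits = cirq.LineQubit.range(3)
circuit = cirq.Circuit(
    cirq.H(qubits[0]),
    cirq.CNOT(qubits[0], qubits[1]),
    cirq.CNOT(qubits[1], qubits[2]),
    cirq.measure(*qubits),
)

options = qsimcirq.QSimOptions(gpu_mode=(0, 1), gpu_network=0)  # auto-detect topology
simulator = qsimcirq.QSimSimulator(qsim_options=options)
print(simulator.run(circuit, repetitions=1000))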

conda-forge

If you prefer to install qsim from conda, use the following command:

conda install -c conda-forge qsimcirq

Note

To install with a specific version of CUDA, use the cuda-version package:

conda install -c conda-forge qsimcirq cuda-version=XY

where XY is the CUDA major version.

CUDA 11: conda install -c conda-forge qsimcirq cuda-version=11
CUDA 12: conda install -c conda-forge qsimcirq cuda-version=12

Exporting CUQUANTUM_ROOT

While we include export CUQUANTUM_ROOT=${CONDA_PREFIX} in the commands in this section, it is superfluous if you followed our best-practices recommendations. Strictly speaking, you should avoid export CUQUANTUM_ROOT=... because it defines the environment variable globally for your current shell session; we include it only to guarantee that the commands execute successfully when copied and pasted.

Source

export CUQUANTUM_ROOT=${CONDA_PREFIX}
git clone https://github.com/quantumlib/qsim.git && \
    cd qsim && \
    pip install -v --no-cache-dir .

Please note that the above commands compile qsim + qsimcirq from source and install them into your local environment. This requires that a valid CUDA compiler toolchain is detectable by cmake. During the build, you should see status messages regarding a qsim_custatevec target. Since you’re building from source, you can also use pip’s editable mode:

export CUQUANTUM_ROOT=${CONDA_PREFIX}
git clone https://github.com/quantumlib/qsim.git && \
    cd qsim && \
    pip install -v --no-cache-dir --editable .

While editable mode is intended for developers, it also makes it easier to pull updates from the source repository.

Note

Follow our recommendation for setting environment variables attached to a conda environment. For more details, see the section on setting cuQuantum’s root directory.

PennyLane

cuQuantum is a dependency of pennylane-lightning[gpu]. To install PennyLane with GPU acceleration, please refer to PennyLane’s installation instructions under their lightning-gpu project.
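
Once installed, a minimal sketch (assuming pennylane and the lightning.gpu plugin are present) requests the cuStateVec-accelerated device by name:

# Minimal sketch: run a two-qubit circuit on the lightning.gpu device.
import pennylane as qml

dev = qml.device("lightning.gpu", wires=2)

@qml.qnode(dev)
def bell_state():
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])

print(bell_state())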

conda-forge

If you prefer to install PennyLane from conda, use the following command:

conda install -c conda-forge pennylane-lightning-gpu

Note

To install with a specific version of CUDA, use the cuda-version package:

conda install -c conda-forge pennylane-lightning-gpu cuda-version=XY

where XY is the CUDA major version.

CUDA 11: conda install -c conda-forge pennylane-lightning-gpu cuda-version=11
CUDA 12: conda install -c conda-forge pennylane-lightning-gpu cuda-version=12

PyPI

The cuQuantum SDK libraries must be discoverable via your LD_LIBRARY_PATH. Installation is done with this command:

pip install pennylane-lightning[gpu]

Running the cuQuantum Appliance

The cuQuantum Appliance is available on NGC: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuquantum-appliance

Regardless of how you use the cuQuantum Appliance, you must pull the container:

docker pull nvcr.io/nvidia/cuquantum-appliance:23.10

In the 23.10 release, ARM64-based machines require a different tag name:

docker pull nvcr.io/nvidia/cuquantum-appliance:23.10-arm64

Note

Running the container will pull it if it is not available to the current Docker engine.

At the Command-line

The commands in the following subsections were drawn from the cuQuantum Appliance Overview page on NGC: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuquantum-appliance

You can run the cuQuantum Appliance at the command-line in a number of ways:

  1. With an interactive session.

  2. With a noninteractive session.

  3. With specific GPUs.

With an Interactive Session

To provision a pseudo-terminal session and attach to STDIN, use the following command:

docker run --gpus all -it --rm \
    nvcr.io/nvidia/cuquantum-appliance:23.10

With a Noninteractive Session

To noninteractively issue a single command to the container runtime, and exit, use the following command:

docker run --gpus all --rm \
    nvcr.io/nvidia/cuquantum-appliance:23.10 \
        python /home/cuquantum/examples/{example_name}.py

With Specific GPUs

To specify which GPUs to use in the container, use the following command:

# with specific GPUs
docker run --gpus '"device=0,3"' -it --rm \
    nvcr.io/nvidia/cuquantum-appliance:23.10

Using Remote Hosts

Under many circumstances, using remote computational resources is both more effective and more efficient (and often necessary). With appropriate configuration, remote resources extend the capability of our local development environments, making us more productive and permitting stronger application design and deployment.

We make a few assumptions throughout the following sections:

  1. You have configured ~/.ssh/config with useful, concise aliases.

  2. You have a valid installation of OpenSSH on your machine.

  3. For all remote hosts you wish to use, you’ve configured key-based authentication.

Clarifying our Assumptions

Useful + Concise Alias

A useful, concise alias will have a form consistent with the following:

cat ~/.ssh/config
...
Host my_concise_alias
  User my_username
  HostName my_remote_host
  Port my_remote_port
  IdentityFile /path/to/identity/file

Valid Installation of OpenSSH

Confirming a valid installation of OpenSSH on your machine can be done with the following commands:

OpenSSH Version

Open a terminal, and issue the following commands:

ssh -V

Typical output on Linux:

OpenSSH_8.2p1 Ubuntu-4ubuntu0.5, OpenSSL 1.1.1f  31 Mar 2020

Typical output on Windows:

OpenSSH_for_Windows_8.6p1, LibreSSL 3.4.3

Key-based Authentication Example

To check that key-based authentication is set up and in use, type the following commands in your terminal:

ssh -v my_concise_alias
...
debug1: Authentication succeeded (publickey).
...

With DOCKER_HOST

A simple way of orchestrating a remote host is by using DOCKER_HOST. Once set, the environment variable assignment instructs the local Docker CLI to use the Docker engine provided by the remote defined in DOCKER_HOST.

On Linux:

export DOCKER_HOST=ssh://my_concise_alias

On Windows (PowerShell):

$env:DOCKER_HOST='ssh://my_concise_alias'

Confirming Docker CLI + Remote Docker Engine

These commands instruct your local Docker CLI to connect to the Docker engine available at my_concise_alias over SSH, and use that Docker engine for all locally-executed CLI commands in the current command-line session.

For example, the following command starts a detached cuQuantum Appliance container on my_concise_alias, and a subsequent docker ps lists the containers running there. Because we’re using the Docker CLI, the commands are the same regardless of operating system.

docker run --name cuquantum-appliance \
    --gpus all \
    --network host \
    -itd nvcr.io/nvidia/cuquantum-appliance:23.10
...
docker ps
CONTAINER ID   IMAGE                                     COMMAND                 CREATED        STATUS        PORTS  NAMES
...            nvcr.io/nvidia/cuquantum-appliance:23.10  "/usr/local/bin/entr…"  5 seconds ago  Up 4 seconds         cuquantum-appliance
...            ...                                       ...                     ...            ...           ...    ...

To confirm your local Docker CLI is using the remote, you can try a couple commands:

  1. Using nvidia-smi.

  2. Comparing hostname output.

Check nvidia-smi

Executing nvidia-smi on the remote is done with the following command:

docker exec cuquantum-appliance nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.36                 Driver Version: 530.36       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB           On | 00000000:01:00.0 Off |                    0 |
| N/A   38C    P0               63W / 275W|     17MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Comparing Hostnames

Likewise, you can get all (long) hostnames from the remote Docker engine, and compare them with the same command executed in the local terminal.

First, get the hostname output from the remote host’s Docker engine:

docker exec cuquantum-appliance hostname --all-fqdns

Next, use this command to get your local hostname output on Linux:

hostname --all-fqdns

Or, if you’re using Windows, use this PowerShell command:

[System.Net.Dns]::GetHostByName(($env:computerName)).HostName

Running Commands with Remote Docker Engine

Now, we can try running an example from the cuQuantum Appliance:

docker exec cuquantum-appliance python /home/cuquantum/examples/qiskit_ghz.py --help
...
usage: qiskit_ghz.py [-h] [--nbits NBITS] [--precision {single,double}]
                     [--disable-cusvaer]

Qiskit ghz.

options:
  -h, --help            show this help message and exit
  --nbits NBITS         the number of qubits
  --precision {single,double}
                          numerical precision
  --disable-cusvaer       disable cusvaer

docker exec cuquantum-appliance python /home/cuquantum/examples/qiskit_ghz.py
...
{'00000000000000000000': 533, '11111111111111111111': 491}
...

With Docker Contexts

While useful, setting DOCKER_HOST=ssh://my_concise_alias every time a new terminal session is created is tedious, and we wouldn’t recommend setting DOCKER_HOST globally. Thankfully, Docker provides infrastructure for managing multiple Docker endpoints: with docker context, we can create distinct, named configurations for remote Docker engines and manage them with our local Docker CLI.

A context’s DOCKER ENDPOINT accepts various URI schemes and, importantly, supports the same SSH URI scheme we used with DOCKER_HOST. This means the same SSH URI (ssh://my_concise_alias) that connected our local Docker CLI to a single remote Docker engine can be used to create a Docker context.

Here’s the command schema for docker context:

docker context --help
...
Manage contexts

Usage:
  docker context [command]

Available Commands:
  create      Create new context
  export      Export a context to a tar or kubeconfig file
  import      Import a context from a tar or zip file
  inspect     Display detailed information on one or more contexts
  list        List available contexts
  rm          Remove one or more contexts
  show        Print the current context
  update      Update a context
  use         Set the default context

Flags:
  -h, --help   Help for context

Use "docker context [command] --help" for more information about a command.

List Docker Contexts

We can list the available contexts with the following command. Thanks to the Docker CLI, the command is the same on Linux and Windows, though the expected output differs slightly.

docker context ls

On Linux:

...
NAME       DESCRIPTION                              DOCKER ENDPOINT              KUBERNETES ENDPOINT  ORCHESTRATOR
default *  Current DOCKER_HOST based configuration  unix:///var/run/docker.sock                       swarm

On Windows:

...
NAME           TYPE  DESCRIPTION                              DOCKER ENDPOINT                            KUBERNETES ENDPOINT  ORCHESTRATOR
default *      moby  Current DOCKER_HOST based configuration  npipe:////./pipe/docker_engine
desktop-linux  moby  Docker Desktop                           npipe:////./pipe/dockerDesktopLinuxEngine

Context Creation

We can create a new context using our SSH alias as follows:

docker context create my_context --docker "host=ssh://my_concise_alias"
...
my_context
Successfully created context "my_context"

And rerunning docker context ls produces:

docker context ls
...
NAME         TYPE   DESCRIPTION   DOCKER ENDPOINT                 KUBERNETES ENDPOINT   ORCHESTRATOR
...          ...    ...           ...                             ...                   ...
my_context   moby                 ssh://my_concise_alias

Using Contexts

To use the newly created context, run this command:

docker context use my_context

Local commands using the Docker CLI will now be forwarded to the Docker endpoint provided by my_context. Running the cuQuantum Appliance is done, as before, with the following command:

docker run --name cuquantum-appliance \
    --gpus all --network host \
    -itd nvcr.io/nvidia/cuquantum-appliance:23.10

And running examples is likewise identical:

docker exec cuquantum-appliance python /home/cuquantum/examples/qiskit_ghz.py

Note

Using contexts allows us to manage persistent configuration and to group multiple endpoints by some feature (for example, architecture, number of GPUs, or GPU type).

Interacting with the Remote Container

It can be useful to connect to (and interact with) the remote container for the following tasks:

  1. Designing workflows for asynchronous execution.

  2. Debugging workload components.

  3. Circuit analysis.

There are many ways to connect to the remote container. One method is to attach the running remote container to a locally provisioned IDE. We’ll show you two methods that rely on different technologies:

  1. Using the SSH and Dev Container extensions in Visual Studio Code.

  2. Using a running Jupyter Lab server in the remote container.

Visual Studio Code

To use Visual Studio Code to attach to a running remote container, you’ll need two extensions:

  1. The Remote - SSH extension.

  2. The Dev Containers extension.

Remote - SSH

To install the Remote - SSH extension at the command-line, use the following:

code --install-extension ms-vscode-remote.remote-ssh

Otherwise, you can install it from the marketplace listing: https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh

Dev Containers

To install the Dev Containers extension at the command-line, use the following:

code --install-extension ms-vscode-remote.remote-containers

Otherwise, you can install it from the marketplace listing: https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers

Attaching to the Running Container

The procedure is as follows:

  1. Run the container on the remote host.

  2. Connect to the remote host using the Remote - SSH extension.

  3. Attach to the running container using the Dev Containers extension.

Run the Container

Using the methods described in connecting to a remote docker engine, we can start a detached container with the following commands:

docker context use my_context && \
docker run --name cuquantum-appliance \
    --gpus all \
    --network host \
    -itd nvcr.io/nvidia/cuquantum-appliance:23.10

Or if you prefer not to use docker context, you can use DOCKER_HOST. First, set the environment variable as illustrated in the associated section.

Then issue the docker run command:

docker run --name cuquantum-appliance --gpus all --network host -it -d nvcr.io/nvidia/cuquantum-appliance:23.10

If you’d rather not use the remote Docker engine infrastructure at all, please skip to the section below on Connecting with the Remote - SSH extension.

As an illustrative demonstration, we include animations showing you what all of this looks like:

Using Docker Contexts

Executing docker run with an active Docker context

[animation: ../_images/docker-run-with-contexts.gif]

Using DOCKER_HOST

Executing docker run with the DOCKER_HOST environment variable defined

… on Linux

[animation: ../_images/docker-run-with-docker-host-linux.gif]

… on Windows

[animation: ../_images/docker-run-with-docker-host-win.gif]

Connecting with Remote - SSH

There are two ways to connect to the remote host using the Remote - SSH extension:

  1. Using a graphical user interface (GUI) button in the lower-left corner of the Visual Studio Code window.

  2. Using Visual Studio Code’s command palette.

Remote - SSH Using GUI Button

[animation: ../_images/remote-ssh-button.gif]

Remote - SSH Using Command Palette

To use the command palette to connect with a remote host using the Remote - SSH extension, you should follow this procedure:

  1. With Visual Studio Code open, type this key-sequence: ctrl+shift+p … This opens the command palette.

  2. While focused on the command palette, type connect to host … This should bring up the Remote - SSH command Remote-SSH: Connect to Host....

  3. Select the appropriate host by either using the arrow keys or by using your cursor and clicking.

Note

Press enter (or similar) to select a highlighted command. To navigate highlighted commands, use the arrow keys on your keyboard.

[animation: ../_images/remote-ssh-command-palette.gif]

Attaching with Dev Containers

Using the command palette as illustrated in Connecting with the Remote - SSH extension, we can attach to the container.

Dev Containers with the Command Palette

To use the command palette to attach to a running container on the remote host using the Dev Containers extension, you should follow this procedure:

  1. With Visual Studio Code open, type this key-sequence: ctrl+shift+p … This opens the command palette.

  2. While focused on the command palette, type Dev Containers … This should bring up the Dev Containers command Dev Containers: Attach to Running Container ....

  3. Select the appropriate container by either using the arrow keys or by using your cursor and clicking.

[image: ../_images/vscode-command-pallette-dev-containers.png]

Running the cuQuantum Benchmarks

Available on GitHub at https://github.com/nvidia/cuquantum, under the benchmarks directory.

The direct link is https://github.com/nvidia/cuquantum/tree/main/benchmarks.

cuQuantum Benchmarks is an installable Python package that eases performance evaluation of the cuQuantum offering. It enables you to test cuQuantum under different integration scenarios. The package can be installed on bare-metal systems and in the cuQuantum Appliance. To acquire the package, use:

git clone https://github.com/NVIDIA/cuQuantum.git

For detailed installation instructions, see the corresponding section below on installing benchmarks.

Usage

We provide --help documentation for the cuQuantum Benchmarks CLI. You can query it with:

cuquantum-benchmarks --help
...
usage: cuquantum-benchmarks [-h] {circuit,api} ...
...
positional arguments:
{circuit,api}
    circuit      benchmark different classes of quantum circuits
    api          benchmark different APIs from cuQuantum's libraries
optional arguments:
-h, --help     show this help message and exit

For example, you can run multi-node multi-GPU benchmarks in the cuQuantum Appliance with

mpirun -n ${NUM_GPUS} cuquantum-benchmarks \
    circuit --frontend qiskit --backend cusvaer \
        --benchmark ${BENCHMARK_NAME} \
        --nqubits ${NUM_QUBITS} --ngpus 1 \
        --cusvaer-global-index-bits ${LOG2_NUM_NODES_PER_GROUP},${LOG2_NUM_NODE_GROUPS} \
        --cusvaer-p2p-device-bits ${LOG2_NUM_PEERED_GPUS}

In this case, --ngpus 1 indicates that a single rank/process must be associated with a single GPU. The use of --cusvaer-global-index-bits and --cusvaer-p2p-device-bits specifies the network topology of the multi-GPU multi-node cluster:

Understanding Device and Index Bits

Argument Name                  Input Form       Description
--cusvaer-global-index-bits    a list of ints   the network topology represented by state-vector load-balanced node group log2-sizes
--cusvaer-p2p-device-bits      int              state-vector load-balanced peer-accessible group log2-size

Note

By state-vector load-balanced node group, we refer to a group of GPUs whereby each GPU is given a state-vector partition of equal size. To uniformly distribute a quantum state consisting of only qubits to a group of GPUs, the size of the group must be a power of 2. To enforce this, we require the input network topology be expressed in terms of the number of qubits.

With the --cusvaer-global-index-bits option, you can define a networking topology with an arbitrary number of distinct layers. For more detailed information, please see the related documentation on defining network structure.
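
To make the log2 arithmetic concrete, here is a small sketch for a purely hypothetical cluster shape; the numbers are illustrative and map onto the placeholder variables in the mpirun command above:

# Hypothetical cluster: 8 peer-accessible GPUs per node, 4 nodes per group, 8 node groups.
from math import log2

gpus_per_node = 8       # all connected via NVLink/NVSwitch, hence peer-accessible
nodes_per_group = 4
node_groups = 8

p2p_device_bits = int(log2(gpus_per_node))                                     # --cusvaer-p2p-device-bits 3
global_index_bits = f"{int(log2(nodes_per_group))},{int(log2(node_groups))}"   # --cusvaer-global-index-bits 2,3
total_ranks = gpus_per_node * nodes_per_group * node_groups                    # mpirun -n 256

print(p2p_device_bits, global_index_bits, total_ranks)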

Installation

Bare-metal

To install cuQuantum Benchmarks on your system, use one of the following commands.

Benchmarks suite only:

cd cuQuantum/benchmarks && \
   pip install .

Benchmarks and all optional dependencies:

cd cuQuantum/benchmarks && \
   pip install .[all]

Please note that this doesn’t guarantee all dependencies are installed with GPU support. Some frameworks have nonstandard build requirements to enable GPU support.

Appliance

To avoid inadvertently overwriting software in the cuQuantum Appliance, install the cuQuantum Benchmarks with this command:

cd cuQuantum/benchmarks && \
    pip install .

Dependencies

cuStateVec

GPU Architectures                     Volta, Turing, Ampere, Ada, Hopper
NVIDIA GPU with Compute Capability    7.0+
CUDA                                  11.x, 12.x
CPU architectures                     x86_64, ppc64le, ARM64
Operating System                      Linux
Driver                                450.80.02+ (Linux) for CUDA 11.x
                                      525.60.13+ (Linux) for CUDA 12.x

cuTensorNet

GPU Architectures                     Volta, Turing, Ampere, Ada, Hopper
NVIDIA GPU with Compute Capability    7.0+
CUDA                                  11.x or 12.x
CPU architectures                     x86_64, ppc64le, ARM64
Operating System                      Linux
Driver                                450.80.02+ (Linux/CUDA 11) or 525.60.13+ (Linux/CUDA 12)
cuTENSOR                              v2.0.1+

cuQuantum Python

Dependency                                            When Building    When Running
Python                                                3.9+             3.9+
pip                                                   22.3.1+          N/A
setuptools                                            >=61.0.0         N/A
wheel                                                 >=0.34.0         N/A
Cython                                                >=0.29.22,<3     N/A
cuStateVec                                            1.6.0            ~=1.6
cuTensorNet                                           2.4.0            ~=2.4
NumPy                                                 N/A              v1.21+
CuPy (see CuPy installation guide)                    N/A              v13.0.0+
PyTorch (optional, see PyTorch installation guide)    N/A              v1.10+
Qiskit (optional, see Qiskit installation guide)      N/A              v0.24.0+
Cirq (optional, see Cirq installation guide)          N/A              v0.6.0+
mpi4py (optional, see mpi4py installation guide)      N/A              v3.1.0+

cuQuantum Appliance

CUDA      11.x                   12.x
Driver    470.57.02+ (Linux)     525.60.13+ (Linux)