********
Overview
********

This section describes the basic working principle of the *cuStateVec* library. For a general introduction to quantum circuits, please refer to :doc:`Introduction to quantum computing <../../overview>`.

Description of state vectors
============================
.. doxygenpage:: state_vector
   :content-only:

Bit ordering
============
.. doxygenpage:: bit_ordering
   :content-only:

Supported data types
====================
.. doxygenpage:: data_types
   :content-only:

.. _workspace-label:

Workspace
=========
.. doxygenpage:: workspace
   :content-only:

Gate fusion
===========

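Gate applications account for a large proportion of the computation cost in quantum simulators.
Each gate application reads and updates the state vector, so fusing multiple gates into one larger gate reduces the number of passes over the state vector and, with it, the overall memory traffic required by gate applications.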

.. figure:: ./figures/fusion.png    
    :width: 480px
    :align: center

The cuStateVec API supports applying such general multi-qubit gates.
For details on what is supported, please refer to `custatevecApplyMatrix`.
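
The effect of fusion can be illustrated without any GPU code.
The following is a minimal pure-Python sketch, not the cuStateVec implementation: it applies three single-qubit gates one by one, then applies a single fused two-qubit matrix, and checks that both give the same state.
The helper names (``apply_1q``, ``apply_2q``, ``kron``, ``matmul``) are hypothetical, and the sketch assumes that qubit 0 corresponds to the least significant bit of the state vector index.

.. code-block:: python

   import math

   def apply_1q(state, gate, target):
       """Apply a 2x2 gate to qubit `target` (one full pass over the state)."""
       out = list(state)
       stride = 1 << target
       for i in range(len(state)):
           if i & stride == 0:
               a0, a1 = state[i], state[i | stride]
               out[i] = gate[0][0] * a0 + gate[0][1] * a1
               out[i | stride] = gate[1][0] * a0 + gate[1][1] * a1
       return out

   def apply_2q(state, gate, q_low, q_high):
       """Apply a 4x4 gate; matrix index bit 0 is q_low, bit 1 is q_high."""
       out = list(state)
       m_low, m_high = 1 << q_low, 1 << q_high
       for base in range(len(state)):
           if base & (m_low | m_high):
               continue
           idx = [base, base | m_low, base | m_high, base | m_low | m_high]
           amps = [state[j] for j in idx]
           for r in range(4):
               out[idx[r]] = sum(gate[r][c] * amps[c] for c in range(4))
       return out

   def matmul(a, b):
       """Matrix product a @ b for nested lists."""
       return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
                for c in range(len(b[0]))] for r in range(len(a))]

   def kron(a, b):
       """Kronecker product; `a` acts on the higher bit, `b` on the lower bit."""
       nb = len(b)
       return [[a[r // nb][c // nb] * b[r % nb][c % nb]
                for c in range(len(a) * nb)] for r in range(len(a) * nb)]

   s = 1.0 / math.sqrt(2.0)
   H = [[s, s], [s, -s]]
   X = [[0.0, 1.0], [1.0, 0.0]]
   Z = [[1.0, 0.0], [0.0, -1.0]]

   # An arbitrary normalized 3-qubit state.
   raw = [complex(i + 1, -i) for i in range(8)]
   norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
   state = [a / norm for a in raw]

   # Unfused: three passes over the state vector.
   sequential = apply_1q(state, H, 0)
   sequential = apply_1q(sequential, X, 1)
   sequential = apply_1q(sequential, Z, 0)

   # Fused: (Z H) on qubit 0 and X on qubit 1 combined into one 4x4 matrix,
   # applied in a single pass.
   fused_gate = kron(X, matmul(Z, H))
   fused = apply_2q(state, fused_gate, 0, 1)

   max_diff = max(abs(a - b) for a, b in zip(sequential, fused))
   assert max_diff < 1e-12

The fused matrix is larger (4x4 instead of 2x2), so fusion trades a little extra arithmetic per element for fewer sweeps over the state vector.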

Multi-GPU Computation
=====================

The memory usage in quantum circuit simulations increases exponentially with the number of qubits.
Simulating more qubits than fit in the memory of a single GPU therefore requires multiple GPUs.
A typical approach is to divide the qubits into global and local ones.
For an :math:`M`-qubit system, if each GPU can store :math:`2^N` state vector elements (that is, :math:`N` local qubits),
then :math:`2^{M-N}` GPUs (that is, :math:`M-N` global qubits) are required to store the entire state vector.
The :math:`k`-th GPU (:math:`k = (i_{M-1} i_{M-2} \cdots i_{N})_2`) stores the state vector elements
:math:`\alpha_{i_{M-1} i_{M-2} \cdots i_{N} i_{N-1} \cdots i_{0}}` with :math:`i_p \in \{0, 1\}, 0 \leq p \leq N-1`.

For instance,

  - GPU #0 handles elements from :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 0_{N} 0_{N-1} \cdots 0_{0}}` to :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 0_{N} 1_{N-1} \cdots 1_{0}}`,
  - GPU #1 handles elements from :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 1_{N} 0_{N-1} \cdots 0_{0}}` to :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 1_{N} 1_{N-1} \cdots 1_{0}}`,
  - GPU #2 handles elements from :math:`\alpha_{0_{M-1} \cdots 1_{N+1} 0_{N} 0_{N-1} \cdots 0_{0}}` to :math:`\alpha_{0_{M-1} \cdots 1_{N+1} 0_{N} 1_{N-1} \cdots 1_{0}}`,
  - GPU #3 handles elements from :math:`\alpha_{0_{M-1} \cdots 1_{N+1} 1_{N} 0_{N-1} \cdots 0_{0}}` to :math:`\alpha_{0_{M-1} \cdots 1_{N+1} 1_{N} 1_{N-1} \cdots 1_{0}}`, and so on.

Here, the indices :math:`i_{M-1}, i_{M-2}, \cdots, i_{N}` belong to the global qubits, and others belong to the local qubits.
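
The split into global and local index bits amounts to a shift and a mask.
The following short Python sketch illustrates it; the function names ``split_index`` and ``gpu_range`` and the example sizes are hypothetical and chosen only for illustration.

.. code-block:: python

   def split_index(index, n_local):
       """Split a full state vector index into (GPU index, local index).

       The upper M - N bits (global qubits) select the GPU; the lower
       N bits (local qubits) select the element within that GPU.
       """
       gpu = index >> n_local
       local = index & ((1 << n_local) - 1)
       return gpu, local

   def gpu_range(k, n_local):
       """Return the first and last full index stored on GPU k."""
       first = k << n_local
       last = first + (1 << n_local) - 1
       return first, last

   # Example sizes (assumed): M = 5 qubits in total, N = 3 local qubits.
   M, N = 5, 3
   num_gpus = 1 << (M - N)          # 2^(M-N) = 4 GPUs

   # Index 0b10110: global bits 0b10 select GPU #2, local bits 0b110 = 6.
   example = split_index(0b10110, N)

   # Index range held by each GPU.
   ranges = [gpu_range(k, N) for k in range(num_gpus)]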

cuStateVec provides APIs for multi-GPU qubit measurement and sampling.
Each of these APIs operates on a single GPU, so users are required to gather and scatter the per-GPU results themselves.
For details, please refer to :ref:`batchMeasureSection-label` and :ref:`samplerSection-label`.
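
To give a rough idea of what gathering per-GPU results can look like on the host, the following pure-Python sketch sums the squared amplitudes of each sub state vector and then uses the cumulative sums to decide which sub state vector a random number falls into.
It is an illustration under stated assumptions, not the library's procedure: the sub state vectors are plain Python lists standing in for device memory, the helper names ``partial_norm`` and ``pick_shard`` are hypothetical, and the actual API calls are described in the sections referenced above.

.. code-block:: python

   from itertools import accumulate

   def partial_norm(shard):
       """Per-GPU step: sum of |amplitude|^2 over one sub state vector."""
       return sum(abs(a) ** 2 for a in shard)

   def pick_shard(partial_norms, randnum):
       """Host step: find the sub state vector selected by randnum in [0, 1).

       Returns (shard index, cumulative sum before that shard, total norm).
       """
       cumulative = list(accumulate(partial_norms))
       total = cumulative[-1]
       threshold = randnum * total
       for k, upper in enumerate(cumulative):
           if threshold < upper:
               offset = cumulative[k - 1] if k > 0 else 0.0
               return k, offset, total
       raise ValueError("randnum must be in [0, 1)")

   # Four stand-in sub state vectors (2 global qubits, 1 local qubit).
   shards = [
       [0.5 + 0j, 0.5 + 0j],
       [0.5 + 0j, 0j],
       [0j, 0.5 + 0j],
       [0j, 0j],
   ]

   # Gather: one scalar per GPU is collected on the host.
   norms = [partial_norm(shard) for shard in shards]

   # Scatter: the chosen shard index and its offset are sent back to the GPUs.
   selected = pick_shard(norms, 0.6)

Only one scalar per GPU travels to the host in this scheme, so the communication cost stays small compared with the size of the state vector.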

For those interested in multi-GPU quantum simulations, :doc:`NVIDIA cuQuantum Appliance<../appliance/index>` is also available.

.. note::

   Each GPU requires its own cuStateVec handle. Users are also responsible for switching the CUDA device context.