********
Overview
********

This section describes the basic working principles of the *cuStateVec* library. For a general introduction to quantum circuits, please refer to :doc:`Introduction to quantum computing <../../overview>`.

Description of state vectors
============================
.. doxygenpage:: state_vector
   :content-only:

Bit ordering
============
.. doxygenpage:: bit_ordering
   :content-only:

Supported data types
====================
.. doxygenpage:: data_types
   :content-only:

Workspace
=========
.. doxygenpage:: workspace
   :content-only:

Gate fusion
===========

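Gate applications account for a large proportion of the computation cost in quantum simulators.
Because each gate application reads and updates the state vector, fusing multiple gates into one larger gate reduces the number of passes over the state vector and therefore the overall memory traffic required for gate applications.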

.. figure:: ./figures/fusion.png    
    :width: 480px
    :align: center


The cuStateVec API supports applying such general gates that act on multiple qubits.
For details on what is supported, please refer to `custatevecApplyMatrix`.
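
As a minimal sketch of the idea (plain Python, not the cuStateVec API), the following example fuses a Hadamard gate on qubit 0 and a CNOT gate (control qubit 0, target qubit 1) into a single 4x4 matrix.
The helper names ``matmul``, ``kron``, and ``apply_matrix`` are hypothetical, and the sketch assumes that qubit 0 corresponds to the least significant bit of the state vector index.
Applying the fused matrix once yields the same state as applying the two gates in sequence.

.. code-block:: python

   import math

   def matmul(a, b):
       """Matrix product of two square matrices stored as nested lists."""
       n = len(a)
       return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
               for i in range(n)]

   def kron(a, b):
       """Kronecker product of two matrices stored as nested lists."""
       return [[x * y for x in row_a for y in row_b]
               for row_a in a for row_b in b]

   def apply_matrix(matrix, state):
       """Matrix-vector product: one full pass over the state vector."""
       return [sum(m * s for m, s in zip(row, state)) for row in matrix]

   h = 1.0 / math.sqrt(2.0)
   IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
   HADAMARD = [[h, h], [h, -h]]

   # Assumption: index = (i1 i0) in binary, so a gate on qubit 0 is I (x) H.
   H_ON_QUBIT0 = kron(IDENTITY, HADAMARD)

   # CNOT with control qubit 0 and target qubit 1 under the same ordering:
   # basis state 1 (i1=0, i0=1) maps to 3 (i1=1, i0=1), and 3 maps to 1.
   CNOT_C0_T1 = [
       [1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
   ]

   # The gate applied later goes on the left of the product.
   FUSED = matmul(CNOT_C0_T1, H_ON_QUBIT0)

   initial_state = [1.0, 0.0, 0.0, 0.0]  # |00>

   # Two passes over the state vector, one per gate.
   sequential_result = apply_matrix(CNOT_C0_T1,
                                    apply_matrix(H_ON_QUBIT0, initial_state))

   # One pass over the state vector with the fused gate.
   fused_result = apply_matrix(FUSED, initial_state)

   print(sequential_result)
   print(fused_result)

In this sketch the benefit is only conceptual: for a large state vector, the fused gate replaces two traversals of the state vector with one.
The fused matrix has :math:`2^n \times 2^n` elements for :math:`n` fused qubits, so fusion is typically worthwhile only for a small number of qubits.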

..
  Qubit reordering
  ================
  The memory usage in quantum simulations increases exponentially with the number of qubits.
  To use many qubits, multiple GPUs are required.
  A typical approach is to divide all the qubits into global qubits and local qubits.
  Suppose we use :math:`M` qubits and each GPU can store :math:`2^N` state vector elements, that is, :math:`N` qubits.
  Then :math:`2^{M-N}` GPUs are required to store the entire vector.
  The :math:`k`-th GPU handles the following elements: 
  .. math:: 
    \alpha_{i_{M-1}i_{M-2}\cdots i_{N} i_{N-1}\cdots i_{0}}
    \ s.t. \ k = (i_{M-1}i_{M-2}\cdots i_{N})_2, i_p \in \{0, 1\}, 0 \leq p \leq N-1.
  | For instance, 
  | GPU #0 handles from :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 0_{N} 0_{N-1} \cdots 0_{0}}` to :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 0_{N} 1_{N-1} \cdots 1_{0}}`,
  | GPU #1 handles from :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 1_{N} 0_{N-1} \cdots 0_{0}}` to :math:`\alpha_{0_{M-1} \cdots 0_{N+1} 1_{N} 1_{N-1} \cdots 1_{0}}`, and so on.
  | Here, :math:`i_{M-1}, i_{M-2}, \cdots, i_{N}` belong to global qubits, and others belong to local qubits.
  Gate applications on global qubits require transferring state vector elements between GPUs,
  and this transfer can become the performance bottleneck.
  To reduce the data transfer cost, cuStateVec provides an API to reorder the qubits.
  With qubit reordering, we can target only local qubits in the gate applications, which does not require any data transfer between GPUs.
  .. figure:: ./figures/reordering.png
    :width: 600px
    :align: center