.. _low-level python binding:

*************************
Low-level Python Bindings
*************************

Starting with cuQuantum Python v25.03, all cuDensityMat, cuStateVec, and cuTensorNet C APIs are exposed under the :mod:`cuquantum.bindings.cudensitymat`, :mod:`cuquantum.bindings.custatevec`, and :mod:`cuquantum.bindings.cutensornet` modules, respectively. The original cuStateVec and cuTensorNet bindings under the :mod:`cuquantum.custatevec` and :mod:`cuquantum.cutensornet` modules are now deprecated and will be removed in a future release.

Naming & calling convention
===========================

.. currentmodule:: cuquantum.bindings

When the C APIs are exposed, we follow the `PEP 8`_ style guide and adopt the following changes:

* All library name prefixes are stripped
* Function names are split at word boundaries and follow snake case
* The first letter of each word in an enum type name is capitalized
* Each enum's name prefix is stripped from its values' names
* Common enums that can be used in all submodules are placed in the parent module :mod:`cuquantum`
* Whenever applicable, the outputs are stripped away from the function arguments and returned directly as Python objects
* Pointers are passed as Python :class:`int`

Below is a non-exhaustive list of examples of such C-to-Python mappings:

- Function: `custatevecGetDefaultWorkspaceSize` -> :func:`custatevec.get_default_workspace_size`.
- Function: `cutensornetCreateNetworkDescriptor` -> :func:`cutensornet.create_network_descriptor`.
- Enum type: `custatevecMatrixLayout_t` -> :class:`custatevec.MatrixLayout`.
- Enum type: `cutensornetContractionOptimizerConfigAttributes_t` -> :class:`cutensornet.ContractionOptimizerConfigAttribute`.
- Enum value name: `CUSTATEVEC_MATRIX_LAYOUT_COL` -> :data:`custatevec.MatrixLayout.COL`.
- Enum value name: `CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES` -> :data:`cutensornet.ContractionOptimizerConfigAttribute.HYPER_NUM_SAMPLES`.
- Return: The outputs of `custatevecSamplerCreate` are the sampler descriptor and the required workspace size, which are wrapped as a 2-tuple in the corresponding :func:`custatevec.sampler_create` Python API.
- Global enum: `custatevecComputeType_t` and `cutensornetComputeType_t` -> :class:`cuquantum.ComputeType`.

There may be exceptions to the above rules, but they are self-evident and properly documented. In the next section we discuss pointer passing in Python.

.. _PEP 8: https://www.python.org/dev/peps/pep-0008/

Memory management
=================

Pointer and data lifetime
-------------------------

Unlike C/C++, Python does not provide low-level primitives for allocating or deallocating host memory, let alone device memory. To make the C APIs work with Python, memory management must be done through Python proxy objects. In cuQuantum Python, we ask users to address such needs with NumPy (for host memory) and CuPy (for device memory).

.. note::

    It is also possible to use :class:`array.array` (plus :class:`memoryview` as needed) to manage host memory; however, it is more tedious than using :class:`numpy.ndarray`, especially when it comes to array manipulation and computation. A minimal sketch is given after these notes.

.. note::

    It is also possible to use `CUDA Python`_ to manage device memory, but as of CUDA 11 there is no simple, pythonic way to modify the contents stored on GPU without writing custom kernels. CuPy is a lightweight, NumPy-compatible array library that addresses this need.
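As a brief illustration of the first note above, the sketch below obtains a host pointer address from an :class:`array.array` buffer; ``my_func`` is a placeholder for any binding that accepts a host pointer (the same placeholder is used in the examples in the next section):

.. code-block:: python

    import array

    # create a host buffer of 5 int using the standard library
    buf = array.array('i', [0, 0, 0, 0, 0])

    # buffer_info() returns (pointer address, element count); the address can be
    # passed wherever a binding expects a host pointer
    ptr, _ = buf.buffer_info()
    # my_func(..., ptr, ...)

For anything beyond simple buffers, the NumPy/CuPy route shown next is preferable.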
To pass data from Python to C, the pointer addresses (as Python :class:`int`) of various objects are passed to the bindings. With NumPy/CuPy arrays as the proxy, this is as simple as follows:

.. code-block:: python

    # create a host buffer to hold 5 int
    buf = numpy.empty((5,), dtype=numpy.int32)
    # pass buf's pointer to the wrapper
    # buf could get modified in-place if the function writes to it
    my_func(..., buf.ctypes.data, ...)
    # examine/use buf's data
    print(buf)

    # create a device buffer to hold 10 double
    buf = cupy.empty((10,), dtype=cupy.float64)
    # pass buf's pointer to the wrapper
    # buf could get modified in-place if the function writes to it
    my_func(..., buf.data.ptr, ...)
    # examine/use buf's data
    print(buf)

    # create an untyped device buffer of 128 bytes
    buf = cupy.cuda.alloc(128)
    # pass buf's pointer to the wrapper
    # buf could get modified in-place if the function writes to it
    my_func(..., buf.ptr, ...)
    # buf is automatically destroyed when going out of scope

Please be aware of the underlying assumption that the arrays must be contiguous in memory (unless the C interface allows for specifying the array strides).

As a consequence, as of cuQuantum Python v0.1.0 all C structs (including handles and descriptors) are *not exposed* as Python classes; that is, they do not have their own types and are simply cast to plain Python :class:`int` for passing around. Any downstream consumer should create a wrapper class to hold the pointer address if so desired. In other words, users have full control (and responsibility) over the *pointer lifetime*.

However, in certain cases we are able to convert Python objects for users (when *readonly, host* arrays are needed) so as to alleviate users' burden. For example, in functions that require a sequence or a nested sequence, the following operations are equivalent:

.. code-block:: python

    # passing a host buffer of int type can be done like this
    buf = numpy.array([0, 1, 3, 5, 6], dtype=numpy.int32)
    my_func(..., buf.ctypes.data, ...)

    # or just this
    buf = [0, 1, 3, 5, 6]
    my_func(..., buf, ...)
    # the underlying data type is determined by the C API

which is particularly useful when users need to pass a large amount of tensor metadata to C (e.g., :func:`cutensornet.create_network_descriptor`).

.. _CUDA Python: https://nvidia.github.io/cuda-python/index.html

User-provided memory pools
--------------------------

Starting with cuQuantum v22.03, we offer an interface for users to bring their own memory pool for the cuStateVec/cuTensorNet libraries to use. Once set, users are no longer required to manage any temporary workspace before calling an API; the library will draw memory from the user's pool (and return it once done). The only requirement for the memory pool is that it must be *stream-ordered*. See :ref:`Memory Management API ` for an introduction. Currently we only support *device* mempools.

In cuQuantum Python, this interface is exposed through the low-level APIs :func:`custatevec.set_device_mem_handler` and :func:`custatevec.get_device_mem_handler` (likewise for :mod:`~cuquantum.cutensornet`). Currently we offer three different ways to set the ``handler`` argument:

- if an :class:`int` is given, it is assumed to be a pointer address to a fully initialized `custatevecDeviceMemHandler_t` struct
- if a Python sequence of length 4 is given, it is assumed to be ``(ctx, device_alloc, device_free, name)``
- if a Python sequence of length 3 is given, it is assumed to be ``(malloc, free, name)``

See the API reference for further details; a minimal sketch of the length-3 form is shown below.
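As a sketch of the length-3 form, the code below registers a pair of CuPy-backed allocation callbacks with cuStateVec. The callback signatures assumed here, ``malloc(size, stream) -> ptr`` and ``free(ptr, size, stream)``, as well as the bookkeeping dictionary, are illustrative only; see :func:`custatevec.set_device_mem_handler` for the exact requirements:

.. code-block:: python

    import cupy as cp
    from cuquantum.bindings import custatevec as cusv

    handle = cusv.create()

    # keep CuPy memory objects alive until the library frees them
    live_buffers = {}

    def my_malloc(size, stream):
        # assumed callback signature; draws memory from CuPy's default pool
        buf = cp.cuda.alloc(size)
        live_buffers[buf.ptr] = buf
        return buf.ptr

    def my_free(ptr, size, stream):
        # assumed callback signature; releases the buffer back to the pool
        live_buffers.pop(ptr, None)

    # register the (malloc, free, name) triple as the device memory handler
    cusv.set_device_mem_handler(handle, (my_malloc, my_free, "my_cupy_pool"))

With the handler in place, the workspace arguments can be left as null pointers and zero sizes, as described next.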
Once the handler is set, wherever an API needs a workspace, the calling convention of

- setting the workspace (or workspace descriptor) pointer address to ``0``, and
- setting the workspace size to ``0``

notifies the library that it should draw memory from the user's mempool. `This example `_ demonstrates the usage of this API.

Usage example
=============

The code below is a Python translation of the :ref:`corresponding cuStateVec example written in C `.

.. testcode::

    import numpy as np
    import cupy as cp

    from cuquantum.bindings import custatevec as cusv
    from cuquantum import cudaDataType as cudtype
    from cuquantum import ComputeType as ctype


    nIndexBits = 3
    nSvSize    = (1 << nIndexBits)
    nTargets   = 1
    nControls  = 2
    adjoint    = 0

    targets  = (2,)
    controls = (0, 1)

    d_sv = cp.asarray([[0.0, 0.0], [0.0, 0.1], [0.1, 0.1], [0.1, 0.2],
                       [0.2, 0.2], [0.3, 0.3], [0.3, 0.4], [0.4, 0.5]], dtype=np.float64)
    d_sv = d_sv.view(np.complex128).reshape(-1)

    d_sv_result = cp.asarray([[0.0, 0.0], [0.0, 0.1], [0.1, 0.1], [0.4, 0.5],
                              [0.2, 0.2], [0.3, 0.3], [0.3, 0.4], [0.1, 0.2]], dtype=np.float64)
    d_sv_result = d_sv_result.view(np.complex128).reshape(-1)

    d_matrix = cp.asarray([[0.0, 0.0], [1.0, 0.0],
                           [1.0, 0.0], [0.0, 0.0]], dtype=np.float64)
    d_matrix = d_matrix.view(np.complex128).reshape(-1)

    # cuStateVec handle initialization
    handle = cusv.create()

    # check the size of external workspace
    extraWorkspaceSizeInBytes = cusv.apply_matrix_get_workspace_size(
        handle, cudtype.CUDA_C_64F, nIndexBits, d_matrix.data.ptr, cudtype.CUDA_C_64F,
        cusv.MatrixLayout.ROW, adjoint, nTargets, nControls, ctype.COMPUTE_64F)

    # allocate external workspace if necessary
    if extraWorkspaceSizeInBytes > 0:
        workspace = cp.cuda.alloc(extraWorkspaceSizeInBytes)
        workspace_ptr = workspace.ptr
    else:
        workspace_ptr = 0

    # apply gate
    cusv.apply_matrix(
        handle, d_sv.data.ptr, cudtype.CUDA_C_64F, nIndexBits, d_matrix.data.ptr,
        cudtype.CUDA_C_64F, cusv.MatrixLayout.ROW, adjoint, targets, len(targets),
        controls, 0, len(controls), ctype.COMPUTE_64F, workspace_ptr, extraWorkspaceSizeInBytes)

    # destroy handle
    cusv.destroy(handle)

    # --------------------------------------------------------------------------

    # check if d_sv holds the updated statevector
    correct = cp.allclose(d_sv, d_sv_result)
    if not correct:
        raise RuntimeError("example FAILED: wrong result")

    # if this is a standalone script, everything is cleaned up properly at exit

API reference
=============

This reference describes all of cuQuantum Python's low-level binding modules.

.. module:: cuquantum.bindings

.. toctree::
    :maxdepth: 2
    :includehidden:

    bindings/cudensitymat
    bindings/custatevec
    bindings/cutensornet