****************
cuSOLVERMp C API
****************

==================
Library Management
==================

.. _cusolverMpCreate-label:

------------------------
:code:`cusolverMpCreate`
------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpCreate(
            cusolverMpHandle_t *handle,
            int device,
            cudaStream_t stream)

| The function initializes the cuSOLVERMp library handle (:ref:`cusolverMpHandle_t <cusolverMpHandle_t-label>`) which holds the cuSOLVERMp library context. It allocates light hardware resources on the host, and must be called prior to making any other cuSOLVERMp library calls.
| Calling any cuSOLVERMp function which uses :ref:`cusolverMpHandle_t <cusolverMpHandle_t-label>` without a previous call of :ref:`cusolverMpCreate() <cusolverMpCreate-label>` will return an error.
| The cuSOLVERMp library context is tied to the CUDA device provided by ``device`` and the CUDA stream ``stream``.
| Only one handle per process and per GPU supported. Sharing a device with multiple processes will result in undefined behavior. 

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
	
	"handle",   "Host", "Out", "cuSOLVERMp library handle"
	"device", "Host", "In",  "Device that will be assigned to the handle."
	"stream", "Host", "In",  "Stream that will be assigned to the handle."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----


.. _cusolverMpDestroy-label:

-------------------------
:code:`cusolverMpDestroy`
-------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpDestroy(
            cusolverMpHandle_t handle)

| The function destroy the cuSOLVERMp library handle (:ref:`cusolverMpHandle_t <cusolverMpHandle_t-label>`) which holds the cuSOLVERMp library context.
| The cuSOLVERMp library context is tied to the CUDA device provided by ``device``. Only one handle per process and per GPU supported.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
	
	"handle", "Host", "In/Out", "cuSOLVERMp library handle"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpGetStream-label:

---------------------------
:code:`cusolverMpGetStream`
---------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGetStream(
            cusolverMpHandle_t handle,
            cudaStream_t *stream)

| The function returns the `stream` associated to the handle.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
	
	"handle", "Host", "In", "cuSOLVERMp library handle"
	"stream", "Host", "Out", "Stream associated with the handle."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpGetVersion-label:

----------------------------
:code:`cusolverMpGetVersion`
----------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGetVersion(
            cusolverMpHandle_t handle,
            int *version)

| This function returns the version number of the cuSOLVERMp library.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
	
	"handle", "Host", "In", "cuSOLVERMp library handle"
	"version", "Host", "Out", "cuSOLVERMp library version."


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

===============
Grid Management
===============

.. _cusolverMpCreateDeviceGrid-label:

----------------------------------
:code:`cusolverMpCreateDeviceGrid`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpCreateDeviceGrid(
            cusolverMpHandle_t handle,
            cusolverMpGrid_t **grid,
            cal_comm_t comm,
            int32_t numRowDevices,
            int32_t numColDevices,
            cusolverMpGridMapping_t mapping)

| The function initializes the :ref:`grid <cusolverMpGrid_t-label>` opaque data structure. It maps the given resources (communicator, grid dimensions and grid layout) to a grid object.

| All the processes defined to be in this grid must enter this function.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
	
	"handle", "Host", "In", "cuSOLVERMp library handle"
	"grid", "Host", "Out", "Grid object to be initialized."
	"comm", "Host", "In", "Communicator that will be associated with the grid."
	"numRowDevices", "Host", "In", "How many of process rows the grid will contain."
	"numColDevices", "Host", "In", "How many of process columns the grid will contain."
	"mapping", "Host", "In", "How to map processes to the grid. See description of :ref:`cusolverMpGrid_t <cusolverMpGrid_t-label>` for further details. Currently, only `CUSOLVERMP_GRID_MAPPING_COL_MAJOR` is supported"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpDestroyGrid-label:

----------------------------------
:code:`cusolverMpDestroyGrid`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpDestroyGrid(
            cusolverMpGrid_t grid)

| The function destroys the given `grid` object.

| All the processes defined to be in this grid must enter this function.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
	
	"grid", "Host", "In/Out", "Grid object to be destroyed."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

=================
Matrix Management
=================

.. _cusolverMpCreateMatrixDesc-label:

----------------------------------
:code:`cusolverMpCreateMatrixDesc`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpCreateMatrixDesc(
            cudaLibMpMatrixDesc_t *descr,
            cusolverMpGrid_t grid,
            cudaDataType dataType,
            int64_t M_A,
            int64_t N_A,
            int64_t MB_A,
            int64_t NB_A,
            uint32_t RSRC_A,
            uint32_t CSRC_A,
            int64_t LLD_A)


| The function initializes :ref:`cudaLibMpMatrixDesc_t <cudaLibMpMatrixDesc_t-label>` object.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"descr",    "Host", "Out",    "Matrix descriptor object initialized by this function."
	"dataType", "Host", "In",     "Data type of the matrix A."
	"M_A",      "Host", "In",     "Number of rows in the global array A."
	"N_A",      "Host", "In",     "Number of columns in the global matrix A."
	"MB_A",     "Host", "In",     "Blocking factor used to distribute the rows of the global matrix A."
	"NB_A",     "Host", "In",     "Blocking factor used to distribute the columns of the global matrix A."
	"RSRC_A",   "Host", "In",     "Process row over which the first row of the matrix A is distributed. Only the value of `0` is currently supported."
	"CSRC_A",   "Host", "In",     "Process column over which the first row of the matrix A is distributed. Only the value of `0` is currently supported."
	"LLD_A",    "Host", "In",     "Leading dimension of the local matrix."


| Supported values for `dataType` argument are listed.

.. csv-table::
   :header: "Data Type of A", "Description"
   :widths: auto

	"CUDA_R_32I",        "32-bit integer values."
	"CUDA_R_64I",        "64-bit integer values."  
	"CUDA_R_32F",        "Single precision real values."
	"CUDA_R_64F",        "Double precision real values."
	"CUDA_C_32F",        "Single precision complex values."
	"CUDA_C_64F",        "Double precision complex values."


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpDestroyMatrixDesc-label:

-----------------------------------
:code:`cusolverMpDestroyMatrixDesc`
-----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpDestroyMatrixDesc(
            cudaLibMpMatrixDesc_t descr )


| The function destroys :ref:`cudaLibMpMatrixDesc_t <cudaLibMpMatrixDesc_t-label>` object.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"descr", "Host", "In/Out", "Matrix descriptor object destroyed by this function."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

=======
Utility
=======

.. _cusolverMpNUMROC-label:

------------------------
:code:`cusolverMpNUMROC`
------------------------

.. code-block:: cpp

    int64_t cusolverMpNUMROC(
            int64_t n,
            int64_t nb,
            uint32_t iproc,
            uint32_t isrcproc,
            uint32_t nprocs) 

| Computes the number of rows or columns of a distributed matrix owned by the process indicated by `iproc` argument.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"n",        "Host", "In", "Number of rows or columns in the global distributed matrix."
	"nb",       "Host", "In", "Row or column blocking size of the global matrix."
	"iproc",    "Host", "In", "The coordinate of the process whole local array row or column is to be determined."
	"isrcproc", "Host", "In", "The coordinate of the process that owns the first row or column of the distributed matrix."
	"nprocs",   "Host", "In", "The total number of row or column processes over which the matrix is distributed."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpMatrixGatherD2H-label:

---------------------------------
:code:`cusolverMpMatrixGatherD2H`
---------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpMatrixGatherD2H(
            cusolverMpHandle_t handle,
            int64_t M,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            int root,
            void *h_dst,
            int64_t h_lddst)

| Gathers the global distributed matrix `A` on a buffer provided on process `root`. The input matrix `A` is originally distributed using 2D block cyclic format, on output `h_dst` contains the matrix in column-major format.

| Notice that, for this function, the input data is on the device and the output is stored on host memory.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",  "Host",   "In",     "cuSOLVERMp library handle."
	"M",       "Host",   "In",     "Number of rows of the global distributed matrix A."
	"M",       "Host",   "In",     "Number of columns of the global distributed matrix A."
	"d_A",     "Device", "In",     "Number of columns of the global distributed matrix A."
	"d_A",     "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, this array contains the local pieces of the M-by-N distributed matrix sub(A)."
	"IA",      "Host",   "In",     "Row index in the global matrix A indicating the first row of sub(A). This function does not make any assuptions on the alignment of `IA`."
	"JA",      "Host",   "In",     "Column index in the global matrix A indicating the first column of sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",  "Host",   "In",     "Matrix descriptor of the global matrix A."
	"root",    "Host",   "In",     "Process ID on which the matrix A will be gathered."
	"h_dst",   "Host",   "Out",    "Destination host buffer on `root` process. On output it contains the global matrix A stored in column-major format. Total size must be at least `M*N` words."
	"h_lddst", "Host",   "In",     "Leading dimension of the `h_dst` on `root` process. Must be larger than `M`."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is function is meant as an utility function to verify correctness of the data layouts and it is not intended to achieve high performance on large inputs.

----

.. _cusolverMpMatrixScatterH2D-label:

----------------------------------
:code:`cusolverMpMatrixScatterH2D`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpMatrixScatterH2D(
            cusolverMpHandle_t handle,
            int64_t M,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            int root,
            void *h_src,
            int64_t h_ldsrc)

| Scatters the matrix stored in the local buffer `h_src` from `root` process to a distributed global matrix `A`.

| The input matrix `h_src` is stored in column-major format. On ouput, `d_A` contains the local portions of the global matrix `A` distributed in 2D block cyclic format.

| Notice that, for this function, the input data is on the host and the output is stored on device memory.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",  "Host",   "In",     "cuSOLVERMp library handle."
	"M",       "Host",   "In",     "Number of rows of the global distributed matrix A."
	"M",       "Host",   "In",     "Number of columns of the global distributed matrix A."
	"d_A",     "Device", "Out",    "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On output, this array contains the local pieces of the M-by-N distributed matrix sub(A)."
	"IA",      "Host",   "In",     "Row index in the global matrix A indicating the first row of sub(A). This function does not make any assuptions on the alignment of `IA`."
	"JA",      "Host",   "In",     "Column index in the global matrix A indicating the first column of sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",  "Host",   "In",     "Matrix descriptor of the global matrix A."
	"root",    "Host",   "In",     "Blocking factor used to distribute the columns of the global matrix A."
	"h_src",   "Host",   "In",     "Source buffer on `root` process. On input it contains the global `M` by `N` matrix A stored in column-major format."
	"h_ldsrc", "Host",   "In",     "Leading dimension of the `h_dst` on `root` process. Must be larger than `M`."


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is function is meant as an utility function to verify correctness of the data layouts and it is not intended to achieve high performance on large inputs.

----

.. _cusolverMpLoggingAPI-label:

=======
Logging
=======


.. _cusolverMpLoggerSetCallback-label:

-----------------------------------
:code:`cusolverMpLoggerSetCallback`
-----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpLoggerSetCallback(
            cusolverMpLoggerCallback_t callback)

| This function sets the logging callback function.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"callback", "Host", "In", "Pointer to a callback function. See `cusolverMpLoggerCallback_t`."

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is an experimental feature.

----

.. _cusolverMpLoggerSetFile-label:

-------------------------------
:code:`cusolverMpLoggerSetFile`
-------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpLoggerSetFile(
            FILE *file)


| This function sets the logging output file. Note: once registered using this function call, the provided file handle must not be closed unless the function is called again to switch to a different file handle.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"file", "Host", "In", "Pointer to an open file. File should have write permission" 

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is an experimental feature.

----

.. _cusolverMpLoggerOpenFile-label:

--------------------------------
:code:`cusolverMpLoggerOpenFile`
--------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpLoggerOpenFile(
            const char* logFile)


| This function opens a logging output file in the given path.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"logFile", "Host", "In", "Path of the logging output file." 


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is an experimental feature.

----

.. _cusolverMpLoggerSetLevel-label:

--------------------------------
:code:`cusolverMpLoggerSetLevel`
--------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpLoggerSetLevel(
            int level)

| Complete

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"level", "Host", "In", "Value of the logging level. See `cuSOLVERMp Logging`." 

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is an experimental feature.

----


.. _cusolverMpLoggerSetMask-label:

--------------------------------
:code:`cusolverMpLoggerSetMask`
--------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpLoggerSetMask(
            int mask)

| This function sets the value of the logging mask.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"mask", "Host", "In", "Value of the logging mask. See `cuSOLVERMp Logging`." 

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is an experimental feature.

----

.. _cusolverMpLoggerForceDisable-label:

------------------------------------
:code:`cusolverMpLoggerForceDisable`
------------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpLoggerForceDisable(
            int level)

| This function disables logging for the entier run.

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

.. warning :: This is an experimental feature.

----


=========================
Dense Linear Algebra APIs
=========================

.. _cusolverMpGetrf-label:

----------------------------------
:code:`cusolverMpGetrf`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGetrf(
            cusolverMpHandle_t handle,
            int64_t M,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            int64_t  *d_ipiv,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| This routine computes an LU factorization of a general M-by-N distributed matrix sub(A) using partial pivoting.
 The user can also disable pivoting by setting `d_ipiv=NULL`.
   
| The factorization has the form:

.. math ::
    sub(A) = P \cdot L \cdot U
    
| where :math:`P` is a permutation matrix, :math:`L` is lower triangular with unit diagonal elements (lower trapezoidal if :math:`m > n`), and :math:`U`
 is upper triangular (upper trapezoidal if :math:`m < n`). :math:`L` and :math:`U` are stored in sub(A).
   
| The user can combine :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>` and :ref:`cusolverMpGetrs() <cusolverMpGetrs-label>` to solve a system of linear equations.


.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"M",                        "Host",   "In",     "Number of rows of sub(A)."
	"N",                        "Host",   "In",     "Number of columns of sub(A)."
	"d_A",                      "Device", "In/Out", "Pointer to the first entry of the local portion of the global matrix A. On output, the sub(A) is overwritten with the L and U factors."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of `IA`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"d_ipiv",                   "Device", "Out",    "Local array of dimension `(LOCr(M_A)+MB_A)`. If the user set `d_ipiv != NULL`, on output, this array contains the pivoting information. `d_ipiv[i]` indicates the global row local row i was swapped with. This array is tied to the distributed matrix A."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Host",   "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpGetrf_bufferSize() <cusolverMpGetrf_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpGetrf_bufferSize() <cusolverMpGetrf_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function. `info > 0` indicates the index of the leading minor in the case of a singular matrix."


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----


.. _cusolverMpGetrf_bufferSize-label:

----------------------------------
:code:`cusolverMpGetrf_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGetrf_bufferSize(
            cusolverMpHandle_t handle,
            int64_t M,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            int64_t *d_ipiv,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`.

| The user can set `d_ipiv=NULL` so :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>` will compute the LU factorization of the input matrix A without pivoting.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"M",                        "Host",   "In",     "Number of rows of sub(A)."
	"N",                        "Host",   "In",     "Number of columns of sub(A)."
	"d_A",                      "Device", "In",     "Pointer to the first entry of the local portion of the global matrix A."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of `IA`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"d_ipiv",                   "Device", "In",     "Indicates a pointer to a distributed integer array. When it is not `null`, workspace for pivoting is accounted."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "On output, contains the size in bytes of the local device workspace needed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "On output, contains the size in bytes of the local host workspace needed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`."


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpGetrs-label:

----------------------------------
:code:`cusolverMpGetrs`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGetrs(
            cusolverMpHandle_t handle,
            cublasOperation_t trans,
            int64_t N,
            int64_t NRHS,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            const int64_t *d_ipiv,
            void *d_B,
            int64_t IB,
            int64_t JB,
            cudaLibMpMatrixDesc_t descrB,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *d_info)

| This routine solves a system of distributed linear equations

.. math::

   op(sub(A)) \cdot X = sub(B)

| with a general N-by-N distributed matrix sub(A) using the LU factorization computed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`.

| Where :math:`op` is defined by the argument `trans`, which allows to solve linear systems of the form:

.. csv-table::
   :header: "trans", "Form of the linear system"
   :widths: auto
	
	"CUBLAS_OP_N", ":math:`sub(A)  \cdot X = sub(B)`"
	"CUBLAS_OP_T", ":math:`sub(A)^T \cdot X = sub(B)`"
	"CUBLAS_OP_C", ":math:`sub(A)^H \cdot X = sub(B)`"

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"trans",                    "Host",   "In",     "Specifies the form of the linear system. Only `CUBLAS_OP_N` is currently supported."
	"N",                        "Host",   "In",     "Number of rows of sub(A)."
	"NRHS",                     "Host",   "In",     "Number of colums of sub(B). Currently, this routine only supports `NRHS=1`."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, this array contains the local pieces of the M-by-N distributed L and U factors of sub(A) as computed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of `IA`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"d_ipiv",                   "Device", "In",     "Local array of dimension `(LOCr(M_A)+MB_A)` containing the pivoting information as computed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`."
	"d_B",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_B,LOCc(JB+NRHS-1))`. On entry, the right hand sides sub(B). On exit, sub(B) is overwritten by the solution distributed matrix X."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of `IB`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of `JB`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpGetrs_bufferSize() <cusolverMpGetrs_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpGetrs_bufferSize() <cusolverMpGetrs_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function. `info > 0` indicates the index of the leading minor in the case of a singular matrix."


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----


.. _cusolverMpGetrs_bufferSize-label:

----------------------------------
:code:`cusolverMpGetrs_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGetrs_bufferSize(
            cusolverMpHandle_t handle,
            cublasOperation_t trans,
            int64_t N,
            int64_t NRHS,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            const int64_t *d_ipiv,
            void *d_B,
            int64_t IB,
            int64_t JB,
            cudaLibMpMatrixDesc_t descrB,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpGetrs() <cusolverMpGetrs-label>`.

| If pivoting was disabled during :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`, the user must set `d_ipiv=NULL`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"trans",                    "Host",   "In",     "Specifies the form of the linear system. Only `CUBLAS_OP_N` is currently supported."
	"N",                        "Host",   "In",     "Number of rows of sub(A)."
	"NRHS",                     "Host",   "In",     "Number of colums of sub(B). Currently, this routine only supports `NRHS=1`."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, this array contains the local pieces of the M-by-N distributed L and U factors of sub(A) as computed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of `IA`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"d_ipiv",                   "Device", "In",     "Local array of dimension `(LOCr(M_A)+MB_A)` containing the pivoting information as computed by :ref:`cusolverMpGetrf() <cusolverMpGetrf-label>`."
	"d_B",                      "Device", "In",     "Pointer to the first entry of the local portion of the global matrix B. On output, B is overwritten the solution of the linear system."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of `IB`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of `JB`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "On output, contains the size in bytes of the local device workspace needed by :ref:`cusolverMpGetrs() <cusolverMpGetrs-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "On output, contains the size in bytes of the local host workspace needed by :ref:`cusolverMpGetrs() <cusolverMpGetrs-label>`."


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----


.. _cusolverMpPotrf-label:

----------------------------------
:code:`cusolverMpPotrf`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpPotrf(
            cusolverMpHandle_t handle,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Computes the Cholesky factorization of an N-by-N real symmetric or a complex hermitian positive definite distributed matrix sub(A) denoting `A(IA:IA+N-1, JA:JA+N-1)`.

| If A is upper triangular and `uplo=CUBLAS_FILL_MODE_UPPER`, the factorization has the form

.. math ::
    sub(A) = U^H \cdot U

| where U is upper triangular.

| If the matrix is lower triangular and `uplo` is set to `CUBLAS_FILL_MODE_LOWER`, the factorization has the form
    
.. math ::
          sub(A) = L \cdot L^H

| where L is lower triangular.

| The user can combine :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>` and :ref:`cusolverMpPotrs() <cusolverMpPotrs-label>` to solve a system of linear equations.


.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"uplo",                     "Host",   "In",     "Specifies if A is upper (`CUBLAS_FILL_MODE_UPPER`) or lower triangular matrix (`CUBLAS_FILL_MODE_LOWER`)."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, this array contains the local pieces of the N-by-N distributed matrix sub(A). On output, this array contains the L or U factors of A, depending on the value of `uplo`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpPotrf_bufferSize() <cusolverMpPotrf_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpPotrf_bufferSize() <cusolverMpPotrf_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function. `info > 0` indicates the index of the leading minor in the case of a singular matrix."

| This function requires square block size `(MB_A == NB_A)`.


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpPotrf_bufferSize-label:

----------------------------------
:code:`cusolverMpPotrf_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpPotrf_bufferSize(
            cusolverMpHandle_t handle,
            cublasFillMode_t uplo,
            int64_t N,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            cudaDataType_t computeType,
            size_t* workspaceInBytesOnDevice,
            size_t* workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"uplo",                     "Host",   "In",     "Specifies if A is upper (`CUBLAS_FILL_MODE_UPPER`) or lower triangular matrix (`CUBLAS_FILL_MODE_LOWER`)."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, this array contains the local pieces of the N-by-N distributed matrix sub(A). On output, this array contains the L or U factors of A, depending on the value of `uplo`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of `IA`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "On output, contains the size in bytes of the local device workspace needed by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "On output, contains the size in bytes of the local host workspace needed by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>`."

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpPotrs-label:

----------------------------------
:code:`cusolverMpPotrs`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpPotrs(
            cusolverMpHandle_t handle,
            cublasFillMode_t uplo,
            int64_t N,
            int64_t NRHS,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            void *d_B,
            int64_t IB,
            int64_t JB,
            cudaLibMpMatrixDesc_t descB,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Solves a system of linear equations

.. math ::
    sub(A) \cdot X = sub(B)

| where sub(A) denotes `A(IA:IA+N-1,JA:JA+N-1)` and is a N-by-N symmetric or hermitian positive definite distributed matrix using the Cholesky factorization:
 
 .. math ::
    sub(A) = U^H \cdot U
 
| or

.. math ::
    sub(B) = L \cdot L^H

| computed by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>` and sub(B) denotes the distributed matrix `B(IB:IB+N-1,JB:JB+NRHS-1)`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"uplo",                     "Host",   "In",     "Specifies if A is upper (`CUBLAS_FILL_MODE_UPPER`) or lower triangular matrix (`CUBLAS_FILL_MODE_LOWER`)."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A)."
	"NRHS",                     "Host",   "In",     "Number of colums of sub(B). Currently, this routine only supports `NRHS=1`."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. Contains the local pieces of the N-by-N distributed L or U factors of sub(A) as computed by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `NB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `MB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"d_B",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_B,LOCc(JB+NRHS-1))`. On entry, the right hand sides sub(B). On exit, sub(B) is overwritten by the solution distributed matrix X."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of `IB`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of `JB`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpPotrs_bufferSize() <cusolverMpPotrs_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpPotrs_bufferSize() <cusolverMpPotrs_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function. `info > 0` indicates the index of the leading minor in the case of a singular matrix."


| This function requires square block size `(MB_A == NB_A)` and alignment of sub(A) and sub(B) matrices, meaning `(MB_A == MB_B)` and `(IA == IB)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

---

.. _cusolverMpPotrs_bufferSize-label:

----------------------------------
:code:`cusolverMpPotrs_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpPotrs_bufferSize(
            cusolverMpHandle_t handle,
            cublasFillMode_t uplo,
            int64_t n,
            int64_t nrhs,
            const void *a,
            int64_t ia,
            int64_t ja,
            cudaLibMpMatrixDesc_t descrA,
            const void *b,
            int64_t ib,
            int64_t jb,
            cudaLibMpMatrixDesc_t descB,
            cudaDataType_t computeType,
            size_t* workspaceInBytesOnDevice,
            size_t* workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpPotrs() <cusolverMpPotrs-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"uplo",                     "Host",   "In",     "Specifies if A is upper (`CUBLAS_FILL_MODE_UPPER`) or lower triangular matrix (`CUBLAS_FILL_MODE_LOWER`)."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A)."
	"NRHS",                     "Host",   "In",     "Number of colums of sub(B). Currently, this routine only supports `NRHS=1`."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. Contains the local pieces of the N-by-N distributed L or U factors of sub(A) as computed by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `NB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `MB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"d_B",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_B,LOCc(JB+NRHS-1))`. On entry, the right hand sides sub(B). On exit, sub(B) is overwritten by the solution distributed matrix X."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of `IB`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of `JB`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "On output, contains the size in bytes of the local device workspace needed by :ref:`cusolverMpPotrs() <cusolverMpPotrs-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "On output, contains the size in bytes of the local host workspace needed by :ref:`cusolverMpPotrs() <cusolverMpPotrs-label>`."

| This function requires square block size `(MB_A == NB_A)` and alignment of sub(A) and sub(B) matrices, meaning `(MB_A == MB_B)` and `(IA == IB)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

---


.. _cusolverMpGeqrf-label:

----------------------------------
:code:`cusolverMpGeqrf`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGeqrf(
            cusolverMpHandle_t handle,
            int64_t M,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            void *d_tau,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Computes the QR factorization of a distributed M-by-N matrix sub(A) denoting `A(IA:IA+M-1, JA:JA+N-1)`.

.. math ::
    sub(A) = Q \cdot R

| where Q is an orthogonal matrix represented by a product of Householder reflectors with the array of `tau` and R is upper triangular matrix.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"M",                        "Host",   "In",     "Number of rows of sub(A)."
	"N",                        "Host",   "In",     "Number of columns of sub(A)."
	"d_A",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, this array contains the local pieces of the M-by-N distributed matrix sub(A). On output, this array contains the R factors of A and Householder reflectors below of diagonals with `tau` vector."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_tau",                    "Device", "Out",    "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. This array contains scalar factors of the Householder reflectors"
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpGeqrf_bufferSize() <cusolverMpGeqrf_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpGeqrf_bufferSize() <cusolverMpGeqrf_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires square block size `(MB_A == NB_A)`.


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpGeqrf_bufferSize-label:

----------------------------------
:code:`cusolverMpGeqrf_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGeqrf_bufferSize(
            cusolverMpHandle_t handle,
            int64_t M,
            int64_t N,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            cudaLibMpMatrixDesc_t descrA,
            cudaDataType_t computeType,
            size_t* workspaceInBytesOnDevice,
            size_t* workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpGeqrf() <cusolverMpGeqrf-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"M",                        "Host",   "In",     "Number of rows of sub(A)."
	"N",                        "Host",   "In",     "Number of columns of sub(A)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`."
	"IA",                       "Host",   "In",     "Row index in the global matrix A indicating the first row of sub(A). This function does not make any assuptions on the alignment of `IA`."
	"JA",                       "Host",   "In",     "Column index in the global matrix A indicating the first column of sub(A). This function does not make any assumptions on the alignment of `JA`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A"
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "On output, contains the size in bytes of the local device workspace needed by :ref:`cusolverMpGeqrf() <cusolverMpGeqrf-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "On output, contains the size in bytes of the local host workspace needed by :ref:`cusolverMpGeqrf() <cusolverMpGeqrf-label>`."

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

---


.. _cusolverMpOrmqr-label:

----------------------------------
:code:`cusolverMpOrmqr`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpOrmqr(
            cusolverMpHandle_t handle,
            cublasSideMode_t side,
            cublasOperation_t trans,
            int64_t M,
            int64_t N,
            int64_t K,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            const void *d_tau,
            void *d_C,
            int64_t IC,
            int64_t JC,
            const cudaLibMpMatrixDesc_t descC,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Multiply distributed M-by-N matrix sub(C) denoting `C(IC:IC+M-1, JC:JC+N-1)` by the orthogonal matrix Q can be given from :ref:`cusolverMpGeqrf() <cusolverMpGeqrf-label>`.
| The function can perform the following matrix product and overwrite the result on sub(C).  

.. math ::
    sub(C) = op(Q) \cdot sub(C) \\
    sub(C) = sub(C) \cdot op(Q)
    
| for the `side` of `CUBLAS_SIDE_LEFT` and `CUBLAS_SIDE_RIGHT` respectively. Note that the current implementation only support for `CUBLAS_SIDE_LEFT`.

| Q is a orthogonal matrix formed as the product of Householder reflectors returned from :ref:`cusolverMpGeqrf() <cusolverMpGeqrf-label>`. 

.. math ::
    Q = H(1) H(2) ... H(K)

| The number of the Householder reflectors is constrained by `K <= M` and `K <= N` for `CUBLAS_SIDE_LEFT` and `CUBLAS_SIDE_RIGHT` respectively.

| The `op` can be translated to :math:`Q`, :math:`Q^T`, :math:`Q^H` based on the `trans` argument `CUBLAS_OP_N`, `CUBLAS_OP_T` and `CUBLAS_OP_H`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"side",                     "Host",   "In",     "Indicate that Q is applied from left or right side."
	"trans",                    "Host",   "In",     "Indicate that Q is applied with no-transpose or (conj)transpose."
	"M",                        "Host",   "In",     "Number of rows of sub(C)."
	"N",                        "Host",   "In",     "Number of columns of sub(C)."
	"K",                        "Host",   "In",     "Number of Householder reflectors defining Q."	
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains Householder reflectors below of diagonals with `tau` vector"
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_tau",                    "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. This array contains scalar factors of the Householder reflectors"
	"d_C",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_C, LOCc(JC+N-1))`. On entry, the array contains the local pieces of the M-by-N distributed matrix sub(C). On exit, the sub(C) is overwritten by op(Q)*sub(C) or sub(C)*op(Q)."
	"IC",                       "Host",   "In",     "Row index of the first row of the sub(C). `IC` must be a multiple of the row blocking dimension `MB_C`."
	"JC",                       "Host",   "In",     "Column index of the first column of the sub(C). `JC` must be a multiple of the column blocking dimension `NB_C`."
	"descrC",                   "Host",   "In",     "Matrix descriptor associated to the global matrix C."
	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpOrmqr_bufferSize() <cusolverMpOrmqr_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpOrmqr_bufferSize() <cusolverMpOrmqr_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires square block size `(MB_A == NB_A)` and alignment of sub(A) and sub(C) matrices, meaning `(MB_A == MB_C)` and `(IA == IC)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpOrmqr_bufferSize-label:

----------------------------------
:code:`cusolverMpOrmqr_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpOrmqr_bufferSize(
            cusolverMpHandle_t handle,
            cublasSideMode_t side,
            cublasOperation_t trans,
            int64_t M,
            int64_t N,
            int64_t K,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            const void *d_tau,
            void *d_C,
            int64_t IC,
            int64_t JC,
            const cudaLibMpMatrixDesc_t descrC,
            cudaDataType_t computeType,
            size_t* workspaceInBytesOnDevice,
            size_t* workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpOrmqr() <cusolverMpOrmqr-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"side",                     "Host",   "In",     "Indicate that Q is applied from left or right side."
	"trans",                    "Host",   "In",     "Indicate that Q is applied with no-transpose or (conj)transpose."
	"M",                        "Host",   "In",     "Number of rows of sub(C)."
	"N",                        "Host",   "In",     "Number of columns of sub(C)."
	"K",                        "Host",   "In",     "Number of Householder reflectors defining Q."	
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains Householder reflectors below of diagonals with `tau` vector"
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_tau",                    "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. This array contains scalar factors of the Householder reflectors"
	"d_C",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_C, LOCc(JC+N-1))`. On entry, the array contains the local pieces of the M-by-N distributed matrix sub(C). On exit, the sub(C) is overwritten by op(Q)*sub(C) or sub(C)*op(Q)."
	"IC",                       "Host",   "In",     "Row index of the first row of the sub(C). `IC` must be a multiple of the row blocking dimension `MB_C`."
	"JC",                       "Host",   "In",     "Column index of the first column of the sub(C). `JC` must be a multiple of the column blocking dimension `NB_C`."
	"descrC",                   "Host",   "In",     "Matrix descriptor associated to the global matrix C."
	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",     "The size in bytes of the local device workspace needed by :ref:`cusolverMpOrmqr() <cusolverMpOrmqr-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",     "The size in bytes of the local host workspace needed by :ref:`cusolverMpOrmqr() <cusolverMpOrmqr-label>`"

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpGels-label:

----------------------------------
:code:`cusolverMpGels`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGels(
            cusolverMpHandle_t handle,
            cublasOperation_t trans,            
            int64_t M,
            int64_t N,
            int64_t NRHS,
            void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            void *d_B,
            int64_t IB,
            int64_t JB,
            const cudaLibMpMatrixDesc_t descrB,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Solves overdetermined or underdetermined linear systems involving a distributed M-by-N matrix sub(A) denoting `A(IA:IA+M-1, JA:JA+N-1)` or its transpose, using QR or LQ factorization of sub(A).
| Note that the solution of overdetermined systems (`M >= N`) with a no-transpose option is only supported via QR factorization :ref:`cusolverMpGeqrf() <cusolverMpGeqrf-label>`. 

.. math ::
    X \leftarrow \mbox{argmin} | sub(B) - sub(A) \cdot X |

| where sub(B) is a distributed M-by-NRHS multi-vector denoting `B(IB:IB+M-1, JB:JB+NRHS-1)` and the solution multi-vector `X` is overwritten on the sub(B).

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"trans",                    "Host",   "In",     "Indicate that the linear system of sub(A) involves with no-transpose or (conj)transpose."
	"M",                        "Host",   "In",     "Number of rows of sub(A)."
	"N",                        "Host",   "In",     "Number of columns of sub(A)."
	"NRHS",                     "Host",   "In",     "Number of right hand side vectors i.e., number of columns of sub(B) and X."	
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_B",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_B, LOCc(JB+NRHS-1))`. On entry, the array contains the local pieces of the M-by-NRHS distributed matrix sub(B). On exit, the sub(B) is overwritten by the solution of the solution vectors."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). `IB` must be a multiple of the row blocking dimension `MB_B`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). `JB` must be a multiple of the column blocking dimension `NB_B`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpGels_bufferSize() <cusolverMpGels_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpGels_bufferSize() <cusolverMpGels_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires square block size `(MB_A == NB_A)` and alignment of sub(A) and sub(B) matrices, meaning `(MB_A == MB_B)` and `(IA == IB)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpGels_bufferSize-label:

----------------------------------
:code:`cusolverMpGels_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpGels_bufferSize(
            cusolverMpHandle_t handle,
            cublasOperation_t trans,            
            int64_t M,
            int64_t N,
            int64_t NRHS,
            void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            void *d_B,
            int64_t IB,
            int64_t JB,
            const cudaLibMpMatrixDesc_t descrB,
            cudaDataType_t computeType,
            size_t* workspaceInBytesOnDevice,
            size_t* workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpGels() <cusolverMpGels-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"trans",                    "Host",   "In",     "Indicate that the linear system of sub(A) involves with no-transpose or (conj)transpose."
	"M",                        "Host",   "In",     "Number of rows of sub(A)."
	"N",                        "Host",   "In",     "Number of columns of sub(A)."
	"NRHS",                     "Host",   "In",     "Number of right hand side vectors i.e., number of columns of sub(B) and X."	
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_B",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_B, LOCc(JB+NRHS-1))`."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). `IB` must be a multiple of the row blocking dimension `MB_B`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). `JB` must be a multiple of the column blocking dimension `NB_B`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by :ref:`cusolverMpGels() <cusolverMpGels-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpGels() <cusolverMpGels-label>`"


| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpSytrd-label:

----------------------------------
:code:`cusolverMpSytrd`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSytrd(
            cusolverMpHandle_t handle,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            void *d_d,
            void *d_e,
            void *d_tau,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)


| Reduces a symmetric (or hermitian for a complex value type) distributed N-by-N matrix sub(A) denoting `A(IA:IA+N-1, JA:JA+N-1)` to a tridiagonal form.

.. math ::
    A \rightarrow Q \cdot T \cdot Q^H

| Currently, the function is implemented for `CUBLAS_FILL_MODE_LOWER` only.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

  "handle",                   "Host",   "In",     "cuSOLVERMp library handle."  
  "uplo",                     "Host",   "In",     "Indicate that the function uses either upper or lower triangular part of sub(A)."
  "N",                        "Host",   "In",     "Number of rows/columns of square matrix sub(A)."
  "d_A",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. On entry, the array contains the local part of symmetric distributed matrix sub(A). On exit, if the `CUBLAS_FILL_MODE_UPPER` is set, the diagonal and first superdiagonal of the tridiagonal of sub(A) is overwritten by the corresponding tridiagonal matrix, and Householder reflectors are stored above the superdiagonal of sub(A). If `CUBLAS_FILL_MODE_LOWER` is set, the diagonal and first subdiagonal of sub(A) is overwritten by the corresponding tridiagonal matrix, and Householder reflectors are stored below the subdiagonal of sub(A)."
  "IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
  "JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
  "descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
  "d_d",                      "Device", "Out",    "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. The diagonal elements of tridiagonal matrix is stored: `d(i) = A(i,i)`."
  "d_e",                      "Device", "Out",    "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. The off-diagonal elements of tridiagonal matrix is stored: `e(i) = A(i,i+1)` for `CUBLAS_FILL_MODE_UPPER` and `e(i) = A(i+1,i)` for `CUBLAS_FILL_MODE_LOWER`."
  "d_tau",                    "Device", "Out",    "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. This array contains scalar factors of the Householder reflectors"
  "computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
  "d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
  "workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpSytrd_bufferSize() <cusolverMpSytrd_bufferSize-label>`."
  "h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
  "workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpSytrd_bufferSize() <cusolverMpSytrd_bufferSize-label>`."
  "info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires a square block size for A, `(MB_A == NB_A)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpSytrd_bufferSize-label:

----------------------------------
:code:`cusolverMpSytrd_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSytrd_bufferSize(
            cusolverMpHandle_t handle,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            void *d_d,
            void *d_e,
            void *d_tau,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpSytrd() <cusolverMpSytrd-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"uplo",                     "Host",   "In",     "Indicate that the function uses either upper or lower triangular part of sub(A)."
	"N",                        "Host",   "In",     "Number of rows/columns of square matrix sub(A)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_d",                      "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`."
	"d_e",                      "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`."
	"d_tau",                    "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by :ref:`cusolverMpSytrd() <cusolverMpSytrd-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpSytrd() <cusolverMpSytrd-label>`"

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpStedc-label:

----------------------------------
:code:`cusolverMpStedc`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpStedc(
            cusolverMpHandle_t handle,
            char *compz,
            int64_t N,
            void *d_d,
            void *d_e,
            void *d_Q,
            int64_t IQ,
            int64_t JQ,
            const cudaLibMpMatrixDesc_t descrQ,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Computes all eigenvalues and eigenvectors of a symmetric tridiagonal matrix using the divice and conquer algorithm. 

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"compz",                    "Host",   "In",     "Option to compute eigenvalues only(`compz=N`) or both eigenvalues/vectors(`compz=I`). Currently, `I` is only implemented."
	"N",                        "Host",   "In",     "Number of rows/columns of square matrix sub(A)."
	"d_d",                      "Device", "In/Out", "Pointer to an array of dimension `N`. On entry, the array contains diagonal elements of the tridiagonal matrix. On exit, the eigenvalues are stored in descending order."
	"d_e",                      "Device", "In/Out", "Pointer to an array of dimension `N-1`. On entry, the array contains subdiagonal elements of the tridiagonal matrix. On exit, the content of the array is destroyed."
	"d_Q",                      "Device", "Out",    "Pointer into the local memory to an array of dimension `(LLD_Q, LOCc(JQ+N-1))`. On output, the array contains the local elements of orthonormal eigenvectors of the symmetric diagonal matrix."
	"IQ",                       "Host",   "In",     "Row index of the first row of the sub(Q). `IQ` must be a multiple of the row blocking dimension `MB_Q`."
	"JQ",                       "Host",   "In",     "Column index of the first column of the sub(A). `JQ` must be a multiple of the column blocking dimension `NB_Q`."
	"descrQ",                   "Host",   "In",     "Matrix descriptor associated to the global matrix Q."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpStedc_bufferSize() <cusolverMpStedc_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpStedc_bufferSize() <cusolverMpStedc_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires a square block size for Q, `(MB_Q == NB_Q)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of Tridiagonal Matrix", "computeType", "Output Data Type"
   :widths: auto

	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpStedc_bufferSize-label:

----------------------------------
:code:`cusolverMpStedc_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpStedc_bufferSize(
            cusolverMpHandle_t handle,
            char *compz,
            int64_t N,
            void *d_d,
            void *d_e,
            void *d_Q,
            int64_t IQ,
            int64_t JQ,
            const cudaLibMpMatrixDesc_t descrQ,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost,
            int *iwork)


| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpStedc() <cusolverMpStedc-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto
            
	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"compz",                    "Host",   "In",     "Option to compute eigenvalues only(`compz=N`) or both eigenvalues/vectors(`compz=I`). Currently, `I` is only implemented."
	"N",                        "Host",   "In",     "Number of rows/columns of square matrix sub(A)."
	"d_d",                      "Device", "In",     "Pointer to an array of dimension `N`."
	"d_e",                      "Device", "In",     "Pointer to an array of dimension `N-1`."
	"d_Q",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_Q, LOCc(JQ+N-1))`."
	"IQ",                       "Host",   "In",     "Row index of the first row of the sub(Q). `IQ` must be a multiple of the row blocking dimension `MB_Q`."
	"JQ",                       "Host",   "In",     "Column index of the first column of the sub(A). `JQ` must be a multiple of the column blocking dimension `NB_Q`."
	"descrQ",                   "Host",   "In",     "Matrix descriptor associated to the global matrix Q."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by the routine :ref:`cusolverMpStedc() <cusolverMpStedc-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpStedc() <cusolverMpStedc-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This routine currently supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of Tridiagonal Matrix", "computeType", "Output Data Type"
   :widths: auto
	

	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpOrmtr-label:

----------------------------------
:code:`cusolverMpOrmtr`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpOrmtr(
            cusolverMpHandle_t handle,
            cublasSideMode_t side,
            cublasFillMode_t uplo,
            cublasOperation_t trans,
            int64_t M,
            int64_t N,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            const void *d_tau,
            void *d_C,
            int64_t IC,
            int64_t JC,
            const cudaLibMpMatrixDesc_t descrC,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            voidB *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Multiply distributed M-by-N matrix sub(C) denoting `C(IC:IC+M-1, JC:JC+N-1)` by the orthogonal matrix Q can be given from :ref:`cusolverMpSytrd() <cusolverMpSytrd-label>`.
| The function can perform the following matrix product and overwrite the result on sub(C) for the side parameter `CUBLAS_SIDE_LEFT` and `CUBLAS_SIDE_RIGHT`.

.. math ::
    sub(C) = op(Q) \cdot sub(C) \\
    sub(C) = sub(C) \cdot op(Q) 

| The `op` can be translated to :math:`Q`, :math:`Q^T`, :math:`Q^H` based on the `trans` argument `CUBLAS_OP_N`, `CUBLAS_OP_T` and `CUBLAS_OP_H`.    
| Q is an orthogonal matrix formed as the product of Householder reflectors as follows for the uplo parameter `CUBLAS_FILL_MODE_UPPER` and `CUBLAS_FILL_MODE_LOWER`

.. math ::
    Q = H(1) H(2) ... H(nq-1) \\
    Q = H(nq-1) H(nq-2) ... H(1) 

| where `nq` is either `m` or `n` according to the side parameter of `CUBLAS_SIDE_LEFT` or `CUBLAS_SIDE_RIGHT` respectively.


.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"side",                     "Host",   "In",     "Indicate that Q is applied from left or right side."
	"uplo",                     "Host",   "In",     "Indicate that upper or lower triangular of sub(A) contains Householder reflectors."
	"trans",                    "Host",   "In",     "Indicate that Q is applied with no-transpose or (conj)transpose."
	"M",                        "Host",   "In",     "Number of rows of sub(C)."
	"N",                        "Host",   "In",     "Number of columns of sub(C)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains Householder reflectors below of diagonals with `tau` vector"
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_tau",                    "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. This array contains scalar factors of the Householder reflectors"
	"d_C",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_C, LOCc(JC+N-1))`. On entry, the array contains the local pieces of the M-by-N distributed matrix sub(C). On exit, the sub(C) is overwritten by op(Q)*sub(C) or sub(C)*op(Q)."
	"IC",                       "Host",   "In",     "Row index of the first row of the sub(C). `IC` must be a multiple of the row blocking dimension `MB_C`."
	"JC",                       "Host",   "In",     "Column index of the first column of the sub(C). `JC` must be a multiple of the column blocking dimension `NB_C`."
	"descrC",                   "Host",   "In",     "Matrix descriptor associated to the global matrix C."
	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpOrmtr_bufferSize() <cusolverMpOrmtr_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpOrmtr_bufferSize() <cusolverMpOrmtr_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires square block size `(MB_A == NB_A)` and alignment of sub(A) and sub(B) matrices, meaning `(MB_A == MB_C)` and `(IA == IC)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A and C", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpOrmtr_bufferSize-label:

----------------------------------
:code:`cusolverMpOrmtr_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpOrmtr_bufferSize(
            cusolverMpHandle_t handle,
            cublasSideMode_t side,
            cublasFillMode_t uplo,            
            cublasOperation_t trans,
            int64_t M,
            int64_t N,
            const void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            const void *d_tau,
            void *d_C,
            int64_t IC,
            int64_t JC,
            const cudaLibMpMatrixDesc_t descrC,
            cudaDataType_t computeType,
            size_t* workspaceInBytesOnDevice,
            size_t* workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpOrmtr() <cusolverMpOrmtr-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"side",                     "Host",   "In",     "Indicate that Q is applied from left or right side."
	"uplo",                     "Host",   "In",     "Indicate that upper or lower triangular of sub(A) contains Householder reflectors."  
	"trans",                    "Host",   "In",     "Indicate that Q is applied with no-transpose or (conj)transpose."
	"M",                        "Host",   "In",     "Number of rows of sub(C)."
	"N",                        "Host",   "In",     "Number of columns of sub(C)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains Householder reflectors below of diagonals with `tau` vector"
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_tau",                    "Device", "In",     "Pointer into the local memory to an array of dimension `LOCc(JA+N-1)`. This array contains scalar factors of the Householder reflectors"
	"d_C",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_C, LOCc(JC+N-1))`. On entry, the array contains the local pieces of the M-by-N distributed matrix sub(C). On exit, the sub(C) is overwritten by `op(Q)*sub(C)` or `sub(C)*op(Q)` based on the side input `CUBLAS_SIDE_LEFT` or `CUBLAS_SIDE_RIGHT` respectively."
	"IC",                       "Host",   "In",     "Row index of the first row of the sub(C). `IC` must be a multiple of the row blocking dimension `MB_C`."
	"JC",                       "Host",   "In",     "Column index of the first column of the sub(C).`JC` must be a multiple of the column blocking dimension `NB_C`."
	"descrC",                   "Host",   "In",     "Matrix descriptor associated to the global matrix C."
	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",     "The size in bytes of the local device workspace needed by :ref:`cusolverMpOrmtr() <cusolverMpOrmtr-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",     "The size in bytes of the local host workspace needed by :ref:`cusolverMpOrmtr() <cusolverMpOrmtr-label>`"

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----


.. _cusolverMpSyevd-label:

----------------------------------
:code:`cusolverMpSyevd`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSyevd(
            cusolverMpHandle_t handle,
            char *jobz,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            void *d_d,
            void *d_Z,
            int64_t IZ,
            int64_t JZ,
            const cudaLibMpMatrixDesc_t descrZ,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Computes all eigenvalues and optionally eigenvectors of a symmetric distributed N-by-N matrix sub(A) `A(IA:IA+N-1, JA:JA+N-1)` using the divide and conquer algorithm :ref:`cusolerMpStec() <cusolverMpStedc-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"jobz",                     "Host",   "In",     "If `jobz = N`, then eigenvalues are computed and if `jobz = V`, then eigenvalues and eigenvectors are computed."
	"uplo",                     "Host",   "In",     "Indicate that upper or lower triangular of sub(A) is used to compute eigen solutions."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains local parts of the symmetric matrix A."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_d",                      "Device", "Out",    "Pointer into the memory to an array of global size `N`. On exit, this array contains real eigen values of the matrix A."
	"d_Z",                      "Device", "Out",    "Pointer into the local memory to an array of dimension `(LLD_Z, LOCc(JZ+N-1))`. On exit, the array local parts of orthonormal eigenvectors of the matrix A."
	"IZ",                       "Host",   "In",     "Row index of the first row of the sub(Z). `IZ` must be a multiple of the row blocking dimension `MB_Z`."
	"JZ",                       "Host",   "In",     "Column index of the first column of the sub(Z). `JZ` must be a multiple of the column blocking dimension `NB_Z`."
	"descrZ",                   "Host",   "In",     "Matrix descriptor associated to the global matrix Z."	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by the routine as provided by :ref:`cusolverMpSyevd_bufferSize() <cusolverMpSyevd_bufferSize-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by the routine as provided by :ref:`cusolverMpSyevd_bufferSize() <cusolverMpSyevd_bufferSize-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| This function requires square block size `(MB_A == NB_A)` and alignment of sub(A) and sub(B) matrices, meaning `(MB_A == MB_Z)` and `(IZ == IZ)`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A and C", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----

.. _cusolverMpSyevd_bufferSize-label:

----------------------------------
:code:`cusolverMpSyevd_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSyevd_bufferSize(
            cusolverMpHandle_t handle,
            char *jobz,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            const cudaLibMpMatrixDesc_t descrA,
            void *d_d,
            void *d_Z,
            int64_t IZ,
            int64_t JZ,
            const cudaLibMpMatrixDesc_t descrZ,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpSyevd() <cusolverMpSyevd-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
	"jobz",                     "Host",   "In",     "If `jobz = N`, then eigenvalues are computed and if `jobz = V`, then eigenvalues and eigenvectors are computed."
	"uplo",                     "Host",   "In",     "Indicate that upper or lower triangular of sub(A) is used to compute eigen solutions."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A)."
	"d_A",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains local parts of the symmetric matrix A."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A).`JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_d",                      "Device", "In",     "Pointer into the memory to an array of global size `N`. On exit, this array contains real eigen values of the matrix A."
	"d_Z",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_Z, LOCc(JZ+N-1))`. On exit, the array local parts of orthonormal eigenvectors of the matrix A."
	"IZ",                       "Host",   "In",     "Row index of the first row of the sub(Z). `IZ` must be a multiple of the row blocking dimension `MB_Z`."
	"JZ",                       "Host",   "In",     "Column index of the first column of the sub(Z). `JZ` must be a multiple of the column blocking dimension `NB_Z`."
	"descrZ",                   "Host",   "In",     "Matrix descriptor associated to the global matrix Z."	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by :ref:`cusolverMpSyevd() <cusolverMpSyevd-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpSyevd() <cusolverMpSyevd-label>`"

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpSygst-label:

-----------------------
:code:`cusolverMpSygst`
-----------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSygst_bufferSize(
            cusolverMpHandle_t handle,
            cusolverEigType_t ibtype,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cusolverMpMatrixDescriptor_t descrA,
            const void *d_B,
            int64_t IB,
            int64_t JB,
            cusolverMpMatrixDescriptor_t descrB,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Reduces a hermitian-definite generalized eigenproblem to standard form. Denoting sub(A) and sub(B) as A(IA:IA+N-1, JA:JA+N-1) and B(IB:IB+N-1, JB:JB+N-1) respectively, the routine considers the following cases.

  * `ibtype = CUSOLVER_EIG_TYPE_1`: the problem is sub(A)x = lambda*sub(B), and sub(A) is overwritten by inv(L)*sub(A)*inv(L^H) or inv(U^H)*sub(A)*inv(U).
  * `ibtype = CUSOLVER_EIG_TYPE_2 or 3`: the problem is sub(A)*sub(B)*x = lambda*x or sub(B)*sub(A)*x = lambda*x, and sub(A) is overwritten by L^H*sub(A)*L or U*sub(A)*U^H.

| The sub(B) includes lower or upper Cholesky factors previously computed by :ref:`cusolverMpPotrf() <cusolverMpPotrf-label>`.
    
.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
  "ibtype",                   "Host",   "In",     "Indicate the eigen problem type sub(A)*x=(lambda)*sub(B)*x, sub(A)*sub(B)x=(lambda)*x, or sub(B)*sub(A)*x=(lambda)*x."
	"uplo",                     "Host",   "In",     "Indicate that lower `CUBLAS_FILL_MODE_LOWER` or upper `CUBLAS_FILL_MODE_UPPER` triangular of sub(A) and sub(B) are used to compute eigen solutions."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A) and sub(B)."
	"d_A",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains local parts of the symmetric matrix A."  
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_B",                      "Device", "In",     "Pointer into the local memory to an array of dimension `(LLD_B, LOCc(JB+N-1))`. This array contains local parts of the symmetric matrix B."    
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). `IB` must be a multiple of the row blocking dimension `MB_B`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). `JB` must be a multiple of the column blocking dimension `NB_B`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."  
	"workspaceInBytesOnDevice", "Host",   "In",     "The size in bytes of the local device workspace needed by :ref:`cusolverMpSygst() <cusolverMpSygst-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."  
	"workspaceInBytesOnHost",   "Host",   "In",     "The size in bytes of the local host workspace needed by :ref:`cusolverMpSygst() <cusolverMpSygst-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."


| The routine requires some alignment properties

  * Same square blocksize is used `(MB == NB)` for the matrix A and B.
  * The beginning row and column of A and B are aligned each other i.e., `(IA == IB)` and `(JA == JB`.   

| Note that the current implementation supports the inputs of `ibtype = CUSOLVER_EIG_TYPE_1`, `uplo = CUBLAS_FILL_MODE_LOWER`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpSygst_bufferSize-label:

----------------------------------
:code:`cusolverMpSygst_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSygst_bufferSize(
            cusolverMpHandle_t handle,
            cusolverEigType_t ibtype,
            cublasFillMode_t uplo,
            int64_t N,
            int64_t IA,
            int64_t JA,
            cusolverMpMatrixDescriptor_t descrA,
            int64_t IB,
            int64_t JB,
            cusolverMpMatrixDescriptor_t descrB,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpSygst() <cusolverMpSygst-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
  "ibtype",                   "Host",   "In",     "Indicate the eigen problem type sub(A)*x=(lambda)*sub(B)*x, sub(A)*sub(B)x=(lambda)*x, or sub(B)*sub(A)*x=(lambda)*x."
	"uplo",                     "Host",   "In",     "Indicate that lower `CUBLAS_FILL_MODE_LOWER` or upper `CUBLAS_FILL_MODE_UPPER` triangular of sub(A) and sub(B) are used to compute eigen solutions."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A) and sub(B)."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). `IB` must be a multiple of the row blocking dimension `MB_B`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). `JB` must be a multiple of the column blocking dimension `NB_B`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by :ref:`cusolverMpSygst() <cusolverMpSygst-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpSygst() <cusolverMpSygst-label>`"

| The routine requires some alignment properties

  * Same square blocksize is used `(MB == NB)` for the matrix A and B.
  * The beginning row and column of A and B are aligned each other i.e., `(IA == IB)` and `(JA == JB`.   

| Note that the current implementation supports the inputs of `ibtype = CUSOLVER_EIG_TYPE_1`, `uplo = CUBLAS_FILL_MODE_LOWER`.
  
| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"


| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.


----


.. _cusolverMpSygvd-label:

-----------------------
:code:`cusolverMpSygvd`
-----------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSygvd(
            cusolverMpHandle_t handle,
            cusolverEigType_t ibtype,
            cusolverEigMode_t jobz,
            cublasFillMode_t uplo,
            int64_t N,
            void *d_A,
            int64_t IA,
            int64_t JA,
            cusolverMpMatrixDescriptor_t descrA,
            void *d_B,
            int64_t IB,
            int64_t JB,
            cusolverMpMatrixDescriptor_t descrB,
            void *d_d,
            void *d_Z,
            int64_t IZ,
            int64_t JZ,
            cusolverMpMatrixDescriptor_t descrZ,
            cudaDataType_t computeType,
            void *d_work,
            size_t workspaceInBytesOnDevice,
            void *h_work,
            size_t workspaceInBytesOnHost,
            int *info)

| Computes a hermitian-definite generalized eigenproblem using :ref:`cusolerMpSyevd() <cusolverMpSyevd-label>`. Denoting sub(A) and sub(B) as A(IA:IA+N-1, JA:JA+N-1) and B(IB:IB+N-1, JB:JB+N-1) respectively, the routine considers the following cases.

  * `ibtype = CUSOLVER_EIG_TYPE_1`: the problem is sub(A)x = lambda*sub(B).
  * `ibtype = CUSOLVER_EIG_TYPE_2`: the problem is sub(A)*sub(B)*x = lambda*x.
  * `ibtype = CUSOLVER_EIG_TYPE_3`: the problem is sub(B)*sub(A)*x = lambda*x.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
  "ibtype",                   "Host",   "In",     "Indicate the eigen problem type sub(A)*x=(lambda)*sub(B)*x, sub(A)*sub(B)x=(lambda)*x, or sub(B)*sub(A)*x=(lambda)*x."
	"jobz",                     "Host",   "In",     "Indicate whether the routine computes eigenvalues only `CUSOLVER_EIG_MODE_NOVECTOR` or includes eigenvectors as well `CUSOLVER_EIG_MODE_VECTOR`."
	"uplo",                     "Host",   "In",     "Indicate that lower `CUBLAS_FILL_MODE_LOWER` or upper `CUBLAS_FILL_MODE_UPPER` triangular of sub(A) and sub(B) are used to compute eigen solutions."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A) and sub(B)."
	"d_A",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_A, LOCc(JA+N-1))`. This array contains local parts of the symmetric matrix A and will be overwritten with standard eigen problem."  
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"d_B",                      "Device", "In/Out", "Pointer into the local memory to an array of dimension `(LLD_B, LOCc(JB+N-1))`. This array contains local parts of the symmetric matrix B and will be overwritten with Cholesky factors."    
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). `IB` must be a multiple of the row blocking dimension `MB_B`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). `JB` must be a multiple of the column blocking dimension `NB_B`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"d_Z",                      "Device", "Out",    "Pointer into the local memory to an array of dimension `(LLD_Z, LOCc(JZ+N-1))`. On exit, the array local parts of orthonormal eigenvectors of the matrix A."
  "IZ",                       "Host",   "In",     "Row index of the first row of the sub(Z). `IZ` must be a multiple of the row blocking dimension `MB_Z`."
	"JZ",                       "Host",   "In",     "Column index of the first column of the sub(Z). `JZ` must be a multiple of the column blocking dimension `NB_Z`."
	"descrZ",                   "Host",   "In",     "Matrix descriptor associated to the global matrix Z."	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"d_work",                   "Device", "Out",    "Device workspace of size `workspaceInBytesOnDevice`."  
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by :ref:`cusolverMpSygvd() <cusolverMpSygvd-label>`."
	"h_work",                   "Host",   "Out",    "Host workspace of size `workspaceInBytesOnHost`."  
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpSygvd() <cusolverMpSygvd-label>`"
	"info",                     "Device", "Out",    "`info < 0` indicates an incorrect value of the i-th argument of the function."

| The routine requires some alignment properties

  * Same square blocksize is used `(MB == NB)` for the matrix A, B, and Z.
  * The beginning row and column of A, B and Z are aligned each other i.e., `(IA == IB == IZ)` and `(JA == JB == JZ`.   

| Note that the current implementation supports the inputs of `ibtype = CUSOLVER_EIG_TYPE_1`, `jobz = CUSOLVER_EIG_MODE_VECTOR`, `uplo = CUBLAS_FILL_MODE_LOWER`.

| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A and C", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.

----


.. _cusolverMpSygvd_bufferSize-label:

----------------------------------
:code:`cusolverMpSygvd_bufferSize`
----------------------------------

.. code-block:: cpp

    cusolverStatus_t cusolverMpSygvd_bufferSize(
            cusolverMpHandle_t handle,
            cusolverEigType_t ibtype,
            cusolverEigMode_t jobz,
            cublasFillMode_t uplo,
            int64_t N,
            int64_t IA,
            int64_t JA,
            cusolverMpMatrixDescriptor_t descrA,
            int64_t IB,
            int64_t JB,
            cusolverMpMatrixDescriptor_t descrB,
            int64_t IZ,
            int64_t JZ,
            cusolverMpMatrixDescriptor_t descrZ,
            cudaDataType_t computeType,
            size_t *workspaceInBytesOnDevice,
            size_t *workspaceInBytesOnHost)

| Computes the size in bytes of the host and device working buffers required by :ref:`cusolverMpSygvd() <cusolverMpSygvd-label>`.

.. csv-table::
   :header: "Parameter", "Memory", "In/Out", "Description"
   :widths: auto

	"handle",                   "Host",   "In",     "cuSOLVERMp library handle."
  "ibtype",                   "Host",   "In",     "Indicate the eigen problem type sub(A)*x=(lambda)*sub(B)*x, sub(A)*sub(B)x=(lambda)*x, or sub(B)*sub(A)*x=(lambda)*x."
	"jobz",                     "Host",   "In",     "Indicate whether the routine computes eigenvalues only `CUSOLVER_EIG_MODE_NOVECTOR` or includes eigenvectors as well `CUSOLVER_EIG_MODE_VECTOR`."
	"uplo",                     "Host",   "In",     "Indicate that lower `CUBLAS_FILL_MODE_LOWER` or upper `CUBLAS_FILL_MODE_UPPER` triangular of sub(A) and sub(B) are used to compute eigen solutions."
	"N",                        "Host",   "In",     "Number of rows and columns of sub(A) and sub(B)."
	"IA",                       "Host",   "In",     "Row index of the first row of the sub(A). `IA` must be a multiple of the row blocking dimension `MB_A`."
	"JA",                       "Host",   "In",     "Column index of the first column of the sub(A). `JA` must be a multiple of the column blocking dimension `NB_A`."
	"descrA",                   "Host",   "In",     "Matrix descriptor associated to the global matrix A."
	"IB",                       "Host",   "In",     "Row index of the first row of the sub(B). `IB` must be a multiple of the row blocking dimension `MB_B`."
	"JB",                       "Host",   "In",     "Column index of the first column of the sub(B). `JB` must be a multiple of the column blocking dimension `NB_B`."
	"descrB",                   "Host",   "In",     "Matrix descriptor associated to the global matrix B."
	"IZ",                       "Host",   "In",     "Row index of the first row of the sub(Z). `IZ` must be a multiple of the row blocking dimension `MB_Z`."
	"JZ",                       "Host",   "In",     "Column index of the first column of the sub(Z). `JZ` must be a multiple of the column blocking dimension `NB_Z`."
	"descrZ",                   "Host",   "In",     "Matrix descriptor associated to the global matrix Z."	
	"computeType",              "Host",   "In",     "Data type used for computations. See table below for supported combinations."
	"workspaceInBytesOnDevice", "Host",   "Out",    "The size in bytes of the local device workspace needed by :ref:`cusolverMpSygvd() <cusolverMpSygvd-label>`."
	"workspaceInBytesOnHost",   "Host",   "Out",    "The size in bytes of the local host workspace needed by :ref:`cusolverMpSygvd() <cusolverMpSygvd-label>`"

| The routine requires some alignment properties

  * Same square blocksize is used `(MB == NB)` for the matrix A, B, and Z.
  * The beginning row and column of A, B and Z are aligned each other i.e., `(IA == IB == IZ)` and `(JA == JB == JZ`.   

| Note that the current implementation supports the inputs of `ibtype = CUSOLVER_EIG_TYPE_1`, `jobz = CUSOLVER_EIG_MODE_VECTOR`, `uplo = CUBLAS_FILL_MODE_LOWER`.
    
| This routine supports the following combinations of data types:

.. csv-table::
   :header: "Data Type of A", "computeType", "Output Data Type"
   :widths: auto
	
	"CUDA_R_32F", "CUDA_R_32F", "CUDA_R_32F"
	"CUDA_R_64F", "CUDA_R_64F", "CUDA_R_64F"
	"CUDA_C_32F", "CUDA_C_32F", "CUDA_C_32F"
	"CUDA_C_64F", "CUDA_C_64F", "CUDA_C_64F"

| See `cusolverStatus_t <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverSPstatus>`_ for the description of the return status.