****************************************************
Communication Abstraction Library API and data types
****************************************************

Communication Abstraction Library (CAL) is a helper module for the cuSOLVERMp library that lets it communicate efficiently between GPUs. The cuSOLVERMp grid creation API accepts a `cal_comm_t` communicator object, which must be created before any cuSOLVERMp call.

Currently, CAL supports only the configuration in which each participating process uses a single GPU and each participating GPU is used by a single process.

----

.. _module-usage-label:

=======================================
Communication abstraction library usage
=======================================

To initialize a `cal_comm_t` communicator object, follow the bootstrapping process described for the :ref:`cal_comm_create() <cal_comm_create-label>` function.

The main communication backend used by cuSOLVERMp is the UCC library, which is part of the HPC-SDK package. The primary components of the UCC library are the underlying transports that carry out the data exchanges between processes. By default, UCC tries to initialize all supported transports (for example OpenUCX, NCCL, SHARP, and CUDA Runtime); the specific set of transports actually used is decided by the UCC configuration and runtime.

Note that these transport dependencies may have their behavior altered by external configuration files and environment variables (for example, the environment that affects NCCL: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html, and the environment that affects OpenUCX: https://openucx.readthedocs.io/en/master/faq.html). Ask your platform provider whether there are known optimized settings for these dependencies. Otherwise, cuSOLVERMp and UCC use default values.

Because of the nature of the underlying communications, keep the following restrictions in mind when using cuSOLVERMp:

* Only one cuSOLVERMp routine can be in flight at any point in time within a process. You can create and keep multiple communicators and cuSOLVERMp handles, but it is your responsibility to ensure that executions of cuSOLVERMp routines do not overlap.
* If you use the NCCL communication library, NCCL collective calls must not overlap with cuSOLVERMp calls, to avoid possible deadlocks.

The following environment variables change the behavior of the communication module:

.. csv-table::
   :header: "Variable", "Description"
   :widths: auto

   "CAL_LOG_LEVEL", "Verbosity level of the communication module: `0` means no output and `6` means the maximum amount of detail. Default: `0`."
   "UCC_CONFIG_FILE", "Custom configuration file for the UCC library. It can control the underlying transports and collective parameters. Refer to the UCC documentation for details on the configuration syntax. Default: the built-in cuSOLVERMp UCC configuration."

----

============================================
Communication abstraction library data types
============================================

.. _calError_t-label:

------------------
:code:`calError_t`
------------------

Return values from communication abstraction library APIs. The values are described in the table below:

.. csv-table::
   :header: "Value", "Description"
   :widths: auto

   "CAL_OK", "Success."
   "CAL_ERROR", "Generic error."
   "CAL_ERROR_INVALID_PARAMETER", "Invalid parameter passed to the interface function."
   "CAL_ERROR_INTERNAL", "Internal error."
   "CAL_ERROR_CUDA", "Error in the CUDA runtime or driver API."
   "CAL_ERROR_IPC", "Error in a system IPC communication call."
   "CAL_ERROR_UCC", "Error in a UCC call."
   "CAL_ERROR_NOT_SUPPORTED", "Requested configuration or parameters are not supported."
   "CAL_ERROR_BACKEND", "Error in a general backend dependency. Run with a verbose log level to see the detailed error message."
   "CAL_ERROR_INPROGRESS", "Operation is still in progress."

----

.. _cal_comm_t-label:

------------------
:code:`cal_comm_t`
------------------

| The `cal_comm_t` stores the device endpoint and resources related to communication. It must be created and destroyed using the :ref:`cal_comm_create() <cal_comm_create-label>` and :ref:`cal_comm_destroy() <cal_comm_destroy-label>` functions respectively.

----

.. _cal_comm_create_params-label:

--------------------------------
:code:`cal_comm_create_params_t`
--------------------------------

.. code-block:: cpp

    typedef struct cal_comm_create_params {
        calError_t (*allgather)(void* src_buf, void* recv_buf, size_t size, void* data, void** request);
        calError_t (*req_test)(void* request);
        calError_t (*req_free)(void* request);
        void* data;
        int nranks;
        int rank;
        int local_device;
    } cal_comm_create_params_t;

| The `cal_comm_create_params_t` structure is the parameter of the communication module creation function. This structure must be filled in by the user before calling :ref:`cal_comm_create() <cal_comm_create-label>`. The fields of this structure are described below:

.. csv-table::
   :header: "Field", "Description"
   :widths: auto

   "allgather", "Pointer to a function that implements `allgather` functionality on host memory. This function can be asynchronous with respect to the host. In that case, it should create a handle that can be passed to the corresponding `req_test` and `req_free` functions, and write a pointer to this handle to the `request` parameter."
   "req_test", "If the `allgather` function is asynchronous, this function is used to query whether the data has been exchanged and can be used by the communicator. Should return `0` if the exchange has finished and `CAL_ERROR_INPROGRESS` otherwise."
   "req_free", "If the `allgather` function is asynchronous, this function is called after the data exchange started by `allgather` has finished, to free the resources used by the `request` handle."
   "data", "Pointer to additional data that is passed to the `allgather` function at the time of the call."
   "nranks", "Number of ranks participating in the communicator that will be created."
   "rank", "Rank that will be assigned to the calling process in the new communicator. Must be at least `0` and less than `nranks`."
   "local_device", "Local device that will be used by cuSOLVERMp through this communicator. Note that you should create a device context before using this device in CAL or cuSOLVERMp calls."

----

=====================================
Communication abstraction library API
=====================================

.. _cal_comm_create-label:

-----------------------------
:code:`cal_comm_create`
-----------------------------

.. code-block:: cpp

    calError_t cal_comm_create(
        cal_comm_create_params_t params,
        cal_comm_t* new_comm)

| The handle created by this function is required to use the cuSOLVERMp API. Note that you should create a device context for the device specified in the create parameters before using it in CAL or cuSOLVERMp calls. The easiest way to ensure that a device context is created is to call `cudaSetDevice(device_id); cudaFree(0)`. See the :ref:`cal_comm_create_params_t <cal_comm_create_params-label>` documentation for instructions on how to fill in this structure.

.. csv-table::
   :header: "Parameter", "Description"
   :widths: auto

   "params", "Creation parameters, including the bootstrap callbacks, the rank layout, and the local device id. See :ref:`cal_comm_create_params_t <cal_comm_create_params-label>`."
   "new_comm", "Pointer to the location where the new communicator handle is stored."

See :ref:`calError_t <calError_t-label>` for the description of the return value.

----

.. _cal_comm_destroy-label:

------------------------
:code:`cal_comm_destroy`
------------------------

.. code-block:: cpp

    calError_t cal_comm_destroy(
        cal_comm_t comm)

| Releases the resources associated with the provided communicator handle.

.. csv-table::
   :header: "Parameter", "Description"
   :widths: auto

   "comm", "Communicator handle to release."

See :ref:`calError_t <calError_t-label>` for the description of the return value.

----

.. _cal_stream_sync-label:

-----------------------
:code:`cal_stream_sync`
-----------------------

.. code-block:: cpp

    calError_t cal_stream_sync(
        cal_comm_t comm,
        cudaStream_t stream)

| Blocks the calling thread until all outstanding device operations in `stream` have finished. Use this function in place of `cudaStreamSynchronize` so that outstanding communication operations for the communicator can make progress.

.. csv-table::
   :header: "Parameter", "Description"
   :widths: auto

   "comm", "Communicator handle."
   "stream", "CUDA stream to synchronize."

See :ref:`calError_t <calError_t-label>` for the description of the return value.

----

.. _cal_get_comm_size-label:

-------------------------
:code:`cal_get_comm_size`
-------------------------

.. code-block:: cpp

    calError_t cal_get_comm_size(
        cal_comm_t comm,
        int* size)

| Retrieves the number of processing elements in the provided communicator.

.. csv-table::
   :header: "Parameter", "Description"
   :widths: auto

   "comm", "Communicator handle."
   "size", "Pointer to the location that receives the number of processing elements."

See :ref:`calError_t <calError_t-label>` for the description of the return value.

----

.. _cal_get_rank-label:

--------------------
:code:`cal_get_rank`
--------------------

.. code-block:: cpp

    calError_t cal_get_rank(
        cal_comm_t comm,
        int* rank)

| Retrieves the rank of the calling processing element in the communicator (base-0).

.. csv-table::
   :header: "Parameter", "Description"
   :widths: auto

   "comm", "Communicator handle."
   "rank", "Pointer to the location that receives the rank id of the calling process."

See :ref:`calError_t <calError_t-label>` for the description of the return value.
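
The asynchronous callback contract of `cal_comm_create_params_t` (`allgather` creates a request, `req_test` is polled until it stops returning `CAL_ERROR_INPROGRESS`, then `req_free` releases the request) can be sketched as below. This is a minimal single-process simulation, not CAL code: the `calError_t` enum and the struct are stand-ins so the sketch compiles without the CAL headers, `FakeWorld`, `FakeRequest`, the `fake_*` functions and `run_exchange` are hypothetical names, and it assumes that `size` is the per-rank byte count with `recv_buf` holding `nranks * size` bytes. How CAL actually drives the callbacks internally is not specified here; a real implementation would typically wrap a non-blocking host allgather, for example `MPI_Iallgather` with `MPI_Test`.

```cpp
#include <cassert>
#include <cstring>
#include <stddef.h>

// Stand-ins so the sketch compiles without the CAL headers (assumption: real
// code includes the CAL header instead). CAL_OK == 0 follows the req_test
// description; the value of CAL_ERROR_INPROGRESS here is a placeholder.
enum calError_t { CAL_OK = 0, CAL_ERROR_INPROGRESS = 1 };

// Copied from the cal_comm_create_params_t definition in this document.
typedef struct cal_comm_create_params {
    calError_t (*allgather)(void* src_buf, void* recv_buf, size_t size, void* data, void** request);
    calError_t (*req_test)(void* request);
    calError_t (*req_free)(void* request);
    void* data;
    int nranks;
    int rank;
    int local_device;
} cal_comm_create_params_t;

// Hypothetical helper types, not part of CAL.
struct FakeWorld { int nranks; int polls_until_done; };  // passed through `data`
struct FakeRequest { int polls_left; };                   // the `request` handle

// Simulated allgather: every "rank" contributes the same bytes. Assumes
// recv_buf holds nranks * size bytes.
calError_t fake_allgather(void* src_buf, void* recv_buf, size_t size, void* data, void** request) {
    FakeWorld* world = static_cast<FakeWorld*>(data);
    for (int r = 0; r < world->nranks; ++r) {
        std::memcpy(static_cast<char*>(recv_buf) + static_cast<size_t>(r) * size, src_buf, size);
    }
    // Hand back a handle that req_test and req_free can address.
    *request = new FakeRequest{world->polls_until_done};
    return CAL_OK;
}

// Reports "in progress" for the first polls_until_done calls, then success.
calError_t fake_req_test(void* request) {
    FakeRequest* req = static_cast<FakeRequest*>(request);
    if (req->polls_left > 0) {
        --req->polls_left;
        return CAL_ERROR_INPROGRESS;
    }
    return CAL_OK;
}

// Frees the resources held by the request handle.
calError_t fake_req_free(void* request) {
    delete static_cast<FakeRequest*>(request);
    return CAL_OK;
}

// One plausible way a consumer could drive the callbacks (an assumption).
// Returns the number of req_test calls made, or -1 on error.
int run_exchange(const cal_comm_create_params_t& p, void* src, void* dst, size_t size) {
    void* request = nullptr;
    if (p.allgather(src, dst, size, p.data, &request) != CAL_OK) return -1;
    int polls = 1;
    calError_t status;
    while ((status = p.req_test(request)) == CAL_ERROR_INPROGRESS) ++polls;
    calError_t freed = p.req_free(request);
    return (status == CAL_OK && freed == CAL_OK) ? polls : -1;
}
```

To try it, fill a `cal_comm_create_params_t` with the three `fake_*` callbacks, point `data` at a `FakeWorld`, and call `run_exchange` with a source value and a destination array of `nranks` elements. With real CAL, the same filled-in structure would instead be passed to `cal_comm_create()`.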