DOCA UROM UCC Application Guide
This application is not part of the current release
This guide provides ... TBD
The Unified Collective Communication (UCC) library is a high-performance, stand-alone collective library for both HPC and AI/ML workloads. It can unify hardware and software collectives and provides flexibility in the execution of collectives. This flexibility includes transports, algorithms, or composition of collectives as well as collective ordering.
The DOCA UROM UCC application demonstrates UCC collective offloading to the DPU by using the DOCA UROM library. The application is split into the following parts:
DPU Side: The implementation of the UCC plugin component to be loaded by the DOCA UROM Worker.
Host Side: The host application, which performs the UCC all-to-all collective as an example of UCC offloading to the DPU.
DOCA UROM UCC application shows how the UCC all-to-all collective can be offloaded to the DPU by using the DOCA UROM library and UCC UROM plugin.

DPU Side
The DOCA UROM UCC worker plugin consists of one or more progress queues, one or more progress threads, and a one-sided data channel, along with the logic necessary to pack, unpack, and interpret commands. The progress queues allow for collective operations to be placed on multiple queues (i.e., one operation per queue) and driven by multiple threads, which is useful in a shared-worker scenario (i.e., multiple clients connected to the same worker). The progress threads allow each client to perform collective operations with minimal queue wait time; however, they are only useful in the shared-worker case.
DOCA UROM Worker is responsible for the following:
Initialization of the UCC library and allocation of one or more progress queues
Creation of a UCC context and creation of one or more progress threads
Creation of a UCC team based on an existing UCC context on the DPU
The initialization, posting, and completion of collective operations
For non-XGVMI collectives, source arguments are pulled to the DPU for use in the collectives. The destination arguments are modified to point to the DPU buffer instead of the host buffer.
Collective is initialized using ucc_collective_init()
Collective is posted using ucc_collective_post()
The resulting UCC collective request handle is added to a progress queue.
The collective is progressed to completion and a notification is generated and sent to the client encapsulating the UCC and DOCA UROM notifications.
Library resource destruction is completed through the appropriate UCC APIs such as ucc_context_destroy, etc.
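The post, progress, and notify steps above can be pictured with a small stand-alone model. This is only a sketch: mock_request, progress_queue, queue_push, and queue_progress_once are hypothetical names invented for illustration and are not part of UCC or DOCA UROM. A real worker would hold UCC collective request handles instead of counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a posted collective request: it completes after
 * a fixed number of progress calls. Not a UCC or DOCA UROM type. */
struct mock_request {
    int polls_left; /* progress calls still needed before completion */
    int notified;   /* set to 1 once a completion notification is emitted */
};

#define QUEUE_CAPACITY 8

/* Hypothetical progress queue holding in-flight requests. */
struct progress_queue {
    struct mock_request *slots[QUEUE_CAPACITY];
    size_t count;
};

/* Add a posted request to the queue; returns 0 on success, -1 if full. */
int queue_push(struct progress_queue *q, struct mock_request *req)
{
    assert(q != NULL && req != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->slots[q->count++] = req;
    return 0;
}

/* One pass of a progress thread: advance every queued request once. Each
 * request that finishes is marked as notified and removed from the queue.
 * Returns the number of notifications generated in this pass. */
int queue_progress_once(struct progress_queue *q)
{
    int completed = 0;
    size_t i = 0;

    assert(q != NULL);
    while (i < q->count) {
        struct mock_request *req = q->slots[i];

        if (req->polls_left > 0)
            req->polls_left--;
        if (req->polls_left == 0) {
            req->notified = 1;
            q->slots[i] = q->slots[--q->count]; /* swap-remove */
            completed++;
        } else {
            i++;
        }
    }
    return completed;
}
```

In this model, a progress thread would call queue_progress_once in a loop. With several queues, each thread can drive its own queue so that one client's long-running collective does not delay another client's completion.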
Host Side
The DOCA UROM UCC application uses Open MPI to launch two processes. Each process in the diagram divides its local sendbuf into 2 blocks, each containing sendcount elements (2 in this example). Process i sends the k-th block of its local sendbuf to process k, which places the data in the i-th block of its local recvbuf.
Implementing the UCC all-to-all method with DOCA UROM offloads its execution to the DPU and leaves the host CPU free to perform other computations.
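The block exchange described above can be modeled in a single process to make the indexing concrete. The function below is an illustrative sketch only: alltoall_model is a hypothetical name, and the code performs plain memory copies rather than calling UCC, MPI, or DOCA UROM.

```c
#include <assert.h>
#include <stddef.h>

/* Single-process model of the all-to-all exchange. send and recv each hold
 * nprocs buffers laid out back to back, and every buffer consists of nprocs
 * blocks of sendcount elements. */
void alltoall_model(size_t nprocs, size_t sendcount, const int *send, int *recv)
{
    size_t buf_len = nprocs * sendcount; /* elements per process buffer */

    assert(send != NULL && recv != NULL);
    for (size_t i = 0; i < nprocs; i++) {     /* sending process */
        for (size_t k = 0; k < nprocs; k++) { /* receiving process */
            /* k-th block of process i's sendbuf ... */
            const int *src = send + i * buf_len + k * sendcount;
            /* ... lands in the i-th block of process k's recvbuf */
            int *dst = recv + k * buf_len + i * sendcount;

            for (size_t e = 0; e < sendcount; e++)
                dst[e] = src[e];
        }
    }
}
```

With two processes and a sendcount of 2, if process 0 starts with {0, 1, 2, 3} and process 1 with {10, 11, 12, 13}, process 0 ends with {0, 1, 10, 11} and process 1 with {2, 3, 12, 13}.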
UROM UCC Worker Component
The UROM UCC worker plugin component defines a set of commands:
enum urom_worker_ucc_cmd_type {
    UROM_WORKER_CMD_UCC_LIB_CREATE,
    UROM_WORKER_CMD_UCC_LIB_DESTROY,
    UROM_WORKER_CMD_UCC_CONTEXT_CREATE,
    UROM_WORKER_CMD_UCC_CONTEXT_DESTROY,
    UROM_WORKER_CMD_UCC_TEAM_CREATE,
    UROM_WORKER_CMD_UCC_COLL,
    UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL,
};
The associated notification types are:
enum urom_worker_ucc_notify_type {
    UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_LIB_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE,
};
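Each command type has a similarly named notification type. A client waiting for a completion may find it convenient to look up the notification it expects for a given command. The helper below is a hypothetical sketch, not part of the plugin API; the enumerations are copied from the listings above, and the pairing is inferred from the enumerator names.

```c
/* Enumerations copied from the listings above. */
enum urom_worker_ucc_cmd_type {
    UROM_WORKER_CMD_UCC_LIB_CREATE,
    UROM_WORKER_CMD_UCC_LIB_DESTROY,
    UROM_WORKER_CMD_UCC_CONTEXT_CREATE,
    UROM_WORKER_CMD_UCC_CONTEXT_DESTROY,
    UROM_WORKER_CMD_UCC_TEAM_CREATE,
    UROM_WORKER_CMD_UCC_COLL,
    UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL,
};

enum urom_worker_ucc_notify_type {
    UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_LIB_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE,
};

/* Hypothetical helper: returns the notification type expected for cmd,
 * or -1 if cmd is not a known command type. */
int expected_notify_type(enum urom_worker_ucc_cmd_type cmd)
{
    switch (cmd) {
    case UROM_WORKER_CMD_UCC_LIB_CREATE:
        return UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE;
    case UROM_WORKER_CMD_UCC_LIB_DESTROY:
        return UROM_WORKER_NOTIFY_UCC_LIB_DESTROY_COMPLETE;
    case UROM_WORKER_CMD_UCC_CONTEXT_CREATE:
        return UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE;
    case UROM_WORKER_CMD_UCC_CONTEXT_DESTROY:
        return UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE;
    case UROM_WORKER_CMD_UCC_TEAM_CREATE:
        return UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE;
    case UROM_WORKER_CMD_UCC_COLL:
        return UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE;
    case UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL:
        return UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE;
    default:
        return -1;
    }
}
```

An explicit switch is used instead of relying on the two enumerations sharing the same numeric order, so the mapping stays correct if either list is later extended.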
UCC Library Create
This command is used to initialize the UCC library on the DPU.
The command has type: UROM_WORKER_CMD_UCC_LIB_CREATE. The command format is shown below.
struct urom_worker_cmd_ucc_lib_create {
    void *params;
};
params – UCC library parameters used for ucc_init. The resulting UCC library handle will be stored on the DPU.
This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE. The notification carries success or an appropriate failure status.
UCC Context Create
This command is used to create a UCC context on the DPU.
The command has type: UROM_WORKER_CMD_UCC_CONTEXT_CREATE. The command format is shown below.
struct urom_worker_cmd_ucc_context_create {
    union {
        int64_t start;
        int64_t *array;
    };
    int64_t stride;
    int64_t size;
    void *base_va;
    uint64_t len;
};
start – The start index
array – Set stride to <= 0 if array is used
stride – Set number of strides
size – Set stride size
base_va – Shared buffer address
len – Buffer length
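Because start and array share a union, the stride field is what tells the receiver which member is valid. The sketch below illustrates filling the command in both forms. The three helper functions are hypothetical and not part of the DOCA UROM API, and passing size unchanged in the array form is an assumption; the structure layout is copied from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command layout copied from the listing above. */
struct urom_worker_cmd_ucc_context_create {
    union {
        int64_t start;
        int64_t *array;
    };
    int64_t stride;
    int64_t size;
    void *base_va;
    uint64_t len;
};

/* Hypothetical helper: fill the command using the start/stride form. */
void ctx_cmd_set_strided(struct urom_worker_cmd_ucc_context_create *cmd,
                         int64_t start, int64_t stride, int64_t size,
                         void *base_va, uint64_t len)
{
    assert(cmd != NULL && stride > 0);
    cmd->start = start;
    cmd->stride = stride;
    cmd->size = size;
    cmd->base_va = base_va;
    cmd->len = len;
}

/* Hypothetical helper: fill the command using the explicit array form. */
void ctx_cmd_set_array(struct urom_worker_cmd_ucc_context_create *cmd,
                       int64_t *array, int64_t size,
                       void *base_va, uint64_t len)
{
    assert(cmd != NULL && array != NULL);
    cmd->array = array;
    cmd->stride = 0; /* stride <= 0 signals that array is used */
    cmd->size = size; /* assumption: size keeps its meaning in array form */
    cmd->base_va = base_va;
    cmd->len = len;
}

/* Returns 1 when the union holds array, 0 when it holds start. */
int ctx_cmd_uses_array(const struct urom_worker_cmd_ucc_context_create *cmd)
{
    assert(cmd != NULL);
    return cmd->stride <= 0;
}
```

Checking stride before reading the union avoids interpreting a pointer as an index or the reverse.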
This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE. The notification format is shown below.
struct urom_worker_ucc_notify_context_create {
    void *context;
};
context – DPU Pointer to UCC context
UCC Context Destroy
This command destroys a UCC context on the DPU.
The command has type: UROM_WORKER_CMD_UCC_CONTEXT_DESTROY. The command format is shown below.
struct urom_worker_cmd_ucc_context_destroy {
    void *context_h;
};
context_h – UCC context pointer
This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE, with an appropriate status returned on success or failure.
UCC Team Create
This command will create a UCC team on the DPU.
The command has type: UROM_WORKER_CMD_UCC_TEAM_CREATE. The command format is shown below.
struct urom_worker_cmd_ucc_team_create {
    int64_t start;
    int64_t stride;
    int64_t size;
    void *context_h;
};
start – Team start index
stride – Number of strides
size – Stride size
context_h – UCC context
This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE. The notification format is shown below.
struct urom_worker_ucc_notify_team_create {
    void *team;
};
team – Pointer to UCC team
UCC Collective
This command will initialize, post, and finalize a collective on the DPU.
The command has type: UROM_WORKER_CMD_UCC_COLL. The command format is shown below.
struct urom_worker_cmd_ucc_coll {
    void *coll_args;         /* Collective arguments */
    void *team;              /* UCC team */
    int use_xgvmi;           /* If operation uses XGVMI */
    void *work_buffer;       /* Work buffer */
    size_t work_buffer_size; /* Buffer size */
    size_t team_size;        /* Team size */
};
coll_args – The UCC collective arguments.
team – The DPU’s UCC team handle.
use_xgvmi – A flag indicating if the collective requires XGVMI communication.
work_buffer – A work buffer handle. This may be NULL and will be packed by the DOCA UROM client.
work_buffer_size – The size in bytes of the work buffer.
team_size – The size of the team. This is used to pack structures in the UCC collective arguments if necessary.
This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE. The notification format is shown below.
struct urom_worker_ucc_notify_collective {
ucc_status_t status; /* UCC collective status */
};
status – UCC collective status
UCC Passive Data Channel
This command will connect the DPU to the host application via an endpoint.
The command has type: UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL. The command format is shown below.
struct urom_worker_cmd_ucc_pass_dc {
    void *ucp_addr;  /* UCP worker address on host */
    size_t addr_len; /* UCP worker address length */
};
ucp_addr – An address for the connection.
addr_len – The length of the address.
This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE. The notification format is shown below.
struct urom_worker_ucc_notify_pass_dc {
ucc_status_t status;
};
status – UCC data channel status
Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.
The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.
For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.
The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/urom_ucc/.
Compiling All Applications
All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.
To build all the applications together, run:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
On the host, doca_urom_ucc is created under /tmp/build/urom_ucc/host/, and on the DPU side, the UCC worker plugin worker_ucc.so is created under /tmp/build/urom_ucc/dpu/.
Compiling Only the Current Application
To directly build only the UROM UCC application:
cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_urom_ucc=true
ninja -C /tmp/build
On the host, doca_urom_ucc is created under /tmp/build/urom_ucc/host/, and on the DPU side, the UCC worker plugin worker_ucc.so is created under /tmp/build/urom_ucc/dpu/.
Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:
Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:
Set enable_all_applications to false
Set enable_urom_ucc to true
Run the following compilation commands:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
Info: On the host, doca_urom_ucc is created under /tmp/build/urom_ucc/host/, and on the DPU side, the UCC worker plugin worker_ucc.so is created under /tmp/build/urom_ucc/dpu/.
Troubleshooting
Refer to DOCA Troubleshooting for any issue encountered with the compilation of the application.
Host Application Execution
The UROM UCC application is provided in source form, so it must be compiled before it can be executed.
Application usage instructions:
Usage: doca_urom_ucc [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse all command flags from an input json file

Program Flags:
  -d, --device <IB device name>     IB device name.
This usage printout can be printed to the command line using the -h (or --help) options:
./doca_urom_ucc -h
For additional information, refer to section "Command Line Flags".
CLI example for running the application on the host:
mpirun -np 2 ./doca_urom_ucc -d mlx5_0
The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:
./doca_urom_ucc --json [json_file]
For example:
./doca_urom_ucc --json ./urom_ucc_params.json
UCC DPU Plugin Component
The UROM UCC plugin component is provided in source form, so it must be compiled before the application can be executed. It is compiled as a .so file so that the UROM worker can load the plugin at runtime when the worker is spawned.
The plugin exposes the following symbols:
Get the DOCA worker plugin interface for the UCC plugin.
doca_error_t urom_plugin_get_iface(struct urom_plugin_iface *iface);
Get the UCC plugin version, which will be used to verify that the host and DPU plugin versions are compatible.
doca_error_t urom_plugin_get_version(uint64_t *version);
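As an illustration of how a shared object exposing these symbols can be consumed, the sketch below opens a plugin file with the POSIX dlopen interface and calls urom_plugin_get_version. It is not the DOCA UROM worker's actual loading code. The function name query_plugin_version is hypothetical, and doca_error_t is treated as an int with 0 meaning success so that the sketch builds without the DOCA headers; both are assumptions.

```c
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: doca_error_t is represented as int here, with 0 as success. */
typedef int (*get_version_fn)(uint64_t *version);

/* Open the plugin at path, look up urom_plugin_get_version, and call it.
 * Returns 0 on success and -1 on any failure. */
int query_plugin_version(const char *path, uint64_t *version)
{
    void *handle;
    get_version_fn get_version = NULL;
    const char *err;
    int rc;

    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* POSIX-recommended way to store the dlsym() result in a function pointer. */
    *(void **)(&get_version) = dlsym(handle, "urom_plugin_get_version");
    if (get_version == NULL) {
        err = dlerror();
        fprintf(stderr, "symbol lookup failed: %s\n", err ? err : "unknown error");
        dlclose(handle);
        return -1;
    }

    rc = get_version(version);
    dlclose(handle);
    return rc == 0 ? 0 : -1;
}
```

A caller could pass the path of the built worker_ucc.so and compare the returned version with the one the host side expects. On older glibc versions the program may need to be linked with -ldl.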
Command Line Flags
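Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content
General flags | h | help | Print a help synopsis | N/A
General flags | v | version | Print program version information | N/A
General flags | l | log-level | Set the log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE |
General flags | N/A | sdk-log-level | Set the log level for the program: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE |
General flags | j | json | Parse all command flags from an input JSON file | N/A
Program flags | d | device | DOCA UROM IB device name. |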
Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.
Troubleshooting
Refer to DOCA Troubleshooting for any issue encountered with the installation or execution of the DOCA applications.
Parse application arguments.
Initialize arg parser resources and register DOCA general parameters.
doca_argp_init();
Register UROM UCC application parameters.
register_urom_ucc_params();
Parse the arguments.
doca_argp_start();
Destroy the arg parser.
doca_argp_destroy();
/opt/mellanox/doca/applications/urom_ucc/
/opt/mellanox/doca/applications/urom_ucc/urom_ucc_params.json