DOCA Documentation v2.10.0

DOCA UROM UCC Application Guide

Warning

This application is not part of the current release

This guide provides ... TBD

The Unified Collective Communication (UCC) library is a high-performance, stand-alone collective library for both HPC and AI/ML workloads. It can unify hardware and software collectives and provides flexibility in the execution of collectives. This flexibility covers transports, algorithms, and the composition of collectives, as well as collective ordering.

The DOCA UROM UCC application demonstrates UCC collective offloading to the DPU using the DOCA UROM library. The application is split into the following parts:

  • DPU Side: The implementation of the UCC plugin component to be loaded by the DOCA UROM Worker.

  • Host Side: Host application that performs UCC all-to-all collective algorithm as an example for UCC offloading to the DPU.

DOCA UROM UCC application shows how the UCC all-to-all collective can be offloaded to the DPU by using the DOCA UROM library and UCC UROM plugin.

DPU Side

The DOCA UROM UCC worker plugin consists of one or more progress queues, one or more progress threads, and a one-sided data channel, along with the logic necessary to pack, unpack, and interpret commands. The progress queues allow collective operations to be placed on multiple queues (i.e., one operation per queue) and driven by multiple threads, which is useful in a shared-worker scenario (i.e., multiple clients connected to the same worker). The progress threads allow each client to perform collective operations with minimal queue wait time; however, they are only useful in the shared-worker case.

DOCA UROM Worker is responsible for the following:

  1. Initialization of the UCC library and allocation of one or more progress queues

  2. Creation of a UCC context and creation of one or more progress threads

  3. Creation of a UCC team based on an existing UCC context on the DPU

  4. The initialization, posting, and completion of collective operations

  5. For non-XGVMI collectives, source arguments are pulled to the DPU for use in the collectives. The destination arguments are modified to point to the DPU buffer instead of the host buffer.

    1. Collective is initialized using ucc_collective_init()

    2. Collective is posted using ucc_collective_post()

    3. The resulting UCC collective request handle is added to a progress queue.

    4. The collective is progressed to completion and a notification is generated and sent to the client encapsulating the UCC and DOCA UROM notifications.

  6. Library resource destruction is completed through the appropriate UCC APIs such as ucc_context_destroy, etc.
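The queueing behavior described in the steps above can be modeled in a few lines of plain C. This is a simplified illustration under stated assumptions, not the worker plugin's implementation: struct coll_request, struct progress_queue, queue_push(), and queue_progress() are hypothetical names, and a step counter stands in for a real UCC collective request handle.

```c
#include <stddef.h>

enum { QUEUE_CAPACITY = 8 };

/* Hypothetical stand-in for a posted UCC collective request. */
struct coll_request {
    int id;         /* identifies the collective for the notification */
    int steps_left; /* models the remaining progress work */
};

/* Hypothetical progress queue holding posted requests. */
struct progress_queue {
    struct coll_request slots[QUEUE_CAPACITY];
    size_t count;
};

/* Add a posted request to the queue; returns 0 on success, -1 if full. */
int queue_push(struct progress_queue *q, int id, int steps)
{
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->slots[q->count].id = id;
    q->slots[q->count].steps_left = steps;
    q->count++;
    return 0;
}

/*
 * One progress pass: advance every pending request by one step.
 * Requests that reach completion are removed from the queue and their
 * ids are written to done[], modeling the completion notification.
 * Returns the number of requests completed in this pass.
 */
size_t queue_progress(struct progress_queue *q, int *done, size_t done_cap)
{
    size_t completed = 0;
    size_t kept = 0;

    for (size_t i = 0; i < q->count; i++) {
        struct coll_request r = q->slots[i];

        if (r.steps_left > 0)
            r.steps_left--;
        if (r.steps_left == 0 && completed < done_cap)
            done[completed++] = r.id;   /* completed: notify the client */
        else
            q->slots[kept++] = r;       /* still pending: keep queued */
    }
    q->count = kept;
    return completed;
}
```

A progress thread in this model would call queue_progress() in a loop and send one notification per id returned in done[].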

Host Side

The DOCA UROM UCC application uses Open MPI to launch two processes. Each process in the diagram divides its local sendbuf into 2 blocks, each containing sendcount elements (2 in this example). Process i sends the k-th block of its local sendbuf to process k, which places the data in the i-th block of its local recvbuf.

Implementing the UCC all-to-all method with DOCA UROM offloads its execution to the DPU and leaves the host CPU free to perform other computations.
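The block exchange performed by all-to-all can be checked with a small reference routine that runs in a single address space. This is only a sketch of the data movement, assuming int elements and buffers stored back to back; alltoall_reference() is a hypothetical helper and does not use MPI, UCC, or DOCA UROM.

```c
#include <stddef.h>

/*
 * Reference all-to-all in one address space.
 * sendbufs and recvbufs each hold nprocs buffers of nprocs * sendcount
 * elements stored back to back, so the buffer of process p starts at
 * offset p * nprocs * sendcount.
 */
void alltoall_reference(const int *sendbufs, int *recvbufs,
                        size_t nprocs, size_t sendcount)
{
    size_t buf_len = nprocs * sendcount;

    for (size_t i = 0; i < nprocs; i++) {        /* sending process i */
        for (size_t k = 0; k < nprocs; k++) {    /* receiving process k */
            /* k-th block of process i's sendbuf */
            const int *src = sendbufs + i * buf_len + k * sendcount;
            /* i-th block of process k's recvbuf */
            int *dst = recvbufs + k * buf_len + i * sendcount;

            for (size_t e = 0; e < sendcount; e++)
                dst[e] = src[e];
        }
    }
}
```

With two processes and sendcount equal to 2, send buffers {0, 1, 2, 3} and {10, 11, 12, 13} produce receive buffers {0, 1, 10, 11} and {2, 3, 12, 13}.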

UROM UCC Worker Component

The UROM UCC worker plugin component defines a set of commands:

enum urom_worker_ucc_cmd_type {
    UROM_WORKER_CMD_UCC_LIB_CREATE,
    UROM_WORKER_CMD_UCC_LIB_DESTROY,
    UROM_WORKER_CMD_UCC_CONTEXT_CREATE,
    UROM_WORKER_CMD_UCC_CONTEXT_DESTROY,
    UROM_WORKER_CMD_UCC_TEAM_CREATE,
    UROM_WORKER_CMD_UCC_COLL,
    UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL,
};

The associated notification types are:

enum urom_worker_ucc_notify_type {
    UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_LIB_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE,
};
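Each command type has one matching notification type. The sketch below makes that pairing explicit; the two enums are copied from the definitions above so that the block compiles on its own, while ucc_cmd_to_notify() is a hypothetical helper and the pairing is inferred from the matching names, not taken from the plugin sources.

```c
enum urom_worker_ucc_cmd_type {
    UROM_WORKER_CMD_UCC_LIB_CREATE,
    UROM_WORKER_CMD_UCC_LIB_DESTROY,
    UROM_WORKER_CMD_UCC_CONTEXT_CREATE,
    UROM_WORKER_CMD_UCC_CONTEXT_DESTROY,
    UROM_WORKER_CMD_UCC_TEAM_CREATE,
    UROM_WORKER_CMD_UCC_COLL,
    UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL,
};

enum urom_worker_ucc_notify_type {
    UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_LIB_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE,
    UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE,
};

/*
 * Hypothetical helper: look up the notification expected for a command.
 * Returns 0 and sets *notify on success, or -1 for an unknown command.
 */
int ucc_cmd_to_notify(enum urom_worker_ucc_cmd_type cmd,
                      enum urom_worker_ucc_notify_type *notify)
{
    switch (cmd) {
    case UROM_WORKER_CMD_UCC_LIB_CREATE:
        *notify = UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE;
        return 0;
    case UROM_WORKER_CMD_UCC_LIB_DESTROY:
        *notify = UROM_WORKER_NOTIFY_UCC_LIB_DESTROY_COMPLETE;
        return 0;
    case UROM_WORKER_CMD_UCC_CONTEXT_CREATE:
        *notify = UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE;
        return 0;
    case UROM_WORKER_CMD_UCC_CONTEXT_DESTROY:
        *notify = UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE;
        return 0;
    case UROM_WORKER_CMD_UCC_TEAM_CREATE:
        *notify = UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE;
        return 0;
    case UROM_WORKER_CMD_UCC_COLL:
        *notify = UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE;
        return 0;
    case UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL:
        *notify = UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE;
        return 0;
    default:
        return -1;
    }
}
```

A host-side client could use such a lookup to check that a received notification matches the command it submitted.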

UCC Library Create

This command is used to initialize the UCC library on the DPU.

The command has type: UROM_WORKER_CMD_UCC_LIB_CREATE. The command format is shown below.

struct urom_worker_cmd_ucc_lib_create {
    void *params;
};

  • params – UCC library parameters used for ucc_init. The resulting UCC library handle will be stored on the DPU.

This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_LIB_CREATE_COMPLETE. It returns success or an appropriate failure status.

UCC Context Create

This command is used to create a UCC context on the DPU.

The command has type: UROM_WORKER_CMD_UCC_CONTEXT_CREATE. The command format is shown below.

struct urom_worker_cmd_ucc_context_create {
    union {
        int64_t start;
        int64_t *array;
    };
    int64_t stride;
    int64_t size;
    void *base_va;
    uint64_t len;
};

  • start – The start index

  • array – Set stride to <= 0 if array is used

  • stride – Set number of strides

  • size – Set stride size

  • base_va – Shared buffer address

  • len – Buffer length

This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_CONTEXT_CREATE_COMPLETE. The notification format is shown below.

struct urom_worker_ucc_notify_context_create {
    void *context;
};

  • context – Pointer to the UCC context on the DPU

UCC Context Destroy

This command destroys a UCC context on the DPU.

The command has type: UROM_WORKER_CMD_UCC_CONTEXT_DESTROY. The command format is shown below.

struct urom_worker_cmd_ucc_context_destroy {
    void *context_h;
};

  • context_h – UCC context pointer

This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_CONTEXT_DESTROY_COMPLETE. It returns success or an appropriate failure status.

UCC Team Create

This command will create a UCC team on the DPU.

The command has type: UROM_WORKER_CMD_UCC_TEAM_CREATE. The command format is shown below.

struct urom_worker_cmd_ucc_team_create {
    int64_t start;
    int64_t stride;
    int64_t size;
    void *context_h;
};

  • start – Team start index

  • stride – Number of strides

  • size – Stride size

  • context_h – UCC context

This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_TEAM_CREATE_COMPLETE. The notification format is shown below.

struct urom_worker_ucc_notify_team_create {
    void *team;
};

  • team – Pointer to UCC team

UCC Collective

This command will initialize, post, and finalize a collective on the DPU.

The command has type: UROM_WORKER_CMD_UCC_COLL. The command format is shown below.

struct urom_worker_cmd_ucc_coll {
    void *coll_args;         /* Collective arguments */
    void *team;              /* UCC team */
    int use_xgvmi;           /* If operation uses XGVMI */
    void *work_buffer;       /* Work buffer */
    size_t work_buffer_size; /* Buffer size */
    size_t team_size;        /* Team size */
};

  • coll_args – The UCC collective arguments

  • team – The DPU's UCC team handle

  • use_xgvmi – A flag indicating if the collective requires XGVMI communication.

  • work_buffer – A work buffer handle. This may be NULL and will be packed by the DOCA UROM client.

  • work_buffer_size – The size in bytes of the work buffer.

  • team_size – The size of the team. This is used to pack structures in the UCC collective arguments if necessary.
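As a rough illustration of how these fields fit together, the sketch below fills the command structure for a collective that does not need a work buffer. The struct is copied from the command format above so that the block compiles on its own; make_coll_cmd() is a hypothetical helper, and how the command is actually submitted through DOCA UROM is outside the scope of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Copied from the command format above. */
struct urom_worker_cmd_ucc_coll {
    void *coll_args;         /* Collective arguments */
    void *team;              /* UCC team */
    int use_xgvmi;           /* If operation uses XGVMI */
    void *work_buffer;       /* Work buffer */
    size_t work_buffer_size; /* Buffer size */
    size_t team_size;        /* Team size */
};

/*
 * Hypothetical helper: build a collective command with no work buffer.
 * coll_args and team are opaque here; in the real application they
 * would be the UCC collective arguments and the DPU's UCC team handle.
 */
struct urom_worker_cmd_ucc_coll
make_coll_cmd(void *coll_args, void *team, int use_xgvmi, size_t team_size)
{
    struct urom_worker_cmd_ucc_coll cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.coll_args = coll_args;
    cmd.team = team;
    cmd.use_xgvmi = use_xgvmi ? 1 : 0; /* normalize the flag to 0 or 1 */
    cmd.work_buffer = NULL;            /* the work buffer may be NULL */
    cmd.work_buffer_size = 0;
    cmd.team_size = team_size;
    return cmd;
}
```

For the two-process example in this guide, team_size would be 2.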

This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_COLLECTIVE_COMPLETE. The notification format is shown below.

struct urom_worker_ucc_notify_collective {
    ucc_status_t status; /* UCC collective status */
};

  • status – UCC collective status

UCC Passive Data Channel

This command will connect the DPU to the host application via an endpoint.

The command has type: UROM_WORKER_CMD_UCC_CREATE_PASSIVE_DATA_CHANNEL. The command format is shown below.

struct urom_worker_cmd_ucc_pass_dc {
    void *ucp_addr;  /* UCP worker address on host */
    size_t addr_len; /* UCP worker address length */
};

  • ucp_addr – An address for the connection.

  • addr_len – The length of the address.

This command returns a notification of type: UROM_WORKER_NOTIFY_UCC_PASSIVE_DATA_CHANNEL_COMPLETE. The notification format is shown below.

struct urom_worker_ucc_notify_pass_dc {
    ucc_status_t status;
};

  • status – UCC data channel status

This application leverages the following DOCA libraries:

  • DOCA UROM

  • UCC framework DOCA driver

Refer to their respective programming guides for more information.

Info

Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.

Tip

For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/urom_ucc/.

Compiling All Applications

All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.

To build all the applications together, run:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

Note

On the host, doca_urom_ucc is created under /tmp/build/urom_ucc/host/. On the DPU, the UCC worker plugin worker_ucc.so is created under /tmp/build/urom_ucc/dpu/.


Compiling Only the Current Application

To directly build only the UROM UCC application:

cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_urom_ucc=true
ninja -C /tmp/build

Info

On the host, doca_urom_ucc is created under /tmp/build/urom_ucc/host/. On the DPU, the UCC worker plugin worker_ucc.so is created under /tmp/build/urom_ucc/dpu/.

Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

  1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

    • Set enable_all_applications to false

    • Set enable_urom_ucc to true

  2. Run the following compilation commands:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build
    ninja -C /tmp/build

    Info

    On the host, doca_urom_ucc is created under /tmp/build/urom_ucc/host/. On the DPU, the UCC worker plugin worker_ucc.so is created under /tmp/build/urom_ucc/dpu/.

Troubleshooting

Refer to DOCA Troubleshooting for any issue encountered with the compilation of the application.

Host Application Execution

The UROM UCC application is provided in source form; hence, it must be compiled before it can be executed.

Application usage instructions:

Usage: doca_urom_ucc [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                       Print a help synopsis
  -v, --version                    Print program version information
  -l, --log-level                  Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                  Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                Parse all command flags from an input json file

Program Flags:
  -d, --device <IB device name>    IB device name.

Info

This usage printout can be printed to the command line using the -h (or --help) options:

./doca_urom_ucc -h

Info

For additional information, refer to section "Command Line Flags".

  1. CLI example for running the application on the host:

    mpirun -np 2 ./doca_urom_ucc -d mlx5_0

  2. The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:

    ./doca_urom_ucc --json [json_file]

    For example:

    ./doca_urom_ucc --json ./urom_ucc_params.json

UCC DPU Plugin Component

The UROM UCC plugin component is provided in source form; hence, it must be compiled before the application can be executed. The plugin is compiled as a .so file so that the UROM worker can load it at runtime when it is spawned.

The plugin exposes the following symbols:

  • Get DOCA worker plugin interface for UCC plugin.

doca_error_t urom_plugin_get_iface(struct urom_plugin_iface *iface);

  • Get the UCC plugin version, which will be used to verify that the host and DPU plugin versions are compatible

doca_error_t urom_plugin_get_version(uint64_t *version);


Command Line Flags

Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content
General flags | h | help | Print a help synopsis | N/A
General flags | v | version | Print program version information | N/A
General flags | l | log-level | Set the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (requires compilation with TRACE log level support) | "log-level": 60
General flags | N/A | sdk-log-level | Set the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 | "sdk-log-level": 40
General flags | j | json | Parse all command flags from an input JSON file | N/A
Program flags | d | device | DOCA UROM IB device name | "device": "mlx5_0"

Info

Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.
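When reading logs or building a JSON parameter file, it can help to translate the numeric log levels listed above into their names. The following is a minimal sketch of such a lookup; log_level_name() is a hypothetical helper written for this guide and is not part of the DOCA API.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map a numeric log level, as accepted by the
 * log-level and sdk-log-level flags, to its name.
 * Returns NULL for values that are not listed in the flag description.
 */
const char *log_level_name(int level)
{
    switch (level) {
    case 10: return "DISABLE";
    case 20: return "CRITICAL";
    case 30: return "ERROR";
    case 40: return "WARNING";
    case 50: return "INFO";
    case 60: return "DEBUG";
    case 70: return "TRACE";
    default: return NULL;
    }
}
```

For example, the value 60 used in the "log-level" JSON example corresponds to DEBUG.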


Troubleshooting

Refer to DOCA Troubleshooting for any issue encountered with the installation or execution of the DOCA applications.

  1. Parse application arguments.

    1. Initialize arg parser resources and register DOCA general parameters.

      doca_argp_init();

    2. Register UROM UCC application parameters.

      register_urom_ucc_params();

    3. Parse the arguments.

      doca_argp_start();

  2. Destroy the arg parser.

    doca_argp_destroy();

  • /opt/mellanox/doca/applications/urom_ucc/

  • /opt/mellanox/doca/applications/urom_ucc/urom_ucc_params.json

© Copyright 2025, NVIDIA. Last updated on Jul 10, 2025.