Date: 2026-02-27

The server application executes in two main phases: a Control Phase and a TCP Server Phase.

The Control Phase begins when the application is launched. It configures the DOCA Comch server, waits for the orchestrator application to connect, and spawns threads to handle incoming messages. Once producers and consumers are connected, it creates a TCP listening socket.

A typical flow is:

1. Open the required DOCA device.
2. Create the DOCA Comch Server.
3. Wait for the Orchestrator application to connect.
4. Prepare the appropriate number of Socket threads. Each thread will:
   a. Create DOCA Comch Producers and Consumers.
   b. Wait for Producers and Consumers to be connected.
   c. Create and pool doca_comch_producer_send_tasks.
   d. Create and submit doca_comch_consumer_post_recv_tasks.
5. Create a TCP socket to listen for connections.
6. Start the TCP Server phase.
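The step that creates the TCP listening socket can be sketched with the standard POSIX socket calls. This is a minimal sketch, not the application's code: the helper name `open_listen_socket`, the choice of `INADDR_ANY`, and the `SO_REUSEADDR` option are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listening socket on `port` (0 lets the
 * kernel pick a free port). Returns the socket fd, or -1 on failure. */
static int open_listen_socket(uint16_t port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Assumption: allow quick restarts while old connections are in TIME_WAIT. */
    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* listen on all interfaces */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would pass the configured TCP port and then `accept()` connections on the returned descriptor.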

In the TCP Server Phase, the application listens on the configured TCP port for incoming connections. When a client connects, the thread handling that connection forwards the client's messages to the GPU via DOCA Comch. It keeps sending until the socket has no more data or the send task pool is empty. It then waits for a response from the GPU and forwards it back to the remote client.

A typical per-thread loop is:

1. Poll the DOCA Progress Engine (PE).
   a. If a doca_comch_producer_send_task completion is received, add the task back to the task pool.
   b. If a doca_comch_consumer_post_recv_task completion is received:
      i. Extract the response data.
      ii. Write the response to the TCP socket.
2. Poll the TCP socket.
   a. If data can be read from the socket:
      i. Read data from the socket.
      ii. Get the next available doca_comch_producer_send_task from the task pool.
      iii. Copy message contents into the send task's data buffer.
      iv. Submit the send task.
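The task pool that the loop above takes from and returns to can be modeled as a small free list. The sketch below is an assumption-laden stand-in: `struct send_task`, `struct task_pool`, the `pool_*` helpers, and `POOL_CAPACITY` are hypothetical names, and the real pool holds DOCA task objects rather than this placeholder struct.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 4 /* illustrative size, not taken from the application */

/* Stand-in for a pre-allocated send task; the real type is a DOCA task. */
struct send_task {
    char data[64];
    size_t len;
};

/* LIFO free list of idle tasks, owned by a single thread (no locking). */
struct task_pool {
    struct send_task *idle[POOL_CAPACITY];
    size_t count;
};

/* Place the first `n` pre-allocated tasks into the pool. */
static void pool_init(struct task_pool *pool, struct send_task *tasks, size_t n)
{
    assert(n <= POOL_CAPACITY);
    pool->count = 0;
    for (size_t i = 0; i < n; i++)
        pool->idle[pool->count++] = &tasks[i];
}

/* Take an idle task, or NULL when the pool is empty. */
static struct send_task *pool_get(struct task_pool *pool)
{
    return pool->count > 0 ? pool->idle[--pool->count] : NULL;
}

/* Return a task once its send completion has been polled. */
static void pool_put(struct task_pool *pool, struct send_task *task)
{
    assert(pool->count < POOL_CAPACITY);
    pool->idle[pool->count++] = task;
}
```

A `NULL` result from `pool_get` corresponds to the "send task pool is empty" condition: the thread stops reading from the socket until a completion returns a task.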



During the TCP Server Phase, the application also watches for DOCA Comch control messages from the orchestrator and for a CTRL-C signal, so that it can shut down cleanly.
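One common way to turn CTRL-C into a clean shutdown is a signal handler that only sets a flag, which the worker loops then poll. The sketch below uses the standard POSIX `sigaction` call; the names `stop_requested`, `on_sigint`, and `install_stop_handler` are hypothetical, and the application's actual mechanism may differ.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the worker loops. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1; /* only async-signal-safe work belongs here */
}

/* Hypothetical helper: route CTRL-C (SIGINT) to the stop flag.
 * Returns 0 on success, -1 on failure. */
static int install_stop_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```

Each loop iteration would check `stop_requested` and, once it is set, stop submitting new tasks and release its resources.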

The GPU application executes in two phases: a Control Phase and a GPU Processing Phase.

The Control Phase configures the DOCA Comch client and connects it to the server. It creates producers and consumers, allocates GPU memory, waits for connections, and then launches the CUDA Kernel.

1. Open the required DOCA device.
2. Open the required GPU device.
3. Create the DOCA Comch client.
4. Connect to the DOCA Comch server.
5. Allocate the appropriate GPU memory.
6. Prepare the appropriate number of Producers and Consumers.
7. Launch the CUDA Kernel.

Once the CUDA Kernel is launched, the CPU portion of the application remains running. It monitors the kernel's execution and watches for a shutdown request, either a DOCA Comch control message from the server or a user CTRL-C signal. If a shutdown is detected, it stops the kernel and cleans up memory.

The GPU processing portion first submits multiple post_recv buffers. It then enters a loop, polling for messages. When a message is received, it processes the message (reverses its bytes), sends the response back to the server, and resubmits the buffer. This continues until a fatal error occurs or the CPU sets a global stop flag.
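The byte-reversal step can be illustrated with an in-place swap from both ends of the buffer. This sketch is written as plain C so it compiles on the host; in the application the processing runs inside the CUDA kernel, and the function name `reverse_bytes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse `len` bytes of `buf` in place by swapping from both ends. */
static void reverse_bytes(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t lo = 0, hi = len; lo + 1 < hi; lo++, hi--) {
        uint8_t tmp = buf[lo];
        buf[lo] = buf[hi - 1];
        buf[hi - 1] = tmp;
    }
}
```

For example, a five-byte message `abcde` becomes `edcba`, and empty or single-byte messages are left unchanged.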

A typical CUDA thread loop is:

1. Poll for post_recv messages.
   a. If a message is received:
      i. Extract the data.
      ii. Verify the message is a client request.
      iii. Reverse the order of the bytes in the data.
      iv. Record the buffer in the inflight_messages array.
      v. Submit the response to the server using doca_dev_gpu_comch_producer_send.
2. Poll for producer send message completions.
   a. If a send completion is indicated:
      i. Use user_msg_id to determine which buffer was sent.
      ii. Submit this buffer to receive a new message with doca_dev_gpu_comch_consumer_post_recv.
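The bookkeeping that ties a send completion back to its buffer can be sketched as a small table indexed by message id. This is a host-side model under stated assumptions: using the slot index as the `user_msg_id` is one possible mapping, not necessarily the application's, and `struct inflight_table`, `inflight_record`, `inflight_complete`, and `MAX_INFLIGHT` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFLIGHT 8 /* illustrative; not the application's value */

/* One slot per outstanding response; the slot index doubles as the message id. */
struct inflight_table {
    void *buffers[MAX_INFLIGHT]; /* NULL marks a free slot */
};

/* Record `buf` before sending. Returns the id to attach to the send,
 * or -1 if every slot is in use. */
static int inflight_record(struct inflight_table *t, void *buf)
{
    assert(buf != NULL);
    for (int id = 0; id < MAX_INFLIGHT; id++) {
        if (t->buffers[id] == NULL) {
            t->buffers[id] = buf;
            return id;
        }
    }
    return -1;
}

/* On send completion: look up the buffer by id and free the slot for reuse. */
static void *inflight_complete(struct inflight_table *t, int user_msg_id)
{
    assert(user_msg_id >= 0 && user_msg_id < MAX_INFLIGHT);
    void *buf = t->buffers[user_msg_id];
    t->buffers[user_msg_id] = NULL;
    return buf; /* caller re-posts this buffer to receive a new message */
}
```

Completions may arrive in any order, so each lookup depends only on the id carried by the completion, not on submission order.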



The client application is the simplest of the three in this use case. Its purpose is to initiate TCP connections to the doca_dpu_gpu_remote_offload_server, using one connection per thread.

Each thread sends a specified number of request messages while recording throughput. Upon receiving a response, the client validates its content against the expected response; if they do not match, the application exits with an error.
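The response check can be sketched as follows, assuming the expected response is the request with its bytes in reverse order (the processing the GPU side performs). The function name `response_matches` is hypothetical, and the real client may build and compare its expected response differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when `resp` is exactly `req` with its bytes in reverse order. */
static bool response_matches(const uint8_t *req, size_t req_len,
                             const uint8_t *resp, size_t resp_len)
{
    assert(req != NULL && resp != NULL);
    if (req_len != resp_len)
        return false;
    for (size_t i = 0; i < req_len; i++) {
        if (resp[i] != req[req_len - 1 - i])
            return false;
    }
    return true;
}
```

A `false` result corresponds to the mismatch case in which the application exits with an error.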

If all requests are sent and successfully validated, the application outputs final statistics for the run, including: