DOCA Documentation v3.0.0

DOCA Stream Receive Performance Application Guide

This guide outlines the implementation of the DOCA Stream Receive Performance application, built on top of the NVIDIA® BlueField® DPU.

The Stream Receive Performance application is designed to measure and evaluate RX performance using the NVIDIA DOCA RMAX library. It leverages the capabilities of DOCA RMAX and NVIDIA Rivermax to support efficient, high-performance media and data streaming.

Key Technologies

  • DOCA RMAX API – A component of the NVIDIA DOCA framework, optimized for networking tasks in media streaming use cases.

  • NVIDIA Rivermax SDK – Built to exploit BlueField DPU hardware acceleration, enabling direct data transfers between the NIC and GPU, minimizing CPU load.

This architecture delivers high throughput, ultra-low latency, and minimal CPU utilization, making it well suited to demanding real-time streaming workloads.

Deployment Notes

  • DOCA Rivermax applications must run on BlueField target DPUs with root privileges or other additional permissions and capabilities

  • Ensure the DPU has a valid IP address configured

  • Allocate an appropriate number of huge pages for optimal performance. Refer to "Rivermax Performance-oriented Development Guidelines" for details.

    Info

    To access this document, join the NVIDIA Rivermax SDK developers' program and open the documentation on the Rivermax Developer page.

  • Runtime configurations can be tuned even after the application starts, allowing dynamic performance optimization

For complete setup steps and advanced configurations, refer to DOCA RMAX documentation.

The application is designed to receive and process network packets using the DOCA library. It is structured around three core components:

  • Configuration management – Manages the initialization, parsing, validation, and cleanup of application configuration parameters

  • Global resources management – Handles the allocation and management of shared resources such as memory maps, buffer inventories, and progress engines

  • Stream management – Manages the lifecycle of data streams used for packet reception, including setup, execution, and teardown

The architecture comprises several key modules and their responsibilities.

Main Application

  • Initialization – Sets up logging, parses command-line arguments, and initializes the configuration.

  • Device listing – If the --list flag is passed, it enumerates and prints available devices, then exits.

  • Stream processing – Initializes global resources, configures the stream, and enters the packet reception loop.

Configuration Management

  • Initialization – Applies default values and creates the CPU affinity mask.

  • Argument parsing – Parses command-line arguments and updates the configuration accordingly.

  • Validation – Verifies that all required parameters are provided.

  • Destruction – Frees any configuration-related resources.
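The configuration steps above can be sketched in C as follows. This is a minimal illustration, not the application's source: the struct, its field names, and the `sketch_` functions are hypothetical, and the set of mandatory parameters (source IP, destination IP, local IP, destination port) is an assumption, since this guide does not list them. Only the default values (262144 packets, 1500-byte payload, 0-byte application header) come from the usage text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration record; field names are illustrative only. */
struct sketch_config {
    bool list;             /* --list */
    const char *src_ip;    /* --src-ip, NULL until parsed */
    const char *dst_ip;    /* --dst-ip, NULL until parsed */
    const char *local_ip;  /* --local-ip, NULL until parsed */
    uint16_t dst_port;     /* --dst-port, 0 until parsed */
    uint32_t packets;      /* --packets */
    uint16_t payload_size; /* --payload-size */
    uint16_t app_hdr_size; /* --app-hdr-size */
};

/* Apply the defaults documented in the usage text; everything else is zeroed. */
static void sketch_init_config(struct sketch_config *cfg)
{
    assert(cfg != NULL);
    *cfg = (struct sketch_config){
        .packets = 262144,
        .payload_size = 1500,
        .app_hdr_size = 0,
    };
}

/* Assumed validation rule: all addressing parameters must have been provided. */
static bool sketch_mandatory_args_set(const struct sketch_config *cfg)
{
    assert(cfg != NULL);
    return cfg->src_ip != NULL && cfg->dst_ip != NULL &&
           cfg->local_ip != NULL && cfg->dst_port != 0;
}
```

Keeping defaults and validation in separate functions mirrors the split between the initialization and validation steps described above: parsing can overwrite any default, and validation runs only once parsing is complete.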

Global Resources Management

  • Initialization – Sets up shared memory maps, buffer inventories, and progress engines required for data handling

  • Destruction – Cleans up and releases global resources

Stream Management

  • Initialization – Configures and starts the stream, allocates memory buffers, and attaches the necessary flows

  • Packet reception loop – Processes incoming packets, manages events, and collects runtime statistics

  • Destruction – Detaches flows, stops the stream, and releases associated buffers

Application Functions and Roles

| Function(s) | Role |
| --- | --- |
| main | Entry point of the application, handles overall flow control |
| init_config; destroy_config | Manage application configuration |
| register_argp_params | Register command-line arguments |
| init_globals; destroy_globals | Manage global resources |
| init_stream; destroy_stream | Manage stream setup and teardown |
| run_recv_loop | Main loop for receiving and processing packets |
| handle_completion; handle_error | Event handlers for packet reception and errors |

Data Structures

  • app_config – Holds configuration parameters for the application

  • globals – Holds global resources required by the application

  • stream_data – Manages the state and data associated with streaming

Event Handling

  • Completion Events – Handled by handle_completion, updates statistics and optionally dumps packet content

  • Error Events – Handled by handle_error, logs errors and stops the receive loop
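The division of work between the two handlers can be illustrated with a small sketch. The types and function signatures below are hypothetical (the real handlers receive DOCA RMAX event data whose layout is not shown in this guide); the sketch only reproduces the behavior described above: a completion updates statistics and optionally dumps packet bytes, and an error is logged and stops the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical running counters. */
struct sketch_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Hypothetical per-stream state. */
struct sketch_stream {
    struct sketch_stats stats;
    bool dump; /* mirrors the --dump flag */
    bool run;  /* the receive loop keeps going while this is true */
};

/* Completion: account for the received packets, optionally dump their leading bytes. */
static void sketch_handle_completion(struct sketch_stream *s, const uint8_t *data,
                                     size_t pkt_count, size_t pkt_size)
{
    assert(s != NULL && (data != NULL || pkt_count == 0));
    s->stats.packets += pkt_count;
    s->stats.bytes += (uint64_t)pkt_count * pkt_size;
    if (!s->dump)
        return;
    for (size_t i = 0; i < pkt_count; i++) {
        const uint8_t *pkt = data + i * pkt_size;
        printf("pkt %zu:", i);
        for (size_t j = 0; j < pkt_size && j < 16; j++)
            printf(" %02x", pkt[j]);
        printf("\n");
    }
}

/* Error: log it and ask the receive loop to stop. */
static void sketch_handle_error(struct sketch_stream *s, int code, const char *msg)
{
    assert(s != NULL);
    fprintf(stderr, "receive error %d: %s\n", code, msg);
    s->run = false;
}
```

A receive loop built on this state would poll for events while `run` is true, so a single error event is enough to end reception and let teardown proceed.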

Flow

  1. Initialization – Set up logging, configuration, and global resources.

  2. Device listing – Optionally list available devices.

  3. Stream setup – Configure and initialize the stream.

  4. Packet reception – Enter the main loop to receive and process packets.

  5. Teardown – Clean up resources and exit.


This application leverages the following DOCA library:

  • DOCA RMAX

The RMAX library must be compiled and run, and a Rivermax license is required to run this application, as is the case with every application using DOCA RMAX. Refer to the NVIDIA Rivermax SDK page to obtain that license.

Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.

For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/stream_receive_perf/.

Compiling All Applications

All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.

To build all the applications together, run:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

doca_stream_receive_perf is created under /tmp/build/stream_receive_perf/.

Compiling Only the Current Application

  1. To directly build only the stream receive performance application:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build -Denable_all_applications=false -Denable_stream_receive_perf=true
    ninja -C /tmp/build

    doca_stream_receive_perf is created under /tmp/build/stream_receive_perf/.

  2. Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

    1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

      • Set enable_all_applications to false

      • Set enable_stream_receive_perf to true

    2. Use the same compilation commands shown in the previous section:

      cd /opt/mellanox/doca/applications/
      meson /tmp/build
      ninja -C /tmp/build

      doca_stream_receive_perf is created under /tmp/build/stream_receive_perf/.

Troubleshooting

Please refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the compilation of the DOCA applications.

Prerequisites

Info

This application can run on the target DPU only.

Info

This application must be run with root privileges or other additional permissions and capabilities.

  • An IP address must be configured on the device being used.

  • It is recommended to have at least 800 huge pages enabled to achieve maximum performance:

    dpu> echo 1000000000 > /proc/sys/kernel/shmmax
    dpu> echo 800 > /proc/sys/vm/nr_hugepages
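Before launching the application, it can be useful to confirm that the huge page setting took effect. The following sketch reads the `HugePages_Total` field that Linux exposes in `/proc/meminfo`; the function names are hypothetical and the check is not part of the application itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the HugePages_Total value found in meminfo-formatted text, or -1 if absent. */
static long sketch_parse_hugepages_total(const char *meminfo)
{
    const char *key = "HugePages_Total:";
    const char *p;
    long value;

    assert(meminfo != NULL);
    p = strstr(meminfo, key);
    if (p == NULL)
        return -1;
    if (sscanf(p + strlen(key), "%ld", &value) != 1)
        return -1;
    return value;
}

/* Reads /proc/meminfo and reports whether at least `wanted` huge pages are configured. */
static int sketch_hugepages_at_least(long wanted)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return 0;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return sketch_parse_hugepages_total(buf) >= wanted;
}
```

Calling `sketch_hugepages_at_least(800)` returns nonzero when the recommendation above is met. The parser is kept separate from the file access so it can be exercised on any captured text.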

Application Execution

The stream receive performance application is provided in source form, so it must be compiled before it can be executed.

  • Application usage instructions

    Usage: doca_stream_receive_perf [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse all command flags from an input json file

    Program Flags:
      --list                            List available devices
      --scatter-type                    Scattering type: RAW (default) or ULP
      --tstamp-format                   Timestamp format: raw (default), free-running or synced
      -s, --src-ip                      Source address to read from
      -d, --dst-ip                      Destination address to bind to
      -i, --local-ip                    IP of the local interface to receive data
      -p, --dst-port                    Destination port to read from
      -K, --packets                     Number of packets to allocate memory for (default 262144)
      -y, --payload-size                Packet's payload size (default 1500)
      -e, --app-hdr-size                Packet's application header size (default 0)
      -a, --cpu-affinity                Comma separated list of CPU affinity cores for the application main thread
      --sleep                           Amount of microseconds to sleep between requests (default 0)
      --min                             Block until at least this number of packets are received (default 0)
      --max                             Maximum number of packets to return in one completion
      --dump                            Dump packet content

    For additional information, please refer to the "Command Line Flags" section below.

    To print the usage text above, run the application with the -h (or --help) option:

    ./doca_stream_receive_perf -h

  • CLI example for listing available devices:

    ./doca_stream_receive_perf --list

  • CLI example for receiving a stream sent from 1.1.63.5 to the local NIC address 1.1.64.67 and port 7000:

    ./doca_stream_receive_perf --local-ip 1.1.64.67 --dst-ip 1.1.64.67 --src-ip 1.1.63.5 --dst-port 7000

  • CLI example for receiving a multicast stream sent from 1.1.63.5 to group address 239.0.0.1 and port 7000, using the local NIC address 1.1.64.67:

    ./doca_stream_receive_perf --local-ip 1.1.64.67 --dst-ip 239.0.0.1 --src-ip 1.1.63.5 --dst-port 7000

  • CLI example for receiving a stream using header-data split mode. This example receives a stream sent from 1.1.63.5 to the local NIC address 1.1.64.67 and port 7000. The application header size is 20 bytes, and the payload size is 1200 bytes:

    ./doca_stream_receive_perf --local-ip 1.1.64.67 --dst-ip 1.1.64.67 --src-ip 1.1.63.5 --dst-port 7000 --app-hdr-size 20 --payload-size 1200

    Info

    Setting the application header size enables header-data split mode which separates the application header from the payload.

  • The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:

    ./doca_stream_receive_perf --json [json_file]

    For example:

    ./doca_stream_receive_perf --json ./stream_receive_perf_params.json
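To get a feel for how the --packets, --app-hdr-size, and --payload-size flags translate into memory demand, the sketch below multiplies them out. It is a rough lower bound under the assumption that headers and payloads are stored in two separate buffers when header-data split is enabled; any alignment or stride padding applied by the library is not modeled, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: bytes needed for each of the two buffers. */
struct sketch_buf_sizes {
    uint64_t header_bytes;  /* 0 when header-data split is disabled */
    uint64_t payload_bytes;
};

/* Lower-bound estimate: one header slot and one payload slot per packet. */
static struct sketch_buf_sizes sketch_buffer_sizes(uint32_t packets,
                                                   uint16_t app_hdr_size,
                                                   uint16_t payload_size)
{
    struct sketch_buf_sizes s;

    assert(packets > 0);
    s.header_bytes = (uint64_t)packets * app_hdr_size;
    s.payload_bytes = (uint64_t)packets * payload_size;
    return s;
}
```

With the defaults (262144 packets, 1500-byte payload, no application header) the estimate is 393,216,000 payload bytes. For the header-data split example (20-byte header, 1200-byte payload) it is 5,242,880 header bytes plus 314,572,800 payload bytes.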

Command Line Flags

| Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content |
| --- | --- | --- | --- | --- |
| General flags | h | help | Print a help synopsis | N/A |
| | v | version | Print program version information | N/A |
| | l | log-level | Set the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (requires compilation with TRACE level support) | "log-level": 60 |
| | N/A | sdk-log-level | Set the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 | "sdk-log-level": 40 |
| | j | json | Parse all command flags from an input JSON file | N/A |
| Program flags | N/A | list | List all available devices, dump their IPv4 addresses, and tell whether or not the PTP clock is supported | "list" : true |
| | N/A | scatter-type | Scattering type: RAW (default) or ULP | "scatter-type" : "RAW" |
| | N/A | tstamp-format | Timestamp format: raw (default), free-running, or synced | "tstamp-format" : "raw" |
| | s | src-ip | Source IP address to read from | "src-ip" : "1.1.63.5" |
| | d | dst-ip | Destination IP address to bind to | "dst-ip" : "1.1.64.67" |
| | i | local-ip | IP of the local interface to receive data | "local-ip" : "1.1.64.67" |
| | p | dst-port | Destination port to read from | "dst-port" : 7000 |
| | K | packets | Number of packets to allocate memory for (default 262144) | "packets" : 262144 |
| | y | payload-size | Packet's payload size (default 1500) | "payload-size" : 1200 |
| | e | app-hdr-size | Packet's application header size (default 0) | "app-hdr-size" : 20 |
| | a | cpu-affinity | Comma-separated list of CPU affinity cores for the application main thread | "cpu-affinity" : "1,2,3" |
| | N/A | sleep | Amount of microseconds to sleep between requests | "sleep" : 100 |
| | N/A | min | Block until at least this number of packets are received | "min" : 0 |
| | N/A | max | Maximum number of packets to return in one completion | "max" : 1000 |
| | N/A | dump | Dump packet content | "dump" : true |

Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.
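The cpu-affinity value is a comma-separated list of core indices such as "1,2,3". As an illustration of how such a list could be turned into an affinity mask, the sketch below builds a 64-bit bitmask. The function name, the 64-core limit, and the plain integer mask are assumptions made for the example; the application's own parser and the mask type it passes to DOCA RMAX are not shown in this guide.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parses a list such as "1,2,3" into a bitmask with bits 1, 2 and 3 set.
 * Returns 0 on success, or -1 on malformed input or a core index outside 0..63.
 * An empty string yields an empty mask.
 */
static int sketch_parse_cpu_list(const char *list, uint64_t *mask)
{
    const char *p = list;
    uint64_t out = 0;

    assert(list != NULL && mask != NULL);
    while (*p != '\0') {
        char *end;
        long core;

        errno = 0;
        core = strtol(p, &end, 10);
        if (end == p || errno != 0 || core < 0 || core >= 64)
            return -1;
        out |= UINT64_C(1) << core;
        if (*end == ',') {
            p = end + 1;
            if (*p == '\0')
                return -1; /* trailing comma */
        } else if (*end == '\0') {
            p = end;
        } else {
            return -1; /* unexpected separator */
        }
    }
    *mask = out;
    return 0;
}
```

For the JSON example value "1,2,3" this produces the mask 0xE. The output is written only on success, so a caller can keep its previous mask when the input is rejected.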

Troubleshooting

Please refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the installation or execution of the DOCA applications.

    1. Parse application arguments.

      1. Initialize arg parser resources and register DOCA general parameters.

        init_config();

      2. Register stream receive performance application parameters.

        register_argp_params();

      3. Parse the arguments.

        doca_argp_start();

        1. Parse app parameters.

    2. Device listing.

      If the list parameter is set to true, the application lists all available devices.

      1. Initializes the DOCA RMAX library.

        doca_rmax_init();

      2. Enumerates and lists all available devices.

        list_devices();

    3. Stream receive: if the list parameter is not set, the application proceeds to receive the stream.

      1. Checks that the mandatory arguments are set.

        mandatory_args_set();

      2. Sets the CPU affinity mask (if one is configured).

        doca_rmax_set_cpu_affinity_mask();

      3. Initializes the DOCA RMAX library.

        doca_rmax_init();

      4. Opens the device.

        open_device();

      5. Initializes global resources.

        init_globals();

      6. Initializes the stream.

        init_stream();

    4. Main loop.

      run_recv_loop();

    5. Clean-up.

      1. Cleans up and destroys the stream.

        destroy_stream();

      2. Releases and destroys global application resources.

        destroy_globals();

      3. Closes the device.

        doca_dev_close();

      4. Releases the DOCA RMAX library.

        doca_rmax_release();

      5. Destroys the ARGP resources.

        doca_argp_destroy();

      6. Releases resources allocated by the application configuration.

        destroy_config();
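The clean-up steps run in the reverse order of the setup steps. The sketch below illustrates that pattern with a stage table and a recorded trace instead of real DOCA calls: the step names are stored only as strings, the pairing of each setup step with a teardown step is inferred from the order listed above (in particular, pairing doca_argp_start with doca_argp_destroy is an assumption), and the failure injection parameter is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_TRACE 32

/* Records the order in which steps ran so it can be inspected afterwards. */
static const char *sketch_trace[SKETCH_MAX_TRACE];
static size_t sketch_trace_len;

static void sketch_record(const char *step)
{
    if (sketch_trace_len < SKETCH_MAX_TRACE)
        sketch_trace[sketch_trace_len++] = step;
}

/* One setup step and the teardown step assumed to undo it. */
struct sketch_stage {
    const char *up;
    const char *down;
};

static const struct sketch_stage sketch_stages[] = {
    { "init_config",     "destroy_config"    },
    { "doca_argp_start", "doca_argp_destroy" },
    { "doca_rmax_init",  "doca_rmax_release" },
    { "open_device",     "doca_dev_close"    },
    { "init_globals",    "destroy_globals"   },
    { "init_stream",     "destroy_stream"    },
};

/*
 * Simulates the flow. `fail_at` is the index of the stage whose setup fails,
 * or -1 for a clean run. Only stages that completed setup are torn down,
 * in reverse order. Returns 0 on a clean run and -1 otherwise.
 */
static int sketch_run(int fail_at)
{
    size_t n = sizeof(sketch_stages) / sizeof(sketch_stages[0]);
    size_t done = 0;
    int rc = 0;

    sketch_trace_len = 0;
    for (; done < n; done++) {
        if ((int)done == fail_at) {
            rc = -1;
            break;
        }
        sketch_record(sketch_stages[done].up);
    }
    if (rc == 0)
        sketch_record("run_recv_loop");
    while (done > 0)
        sketch_record(sketch_stages[--done].down);
    return rc;
}

/* Prints the recorded steps, one per line. */
static void sketch_print_trace(void)
{
    for (size_t i = 0; i < sketch_trace_len; i++)
        printf("%s\n", sketch_trace[i]);
}
```

Calling `sketch_run(-1)` followed by `sketch_print_trace()` prints the six setup steps, the receive loop, and then the six teardown steps from destroy_stream back to destroy_config. Passing a stage index instead shows that an early failure unwinds only what was already set up.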

  • /opt/mellanox/doca/applications/stream_receive_perf/

  • /opt/mellanox/doca/applications/stream_receive_perf/stream_receive_perf_params.json

© Copyright 2025, NVIDIA. Last updated on May 5, 2025.