NVIDIA BlueField Platform Software Troubleshooting Guide

DOCA SDK

This guide is designed to assist developers, system administrators, and users in addressing common issues related to the DOCA SDK.

It offers comprehensive support for integrating the SDK into applications, troubleshooting development-related issues, and managing production deployment challenges.

The guide offers a curated collection of troubleshooting tips, solutions, and best practices for resolving DOCA SDK-related issues. Each section targets specific categories of problems, providing step-by-step instructions and explanations to help diagnose and resolve issues effectively.

The guide covers various topics, including:

  • Installation issues

  • Configuration challenges

  • Runtime errors

  • Performance optimizations

Additionally, it includes advice on debugging, logging, and monitoring to deepen understanding of the SDK's behavior and streamline the troubleshooting process.

Command | Description
meson [flags] <build directory> | Build a given meson project (e.g., DOCA SDK samples and applications) from the project's root directory (the /opt/mellanox/doca/samples/<path-to-sample> and /opt/mellanox/doca/applications/ directories, respectively). Also builds user code according to a meson.build file, linking it to DOCA SDK libraries.
meson --reconfigure <build directory> | Reconfigure an existing build with different flags or updated dependencies.
meson configure | Get the current meson configuration or update a specific flag.
ninja | Compile a meson build directory, from within the created build directory.
ninja -C <build directory> | Compile a given meson project by pointing to the created build directory using the -C flag.
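As a hedged illustration of how the commands above fit together, the following sketch configures a build directory and then compiles it. The run and build_project helpers and the DRY_RUN variable are hypothetical names introduced for this example; only the meson <build directory> and ninja -C <build directory> invocations come from the table.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Hypothetical helper: build_project <source-dir> <build-dir>
# Configures the project from its root directory, then compiles it
# by pointing ninja at the build directory with -C.
build_project() {
    src=$1
    build=$2
    (
        cd "$src" || exit 1
        run meson "$build" && run ninja -C "$build"
    )
}

# Example (not executed here):
#   DRY_RUN=1 build_project /opt/mellanox/doca/samples/<path-to-sample> /tmp/build
```

Setting DRY_RUN=1 only prints the two commands, which is a cheap way to review them before running a real build.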

DOCA provides logging functionality for both DOCA libraries and applications, allowing users to monitor and troubleshoot operations effectively.

Enabling Debug Messages

Enabling debug messages in DOCA can be highly beneficial for developing and debugging applications. To activate these debug messages, a logger backend must be created, and the verbosity level (defaulting to INFO) can be adjusted as needed.

Log Levels

Log messages from applications and libraries are assigned a severity level, which defines their importance. These levels, specified in doca_log.h, are ranked as follows (from highest to lowest importance):

  • Critical

  • Error

  • Warning

  • Information

  • Debug

  • Trace

During runtime, only messages of the configured severity level or higher are output to the logs, depending on the settings for each backend. Applications can configure this level by invoking the appropriate APIs from the logging library. Minimum and maximum verbosity levels can be set for each backend individually.

Backends

A logger backend is the component that actually writes or displays log messages.

The following backend types are supported:

  • File stream

  • File descriptor

  • Memory buffer

  • syslog

Every message is written to all the backends, according to the configured verbosity level in those backends. By default, no logger backend is defined, so no message is printed.
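The filtering rule described above can be pictured with a small conceptual sketch. This is not the DOCA logging API: the level_rank, backend_accepts, and emit functions and the "name:level" backend notation are hypothetical, introduced only to show how one message reaches every backend whose configured level it satisfies.

```shell
#!/bin/sh
# Conceptual sketch only; none of these names belong to DOCA.

# Map a severity name to a rank (higher rank = more important).
level_rank() {
    case "$1" in
        CRITICAL) echo 6 ;;
        ERROR)    echo 5 ;;
        WARNING)  echo 4 ;;
        INFO)     echo 3 ;;
        DEBUG)    echo 2 ;;
        TRACE)    echo 1 ;;
        *)        echo 0 ;;
    esac
}

# Succeed when a message of level $1 passes a backend configured at level $2.
backend_accepts() {
    [ "$(level_rank "$1")" -ge "$(level_rank "$2")" ]
}

# emit <level> <text> <name:level>...
# Offer the message to every backend; each one applies its own threshold.
emit() {
    msg_level=$1
    text=$2
    shift 2
    for backend in "$@"; do
        name=${backend%%:*}
        threshold=${backend##*:}
        if backend_accepts "$msg_level" "$threshold"; then
            echo "[$name] $msg_level: $text"
        fi
    done
}

# A DEBUG message reaches only the backend configured at DEBUG.
emit DEBUG "probing device" console:INFO file:DEBUG
```

With no backends passed to emit, nothing is printed, which mirrors the default behavior described above.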

DOCA Messages

The default verbosity level of a newly created backend is INFO. This includes the information, warning and error messages that are printed by all the DOCA libraries.

Warning and error messages may indicate that there is a problem in the developed application, so it is important for users to pay attention to those messages.

In the success path:

  • DOCA functions that are part of the control path print debug messages to ease debugging

  • DOCA functions that are part of the data path do not print any debug messages

In error flows:

  • Both control path and data path functions print an error message if an error is detected.

This section deals with troubleshooting issues related to compiling DOCA-based programs to use the DOCA SDK (e.g., missing dependencies).

Meson complains about missing dependencies

As part of DOCA's installation, a basic set of environment variables is defined so that projects (such as DOCA applications) can easily compile against the DOCA SDK, and so that users have easy access to the various DOCA tools. In addition, DOCA applications sometimes rely on third-party dependencies, some of which require specific environment variables in order to be found by the compilation environment (meson).

Error

There are multiple forms this error may appear in, such as:

  • DOCA libraries are missing:

    Run-time dependency doca-common found: NO (tried pkgconfig)
    meson.build:230:0: ERROR: Dependency "doca-common" not found, tried pkgconfig

  • DPDK definitions are missing:

    Dependency libdpdk found: NO (tried pkgconfig and cmake)
    meson.build:41:1: ERROR: Dependency "libdpdk" not found, tried pkgconfig and cmake

  • mpicc is missing for DPA All to All application:

    ==================== Skipped Applications ====================
    * dpa_all_to_all: Missing mpicc

Solution

All of the dependencies mentioned above are installed as part of DOCA's installation. Nevertheless, it is recommended to verify that the packages themselves were installed correctly. The package that installs each dependency defines the environment variables it needs and applies these settings per user login session:

  • If DOCA was just installed (on the host or DPU), a user session restart is required to apply these definitions (i.e., log out and log back in).

  • It is important to compile DOCA as the same logged-in user. Logging in as ubuntu and then using sudo su, or compiling using sudo, will not work.
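One possible way to check whether the current session can see the dependencies named in the errors above is sketched below. The check_pkg and check_tool helpers are hypothetical names for this example; the module names (doca-common, libdpdk) and the mpicc tool are taken from the error messages, and pkg-config --exists is the standard query for a module.

```shell
#!/bin/sh
# Hypothetical helper: report whether pkg-config can resolve a module.
check_pkg() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config not found in PATH"
        return 2
    fi
    if pkg-config --exists "$1"; then
        echo "$1: found"
    else
        echo "$1: NOT found (check PKG_CONFIG_PATH)"
        return 1
    fi
}

# Hypothetical helper: report whether an executable is reachable through PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found (check PATH)"
        return 1
    fi
}

# Check the dependencies that appear in the error messages above.
missing=0
for module in doca-common libdpdk; do
    check_pkg "$module" || missing=$((missing + 1))
done
check_tool mpicc || missing=$((missing + 1))
echo "missing dependencies: $missing"
```

Running the script as the same user that will invoke meson matters, because the environment variables are applied per login session.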

If restarting the user session is not possible (e.g., automated non-interactive session), the following is a list of the needed environment variables:

Note

All the following examples use the required environment variables for the DPU. For the host, the values should be adjusted accordingly (aarch64 is for the DPU and x86 is for the host): aarch64-linux-gnu → x86_64-linux-gnu.

Tip

It is recommended to define all of the following settings, so that there is no need to remember which DOCA application requires which module (DPDK, FlexIO, etc.).

DOCA Tools:

  • For Ubuntu:

    export PATH=${PATH}:/opt/mellanox/doca/tools

  • For CentOS:

    export PATH=${PATH}:/opt/mellanox/doca/tools

DOCA Applications:

  • For Ubuntu and CentOS:

    export PATH=${PATH}:/usr/mpi/gcc/openmpi-4.1.7a1/bin
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/mpi/gcc/openmpi-4.1.7a1/lib

DPDK:

  • For Ubuntu:

    export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig

  • For CentOS:

    export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/opt/mellanox/dpdk/lib64/pkgconfig

FlexIO:

  • For Ubuntu:

    export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/opt/mellanox/flexio/lib/pkgconfig

  • For CentOS:

    export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/opt/mellanox/flexio/lib/pkgconfig

CollectX:

  • For Ubuntu and CentOS:

    export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/opt/mellanox/collectx/lib/aarch64-linux-gnu/pkgconfig
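For non-interactive sessions, the settings listed above could be collected into one script along the following lines. This is a sketch under stated assumptions: the append_path helper is a hypothetical name, the architecture triplet is assumed to be derivable as "$(uname -m)-linux-gnu", and both the Ubuntu and CentOS DPDK locations are added on the assumption that pkg-config silently skips search directories that do not exist. All paths are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: append directory $2 to the colon-separated list $1
# unless it is already present, and print the resulting list.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already listed
        "::")     printf '%s\n' "$2" ;;      # list was empty
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

# Assumption: aarch64-linux-gnu on the DPU, x86_64-linux-gnu on the host.
triplet="$(uname -m)-linux-gnu"

# DOCA tools and the MPI toolchain used by DOCA applications.
PATH=$(append_path "$PATH" /opt/mellanox/doca/tools)
PATH=$(append_path "$PATH" /usr/mpi/gcc/openmpi-4.1.7a1/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/mpi/gcc/openmpi-4.1.7a1/lib)

# pkg-config search paths for DPDK (Ubuntu and CentOS), FlexIO, and CollectX.
for dir in \
    "/opt/mellanox/dpdk/lib/$triplet/pkgconfig" \
    /opt/mellanox/dpdk/lib64/pkgconfig \
    /opt/mellanox/flexio/lib/pkgconfig \
    "/opt/mellanox/collectx/lib/$triplet/pkgconfig"
do
    PKG_CONFIG_PATH=$(append_path "${PKG_CONFIG_PATH:-}" "$dir")
done

export PATH LD_LIBRARY_PATH PKG_CONFIG_PATH
```

The script has to be sourced (for example, `. ./doca_env.sh`, a hypothetical file name) rather than executed, so that the exports reach the calling shell. Appending through the helper also avoids the leading colon that a plain `${PKG_CONFIG_PATH}:...` export produces when the variable starts out empty.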

Meson complains about permissions

The guides for compiling the DOCA SDK reference samples and applications use the meson build system.

Error

A permission error is encountered when trying to reuse a build directory from a previous build:

ubuntu@localhost:/opt/mellanox/doca/samples/doca_flow/flow_acl$ meson /tmp/build
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 146, in run
    return options.run_func(options)
  File "/usr/lib/python3/dist-packages/mesonbuild/msetup.py", line 294, in run
    app.generate()
  File "/usr/lib/python3/dist-packages/mesonbuild/msetup.py", line 181, in generate
    mlog.initialize(env.get_log_dir(), self.options.fatal_warnings)
  File "/usr/lib/python3/dist-packages/mesonbuild/mlog.py", line 103, in initialize
    log_file = open(os.path.join(logdir, log_fname), 'w', encoding='utf-8')
PermissionError: [Errno 13] Permission denied: '/tmp/build/meson-logs/meson-log.txt'


Solution

Per the meson build instructions, the user can choose any write-accessible directory to be used as the build directory, using the following syntax:

meson <build-dir>

When reusing a build directory, it is best to ensure that the existing directory was created by a user with the same permissions, and only then do one of the following:

  • Removing the old build directory:

    rm -rf /tmp/build

  • Reconfiguring the build directory:

    meson --reconfigure /tmp/build

The above error indicates that the build directory was created by a different user and that the current user does not have permission to use it. In such cases, it is best to choose a different build directory, in a location the current user has write access to. For example:

meson /tmp/build2
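The ownership check described above can be automated before calling meson. The following is a minimal sketch, assuming a shell whose test command supports the -O (owned by the effective user) operator, as bash and dash do; choose_build_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: choose_build_dir <preferred-dir>
# Prints the preferred directory if it does not exist yet, or if it is a
# directory owned by and writable for the current user. Otherwise creates
# and prints a fresh directory via mktemp.
choose_build_dir() {
    dir=$1
    if [ ! -e "$dir" ]; then
        echo "$dir"
    elif [ -d "$dir" ] && [ -O "$dir" ] && [ -w "$dir" ]; then
        echo "$dir"
    else
        mktemp -d "${TMPDIR:-/tmp}/build.XXXXXX"
    fi
}

# Example (not executed here):
#   meson "$(choose_build_dir /tmp/build)"
```

Falling back to a new directory, instead of deleting one that belongs to another user, follows the recommendation above to pick a location the current user can write to.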

Static compilation on CentOS – undefined references to C++

When statically compiling against the DOCA SDK on RHEL 7.x machines, there could be a conflict between the libstdc++ version available out-of-the-box and the one used when building DOCA's SDK libraries.

Error

There are multiple forms this error may appear in, such as:

$ cc test.o -o test_out `pkg-config --libs --static doca`
/opt/mellanox/doca/lib64/libdoca_common.a(doca_common_core_src_doca_dev.cpp.o): In function `doca_devinfo_rep_list_create':
(.text.experimental+0x2193): undefined reference to `__cxa_throw_bad_array_new_length'
/opt/mellanox/doca/lib64/libdoca_common.a(doca_common_core_src_doca_dev.cpp.o): In function `doca_devinfo_rep_list_create':
(.text.experimental+0x2198): undefined reference to `__cxa_throw_bad_array_new_length'
collect2: error: ld returned 1 exit status


Solution

Upgrading the devtoolset on the machine to the one used when building the DOCA SDK resolves the undefined references issue:

$ sudo yum install epel-release
$ sudo yum install centos-release-scl-rh
$ sudo yum install devtoolset-8
# This will enable the use of devtoolset-8 to the *current* bash session
$ source /opt/rh/devtoolset-8/enable

Static compilation on CentOS – unresolved symbols

When statically compiling against the DOCA SDK on RHEL 7.x machines, a known issue in the default pkg-config version (0.27) causes a linking error.

Error

There are multiple forms this error may appear in. For example:

$ cc test.o -o test_out `pkg-config --libs --static doca`
...
/opt/mellanox/dpdk/lib64/librte_net_mlx5.a(net_mlx5_mlx5_sft.c.o): In function 'mlx5_sft_start':
mlx5_sft.c:(.text+0x1827): undefined reference to 'mlx5_malloc'
...


Solution

Use an updated version of pkg-config or pkgconf instead when building applications (as is recommended in DPDK's compilation instructions).
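A quick way to tell whether the installed pkg-config is the version named above is sketched here. The pkgconfig_is_affected helper is a hypothetical name, and treating every 0.27.x release as affected is an assumption made for this example; the text above only names version 0.27.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given version string is 0.27 or 0.27.x.
pkgconfig_is_affected() {
    case "$1" in
        0.27|0.27.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Report on the pkg-config found in PATH, if any.
if command -v pkg-config >/dev/null 2>&1; then
    version=$(pkg-config --version)
    if pkgconfig_is_affected "$version"; then
        echo "pkg-config $version: affected, use a newer pkg-config or pkgconf"
    else
        echo "pkg-config $version: not the 0.27 release"
    fi
else
    echo "pkg-config not found in PATH"
fi
```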

© Copyright 2024, NVIDIA. Last updated on Nov 12, 2024.