NVIDIA BlueField Platform Software Troubleshooting Guide

DOCA Flow

This page offers troubleshooting information for DOCA Flow users and customers.

DOCA Flow functions as an independent project but relies on DPDK. As a result, some issues users may encounter could originate from DPDK, accompanied by DPDK-specific error messages. For troubleshooting such errors, please consult the MLNX_DPDK Troubleshooting Guide.

To make the most of DOCA Flow, we recommend reviewing the following resources:

  1. DOCA Flow Programming Guide

    1. Begin with the "Introduction" then proceed through the "Steering Domains" section to gain foundational knowledge about DOCA Flow’s core concepts.

    2. Study the "Flow Life Cycle" section for an in-depth understanding of key operational stages, including:

      • Initialization

      • Pipe creation and entry insertion

      • Teardown

  2. NVIDIA DOCA Library APIs

    • Consult the API reference to select the appropriate DOCA Flow functions for your use case

  3. Sample Code

    • NVIDIA provides a comprehensive set of samples covering all released features. They are a valuable resource for understanding the concepts and can be used directly in related implementations.

  4. DOCA Flow Connection Tracking Programming Guide

    • This guide details the Connection Tracking functionality integrated within DOCA Flow, which is useful for applications requiring advanced session management

Command

Description

ibdev2netdev -v

Part of the OFED package. This command displays all associations between network devices and Remote Direct Memory Access (RDMA) adapter ports.

lspci

A Linux command that provides information about each PCI bus on your system.

ethtool

A Linux command used to query or control network driver and hardware settings.

ip, devlink

  • The ip command is used to assign addresses to network interfaces and configure network parameters on Linux systems. It replaces the outdated and deprecated ifconfig command on modern Linux distributions.

  • devlink is an API for exposing device information and resources not directly related to any specific device class, such as chip-wide or switch-ASIC-wide configurations.

echo -n $vf0_pci > /sys/bus/pci/drivers/mlx5_core/unbind

echo -n $vf1_pci > /sys/bus/pci/drivers/mlx5_core/unbind

devlink dev eswitch set pci/${pci_addr} mode switchdev

echo $vf_num > /sys/bus/pci/devices/${pci_addr}/sriov_numvfs

# Or

echo $vf_num > /sys/bus/pci/devices/${pci_addr}/mlx5_num_vfs

echo -n $vf0_pci > /sys/bus/pci/drivers/mlx5_core/bind

echo -n $vf1_pci > /sys/bus/pci/drivers/mlx5_core/bind

Sets switchdev mode with two VFs.

Note

The mlx5_num_vfs parameter is always present, regardless of whether the OS has loaded the virtualization module (such as when adding intel_iommu support to the grub file). In contrast, the sriov_numvfs parameter is applicable only if the intel_iommu has been added to the grub file. If you do not see the sriov_numvfs file, verify that intel_iommu was correctly added to the grub configuration.

<doca_flow_sample> --help

Displays help and options related to EAL. Useful for understanding how to pass device parameters.

<doca_flow_sample> -- --help

Shows the options available for the DOCA Flow application.

<doca_flow_sample> -- --log-level <N>

Sets the log level for the sample. DOCA allows fine control by providing separate logging paths for both the sample and the SDK.

<10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>

<doca_flow_sample> -- --sdk-log-level <N>

Sets the log level for the SDK. Similar to the sample's log control, this allows fine-tuned logging management for the SDK.

<10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
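The note in the table above about mlx5_num_vfs and sriov_numvfs can be checked before writing the VF count. The following is a minimal sketch, assuming the sysfs layout shown in the table; the function name vf_count_file and the sysfs_root parameter are hypothetical and exist only to make the check easy to try out.

```python
from pathlib import Path
from typing import Optional


def vf_count_file(pci_addr: str, sysfs_root: str = "/sys") -> Optional[Path]:
    """Return the sysfs file that accepts the VF count, or None if neither exists.

    Hypothetical helper: it only inspects the two paths named in the table.
    """
    device_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    # Prefer the generic SR-IOV file; fall back to the mlx5-specific one.
    for name in ("sriov_numvfs", "mlx5_num_vfs"):
        candidate = device_dir / name
        if candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    # 0000:03:00.0 is a placeholder PCI address; use the address of your device.
    print(vf_count_file("0000:03:00.0"))
```

If the helper returns the mlx5_num_vfs path only, that matches the situation described in the note, and the intel_iommu setting in the grub configuration is worth verifying.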

  • DOCA Flow is an integral part of the broader DOCA ecosystem, offering both application-level and SDK-level logging, as outlined in the Debuggability section of the DOCA documentation.

  • DOCA Flow also provides a set of counters that can be attached to a pipe or to an entry. For more information, refer to the Monitor API in the programming guide. If you are not sure where a packet is being dropped, counters are an effective way to troubleshoot. Begin by setting up non-shared counters along the expected flow route.
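The two log-level options listed in the table can be combined on one command line. The sketch below builds such a command, assuming that EAL arguments go before the `--` separator and application arguments after it, as the help entries in the table suggest. The binary path and the function name build_sample_command are placeholders, not part of DOCA.

```python
from typing import List, Sequence

# Numeric values as listed in the table above.
LOG_LEVELS = {
    "DISABLE": 10,
    "CRITICAL": 20,
    "ERROR": 30,
    "WARNING": 40,
    "INFO": 50,
    "DEBUG": 60,
    "TRACE": 70,
}


def build_sample_command(
    binary: str,
    eal_args: Sequence[str],
    app_level: str = "INFO",
    sdk_level: str = "WARNING",
) -> List[str]:
    """Build an argument list with separate sample and SDK log levels."""
    return [
        binary,
        *eal_args,
        "--",  # everything after this separator is passed to the application
        "--log-level", str(LOG_LEVELS[app_level]),
        "--sdk-log-level", str(LOG_LEVELS[sdk_level]),
    ]


if __name__ == "__main__":
    # "./doca_flow_sample" stands in for the <doca_flow_sample> binary.
    print(" ".join(build_sample_command("./doca_flow_sample", [], "DEBUG", "TRACE")))
```

Raising only the SDK level is often enough when the sample itself behaves as expected and the question is what the library does underneath.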

Debug & Trace Features

The DOCA SDK development packages (doca-devel) include a developer-oriented package with additional trace and debug capabilities that are not part of the production libraries:

  • For .deb based systems: libdoca-sdk-flow-trace

  • For .rpm based systems: doca-sdk-flow-trace

These packages install the trace-versions of the libraries in the following directories:

  • .deb based systems: /opt/mellanox/doca/lib/<arch>/trace

  • .rpm based systems: /opt/mellanox/doca/lib64/trace

For detailed information on the additional capabilities provided by these trace libraries and best practices for using them, refer to the corresponding section of the DOCA Flow Programming Guide. Links are provided in the Preface.

Using a Custom DPDK

If a custom DPDK version is required, follow the DPDK troubleshooting guidelines (referenced in the Preface) to compile the project and install it either locally or system-wide.

Once compiled, make sure your PKG_CONFIG_PATH and LD_LIBRARY_PATH are set to point exclusively to the newly compiled DPDK. For example, on Ubuntu 22.04, configure these variables as follows:

ARCH=`uname -m`
export PKG_CONFIG_PATH=<DPDK_INSTALL_PATH>/lib/$ARCH-linux-gnu/pkgconfig
export LD_LIBRARY_PATH=<DPDK_INSTALL_PATH>/lib/$ARCH-linux-gnu/

After setting the environment variables, reconfigure and recompile any DOCA Flow-related applications or samples to link them with the custom DPDK.

Functional Debugging with Scapy and Monitor

If packets are hitting the wrong pipe entries, try debugging the issue with a minimal load. Construct your packet using Scapy and send it from a port toward the host or device where DOCA Flow is expected to be listening. Refer to the Scapy documentation for instructions on how to build and send packets.

To examine the returning packet, use Scapy again for sniffing. This allows you to verify whether the expected changes to the packet were made.

To trace which pipes and pipe entries the packet passed through, use the monitor feature to set up counters. After sending the traffic, query these counters and print the statistics. It is important to include a monitor on the default entry as well for a complete view of the packet's path.
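Once the counters have been queried before and after sending a test packet, finding the point where the packet left the expected route is a matter of comparing the two snapshots. The following is a minimal sketch of that comparison. It does not call DOCA Flow; it assumes the application has already collected the counter values into dictionaries, and the counter names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def locate_divergence(
    route: List[str],
    before: Dict[str, int],
    after: Dict[str, int],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last counter that advanced, first counter that did not).

    `route` lists the counters in the order the packet is expected to hit them.
    The second element is None when every counter on the route advanced.
    """
    last_hit = None
    for name in route:
        delta = after.get(name, 0) - before.get(name, 0)
        if delta == 0:
            return last_hit, name
        last_hit = name
    return last_hit, None


if __name__ == "__main__":
    # Hypothetical counter names, including one on a default entry.
    route = ["root_pipe", "classifier_pipe/default", "fwd_pipe/entry0"]
    before = {name: 0 for name in route}
    after = {"root_pipe": 1, "classifier_pipe/default": 1, "fwd_pipe/entry0": 0}
    print(locate_divergence(route, before, after))
```

In this illustration the packet reached the default entry of the classifier pipe but never hit the forwarding entry, which narrows the search to the match configuration of that entry.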

Performance Testing with TRex

TRex can be used as a traffic generator for performance testing with DOCA Flow. Install the appropriate version for your system from the TRex website.

Follow the TRex Documentation to configure the server and client, and generate traffic according to your test requirements.

To measure packet rates or bandwidth, you can either retrieve the values from the TRex server or use the mlnx_perf tool to view statistics on the physical port. You can check the packets per second at the physical level by grepping for rx_packets_phy or tx_packets_phy.
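The same physical-level counters can also be turned into a rate by sampling them twice and dividing the difference by the interval. The sketch below assumes the counters are read as text in the `name: value` form printed by `ethtool -S <interface>`; the function names are hypothetical and the sample values are made up for illustration.

```python
import re
from typing import Dict

_PHY_COUNTER = re.compile(r"\s*((?:rx|tx)_packets_phy):\s*(\d+)\s*$")


def parse_phy_counters(stats_text: str) -> Dict[str, int]:
    """Extract rx_packets_phy and tx_packets_phy from 'name: value' lines."""
    counters = {}
    for line in stats_text.splitlines():
        match = _PHY_COUNTER.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def packets_per_second(
    first: Dict[str, int], second: Dict[str, int], interval_s: float
) -> Dict[str, float]:
    """Convert two counter snapshots taken interval_s seconds apart into rates."""
    return {
        name: (second[name] - first[name]) / interval_s
        for name in first
        if name in second
    }


if __name__ == "__main__":
    # Made-up snapshots taken two seconds apart.
    t0 = parse_phy_counters("     rx_packets_phy: 1000\n     tx_packets_phy: 2000\n")
    t1 = parse_phy_counters("     rx_packets_phy: 3000\n     tx_packets_phy: 3000\n")
    print(packets_per_second(t0, t1, 2.0))
```

Comparing the rate seen on the physical port with the rate reported by TRex helps separate drops in the generator path from drops inside the pipeline.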

Steering Dump Tool

To gain insight into the hardware structures created by DOCA Flow and their relations, use the mlx_steering_dump tool to parse and analyze hardware configurations.

Unclear How to Use a Feature

For every feature, small sample applications are available to demonstrate its functionality and usage. The DOCA Flow Programming Guide has a dedicated samples section that can serve as a reference. You can either copy code from these samples or use them as a trusted guide for your own experiments.

However, avoid following the samples blindly. It is best to read the sample documentation, where the key details about the feature usage are explained.

Error from EAL or mlnx5dr

DOCA Flow uses DPDK as a driver to interact with the hardware. If an error originates from EAL or mlnx5dr, refer to the MLNX_DPDK Troubleshooting Guide, as the issue might be within a library that DOCA Flow depends on. The simplest example is an application that fails to start because not enough hugepages are available.
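A quick way to rule out the hugepage case is to look at the hugepage fields in /proc/meminfo before starting the application. The following is a minimal sketch; the function name hugepage_summary is hypothetical, and how many hugepages are sufficient depends on the application.

```python
from typing import Dict

_FIELDS = ("HugePages_Total", "HugePages_Free", "Hugepagesize")


def hugepage_summary(meminfo_text: str) -> Dict[str, int]:
    """Extract hugepage fields from /proc/meminfo content.

    HugePages_Total and HugePages_Free are page counts; Hugepagesize is in kB.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in _FIELDS:
            fields[key] = int(value.split()[0])
    return fields


if __name__ == "__main__":
    with open("/proc/meminfo") as meminfo:
        summary = hugepage_summary(meminfo.read())
    print(summary)
    if summary.get("HugePages_Free", 0) == 0:
        print("No free hugepages: EAL initialization is likely to fail.")
```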

DOCA Flow Error When Adding New Entry to Pipe

The error occurs when calling the function that adds a new entry to a pipe. The error message looks similar to the following:

mlx5_common: Failed to create TIR using DevX
mlx5_net: Port 0 cannot create DevX TIR.
[10:26:39:622581][DOCA][ERR][dpdk_engine]: create pipe entry fail on index:1, error=Port 0 create flow fail, type 1 message: cannot get hash queue, type=8

The issue appears to be caused by the SF/port configuration.

To fix the issue, apply the following commands on the BlueField:

dpu# /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/0000:03:00.0 mode legacy
dpu# /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/0000:03:00.1 mode legacy
dpu# echo none > /sys/class/net/p0/compat/devlink/encap
dpu# echo none > /sys/class/net/p1/compat/devlink/encap
dpu# /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/0000:03:00.0 mode switchdev
dpu# /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/0000:03:00.1 mode switchdev


Match is Not Working - All Packets are Matched

Check the pipe’s match configuration. If both match and match_mask are provided, ensure the match_mask reflects the intended criteria. A typical mistake is providing a match_mask but leaving it unfilled, resulting in a zeroed match in the hardware rule. Set the match_mask to NULL or properly fill it in.
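The effect of a zeroed mask can be illustrated with a conceptual model of masked comparison. This is not DOCA Flow code and says nothing about how the hardware implements the rule; it only shows why a mask with no bits set makes every packet match. The function name entry_matches is hypothetical, and the values 6 and 17 are the IP protocol numbers for TCP and UDP.

```python
def entry_matches(packet_value: int, match_value: int, match_mask: int) -> bool:
    """Compare only the bits selected by the mask."""
    return (packet_value & match_mask) == (match_value & match_mask)


if __name__ == "__main__":
    TCP, UDP = 6, 17  # IP protocol numbers

    # Mask left zeroed: no bits are compared, so both packets match.
    print(entry_matches(TCP, TCP, 0x00), entry_matches(UDP, TCP, 0x00))

    # Mask filled in: the whole field is compared, so only TCP matches.
    print(entry_matches(TCP, TCP, 0xFF), entry_matches(UDP, TCP, 0xFF))
```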

This behavior involves implicit and explicit types of matches, described in the documentation for Setting Pipe Matches.

Both UDP and TCP Packets are Matched, but Only TCP was Intended

By default, DOCA Flow operates in Relaxed Match mode, and only doca_flow_parser_meta handles header-type matching. The doca_flow_match enums, such as match.outer.l4_type_ext, do not control header-type matches. These enums are simply selectors for how DOCA Flow should treat unions for multiple headers like TCP and UDP. Consult the DOCA Flow Parser Meta documentation for configuring the correct header-type match.

Match Structure is Configured to Match on Type, but Runtime Error Occurs

As mentioned earlier, header-type matching is controlled by doca_flow_parser_meta. Verify that your application code uses it. If you only configure type selectors, like match.outer.l4_type_ext within outer, inner, or tun, it may result in a runtime error due to no field being selected.

Control Pipe is Configured with Monitor, but Querying the Counter Returns an Error

For control pipes, counters are configured per entry. Ensure you create an empty control pipe and configure counters for each inserted entry.

© Copyright 2024, NVIDIA. Last updated on Nov 12, 2024.