MLNX_DPDK
This page offers troubleshooting information for DPDK users and customers. It is advisable to review the following resources:
Command | Description |
ibdev2netdev | Part of the OFED package, this command displays all associations between network devices and RDMA adapter ports, providing detailed information for troubleshooting and configuration. |
lspci | A Linux command that provides detailed information about each PCIe bus and the devices connected to them. This tool is essential for identifying PCIe devices and troubleshooting hardware configurations. |
ethtool | A Linux command used to query or modify network driver and hardware settings. This tool is commonly utilized for managing Ethernet device configurations, such as speed, duplex mode, and auto-negotiation, as well as for gathering detailed diagnostic information. |
ip | A Linux command used to assign addresses to network interfaces and configure various network parameters. It is the modern replacement for the deprecated ifconfig command. |
devlink | An API and CLI tool for exposing and managing device-wide information and resources that are not specific to any particular device class. This includes configurations and attributes such as chip-wide or switch-ASIC-wide settings. |
DPDK build command | Builds the Data Plane Development Kit (DPDK) with support for the MLX5 driver, along with additional specified drivers and components (e.g., memory pools and auxiliary buses). |
switchdev mode configuration | Sets switchdev mode with 2 VFs. |
Logging – for information on logging, refer to the DPDK Log Library Guide
Trace – for information on tracing, refer to the DPDK Trace Library Guide
Counters – for information on counters, refer to the Ethtool Counters Guide
Compilation Debug Flags
Debug Flag | Description |
--buildtype=debug | Sets the build type as debug, which enables debugging information and disables optimizations |
-Dc_args=-DRTE_LIBRTE_MLX5_DEBUG | Activates assertion checks and enables debug messages for MLX5 |
Steering Dump Tool
The steering dump tool allows the application to dump its specific data and triggers the hardware to dump the associated hardware data. For additional details, refer to mlx_steering_dump.
dpdk-proc-info Application
The dpdk-proc-info application functions as a secondary process within the DPDK environment and provides the following capabilities:
Retrieve port statistics
Reset port statistics
Print DPDK memory information
Display debug information for ports
For more information, refer to the Dpdk-proc-info Application Guide.
Reproducing an Issue with testpmd Application
The testpmd application can be used to replicate issues in simplified scenarios. For detailed instructions, refer to the Testpmd Application User Guide.
DPDK Unit Test Tool
The DPDK Unit Test Tool facilitates DPDK unit testing using the testpmd and Scapy applications. For additional details, refer to the DPDK Unit Test Tool.
Memory Errors
No Free Hugepages Reported
If the following memory error is encountered during the startup of testpmd:
EAL: No free 2048 kB hugepages reported on node 0
EAL: FATAL: Cannot get hugepage information.
EAL: Cannot get hugepage information.
This error indicates that no hugepages have been allocated. To configure hugepages, run:
sysctl vm.nr_hugepages=<#huge_pages>
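As a sketch (the page count of 1024 and the /mnt/huge mount point are illustrative assumptions), the allocation can be combined with a hugetlbfs mount and a quick verification:
# Allocate 2 MB hugepages (size the count to the application)
sudo sysctl vm.nr_hugepages=1024
# Mount a hugetlbfs instance if one is not already mounted
sudo mkdir -p /mnt/huge
sudo mount -t hugetlbfs nodev /mnt/huge
# Confirm the allocation took effect
grep -i hugepages /proc/meminfo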
Insufficient Memory for MLX5 Hash List Creation
If the following memory error is encountered during the startup of testpmd:
mlx5_common: mlx5_common_utils.c:420: mlx5_hlist_create(): No memory for hash list mlx5_0_flow_groups creation
This error suggests insufficient free hugepages. Verify the availability of free hugepages by running:
grep -i hugepages /proc/meminfo
Compatibility Issues Between Firmware and Driver
Symptom: The mlx5 driver is not loading.
Resolution: This issue may be caused by a compatibility mismatch between the firmware and the driver. When this occurs, the driver fails to load and an error message is logged in the dmesg output.
To resolve this issue, verify that the firmware version is compatible with the driver version in use. Refer to the MLNX_OFED Documentation for detailed information on supported firmware and driver versions.
Restarting the Driver After Removing a Physical Port
Symptom: Operational issues occur due to a physical port being removed from an OVS-DPDK bridge while offload is enabled.
Resolution: When offload is enabled, removing a physical port from an OVS-DPDK bridge requires restarting the OVS service. Failing to restart the service can cause incorrect datapath rule configurations.
To resolve this issue:
Reattach the physical port to the bridge according to the desired topology.
Restart the Open vSwitch (OVS) service to restore proper functionality.
Dec_ttl Feature Not Working
dec_ttl is only supported on NVIDIA® ConnectX®-6 adapters and higher.
Deadlock When Moving to switchdev Mode
Symptom: A deadlock occurs when transitioning to switchdev mode while deleting a namespace.
Resolution: To prevent this issue, unload the mlx5_ib module before switching to switchdev mode.
Unusable System After Unloading the mlx5_core Driver
Symptom: System becomes unresponsive after unloading the mlx5_core driver while running from a network boot and using a ConnectX adapter to connect to network storage.
Resolution: Unloading the mlx5_core driver (e.g., by running /etc/init.d/openibd restart) in this scenario causes system instability. This scenario should simply be avoided, as there is no resolution to it.
Incompatibility Between RHEL 7.6alt and CentOS 7.6alt Kernels
Symptom: Attempting to install MLNX_OFED on a system with the CentOS 7.6alt kernel results in some kernel modules built for RHEL 7.6alt failing to load.
Resolution: The kernel used in CentOS 7.6alt (for non-x86 architectures) differs from the kernel in RHEL 7.6alt. Consequently, MLNX_OFED kernel modules compiled for the RHEL 7.6alt kernel may not load on a CentOS 7.6alt system. To resolve this issue, rebuild the kernel modules specifically for the CentOS 7.6alt kernel.
Cannot Add VF 0 Representor
Symptom: The following error is encountered when attempting to add VF 0 representor:
mlx5_pci port query failed: Input/output error
Resolution: Ensure that the VF configuration is fully completed before starting the DPDK application.
EAL Initialization Failure
EAL initialization failure is a common error that may occur when running various DPDK-related applications.
The error typically appears as follows:
[DOCA][ERR][NUTILS]: EAL initialization failed
This error may result from several issues, including:
The application requires huge pages, but none have been allocated
The application requires root privileges to run but was executed without elevated privileges
To address this issue, apply the solution that matches the cause:
Run the following commands on the host or BlueField, depending on where the application is being executed:
$ echo '2048' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ sudo mkdir /mnt/huge
$ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
Execute the application using sudo or as the root user:
sudo <run_command>
DPDK EAL Limitation with More than 128 Cores
Symptom: When running with 190 cores, DPDK detects only 128 cores, and the following messages appear:
dpdk/bin/dpdk-test-compress-perf -a 0000:11:00.0,class=compress -l 0,190 -- ...
EAL: Detected 128 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: invalid core list syntax
Resolution: To resolve this issue, compile DPDK with the parameter -Dmax_lcores=256. This allows DPDK to recognize additional cores beyond the default limit of 128.
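For reference, a minimal rebuild sketch (the build directory name is arbitrary):
meson setup build -Dmax_lcores=256
ninja -C build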
Packets Dropped When Transferring to UDP Port 4789
Symptom: While running testpmd, the following flow rules are created:
flow create 1 priority 1 transfer ingress group 0 pattern eth / vlan / ipv4 / udp dst is 4789 / vxlan / end actions jump group 820 / end
flow create 1 priority 0 transfer ingress group 820 pattern eth / vlan / ipv4 / udp dst spec 4789 dst mask 0xfff0 / vxlan / eth / vlan / ipv4 / end actions modify_field op set dst_type udp_port_dst src_type value src_value 0x168c9e8d4df8f width 2 / port_id id 0 / end
When sending a packet that matches these rules, the packet is dropped and is not received on the port 0 peer.
Resolution: UDP port 4791 is reserved for RDMA over Converged Ethernet (RoCE) traffic. By default, RoCE is enabled on all mlx5 devices, and traffic to UDP port 4791 is treated as RoCE traffic.
To forward traffic to this port for Ethernet use (without RDMA), disable RoCE using the following command:
echo <0|1> > /sys/devices/{pci-bus-address}/roce_enable
Refer to the MLNX_OFED Documentation for more details on disabling RoCE.
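For illustration only (the sysfs path depends on the system's PCIe topology and the bus address below is a placeholder), disabling and later re-enabling RoCE might look like:
# Disable RoCE on the device (0 = disabled)
echo 0 > /sys/devices/pci0000:00/0000:00:08.0/roce_enable
# Re-enable RoCE later if needed (1 = enabled)
echo 1 > /sys/devices/pci0000:00/0000:00:08.0/roce_enable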
Ping Loss with 254 vDPA Devices
Symptom: Intermittent ping loss or errors are observed on some interfaces when the following is performed:
Create 127 Virtual Functions (VFs) on each port.
Configure VF link aggregation (LAG).
Start the vDPA example with 254 ports.
Launch 16 virtual machines (VMs), each with up to 16 interfaces.
Send pings from each vDPA interface to an external host.
Resolution: This issue can be resolved by adding the runtime configuration parameter event_core=x, where x specifies the CPU core number to be used for the timer thread. By default, the event_core parameter uses the EAL main lcore.
The event_core parameter can be shared among different mlx5 vDPA devices. However, allocating this core to additional tasks may negatively impact the performance and latency of the mlx5 vDPA devices.
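A hedged sketch of passing the parameter through the vDPA device devargs (the PCI address, the core number, and the use of the dpdk-vdpa example application are assumptions):
dpdk-vdpa -a 0000:08:00.2,class=vdpa,event_core=2 -- --interactive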
No TX Traffic with testpmd and HWS dv_flow=2
Symptom: Packets are not forwarded in forward mode when running testpmd with the following command:
dpdk-testpmd -n 4 -a 08:00.0,representor=0,dv_flow_en=2 -- -i --forward-mode=txonly -a
Resolution: To resolve this issue, ensure that hardware steering (HWS) queues are properly configured for all ports, including the representor ports (representor=0-1) and the PFs. This configuration is required to establish the default flows necessary for traffic forwarding.
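A minimal sketch of configuring HWS queues on both ports from the testpmd prompt (the queue counts are illustrative):
testpmd> port stop all
testpmd> flow configure 0 queues_number 4 queues_size 64
testpmd> flow configure 1 queues_number 4 queues_size 64
testpmd> port start all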
OVS-DPDK - Duplicate Packets When Co-running with SPDK
Symptom: Duplicate packets are observed when running OVS-DPDK with offload alongside SPDK, both of which attach to the same PF.
Resolution: To address this issue, set the parameter dv_esw_en=0 on the OVS-DPDK side. This disables E-Switch using Direct Rules, which is enabled by default if supported.
Live Migration VM Stuck in PMD MAC Swap Mode
Symptom: Running the following scenario results in an endless loop during live migration:
Open 64 VFs.
Configure OVS-DPDK [VF].
Start a VM with 16 devices.
Initiate traffic between VMs in PMD MAC swap mode.
Attempt live migration on the hypervisor with the following command:
virsh migrate --live --unsafe --persistent --verbose qa-r-vrt-123-009-CentOS-8.2 qemu+ssh://qa-r-vrt-125/system tcp://22.22.22.125
The migration process stalls with a repeating status:
Migration: [ 75%] // repeats in a loop
Resolution: To address this known issue, use the auto-converge parameter to handle heavy vCPU or traffic loads during live migration, as migration may not complete otherwise. Run the following command:
virsh migrate ... --auto-converge --auto-converge-initial 60 --auto-converge-increment 20
Failure to Create Rule with Raw Encapsulation in Switchdev Mode
Symptom: An error occurs when attempting to create a rule with encapsulation on port 0. The following messages are displayed:
mlx5_net: [mlx5dr_action_create_reformat_root]: Failed to create dv_create_flow reformat
mlx5_net: [mlx5dr_action_create_reformat]: Failed to create root reformat action
Template table #0 destroyed
port_flow_complain(): Caught PMD error type 1 (cause unspecified): fail to create rte table: Operation not supported
Resolution: To resolve this issue:
Disable encapsulation with the following command:
echo none > /sys/class/net/<ethx>/compat/devlink/encap
If tunnels must be offloaded in virtual functions (VFs) or scalable functions (SFs), ensure encapsulation is disabled in the forwarding database (FDB) domain. This is necessary because the NIC does not support encapsulation in both domains simultaneously.
Unable to Create Flow When Having L3 VXLAN With External Process
Symptom: After detaching and reattaching ports, an error is encountered when attempting to create a flow that matches on L3 VXLAN.
Example scenario:
Detach ports:
port stop all
device detach 0000:08:00.0
device detach 0000:08:00.1
Re-attach ports:
mlx5 port attach 0000:08:00.0 socket=/var/run/external_ipc_socket
mlx5 port attach 0000:08:00.1 socket=/var/run/external_ipc_socket
port start all
Create a rule:
testpmd> flow create 1 priority 2 ingress group 0 pattern eth dst is 00:16:3e:23:7c:0c has_vlan spec 1 has_vlan mask 1 / vlan vid is 1357 ... / ipv6 src is ::9247 ... / udp dst is 4790 / vxlan-gpe / end actions queue index 65000 / end
Error printed:
port_flow_complain(): Caught PMD error type 13 (specific pattern item): cause: 0x7ffd041c6be0, L3 VXLAN is not enabled by device parameter and/or not configured in firmware: Operation not supported
Resolution: The error occurs because the l3_vxlan_en parameter is not set when attaching the device. This parameter is required to enable L3 VXLAN and VXLAN-GPE flow creation.
To address this issue:
Enable the l3_vxlan_en parameter – ensure that the l3_vxlan_en parameter is set to a nonzero value when attaching the device. This enables the creation of L3 VXLAN and VXLAN-GPE flows (see the example after this list).
Configure firmware support – verify that the firmware is configured to support L3 VXLAN or VXLAN-GPE traffic. By default, this parameter is disabled and must be explicitly enabled.
Verify configurations – confirm that both the device parameters and firmware configurations are in place to handle L3 VXLAN traffic successfully.
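For illustration, a hedged example of probing both ports with the parameter set (the PCI addresses are placeholders):
dpdk-testpmd -n 4 -a 0000:08:00.0,l3_vxlan_en=1 -a 0000:08:00.1,l3_vxlan_en=1 -- -i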
Buffer Split - Failure to Configure max-pkt-len
Symptom: When running testpmd and attempting to change the max-pkt-len configuration, an error occurs.
Command:
dpdk-testpmd -n 4 -a 0000:00:07.0,dv_flow_en=1,... -a 0000:00:08.0,dv_flow_en=1,... -- --mbcache=512 -i --nb-cores=15 --txd=8192 --rxd=8192 --burst=64 --mbuf-size=177,430,417 --enable-scatter --tx-offloads=0x8000 --mask-event=intr_lsc
port stop all
port config all max-pkt-len 9216
port start all
Error output:
Configuring Port 0 (socket 0)
mlx5_net: port 0 too many SGEs (33) needed to handle requested maximum packet size 9216, the maximum supported are 32
mlx5_net: port 0 unable to allocate queue index 0
Fail to configure port 0 rx queues
Resolution: To resolve this issue:
Enable buffer split – start testpmd with the rx-offloads=0x102000 parameter to enable buffer split.
Configure receive packet size – use the set rxpkts (x[,y]*) command to configure the receive packet size, where:
x[,y]* is a comma-separated list of values.
A zero value indicates using the memory pool data buffer size.
Example:
set rxpkts 49,430,417
Unable to Start testpmd with Shared Library
Symptom: An error occurs when attempting to run testpmd from the download folder:
Command:
/tmp/dpdk/build-meson/app/dpdk-testpmd -n 4 -w ...
Error output:
EAL: Error, directory path /tmp is world-writable and insecure
EAL: FATAL: Cannot init plugins
EAL: Cannot init plugins
EAL: Error - exiting with code: 1
Cause: Cannot init EAL: Invalid argument
Resolution: The error occurs because the directory where DPDK is located (e.g., /tmp) has improper permissions, making it world-writable and insecure. To resolve this issue:
Rebuild DPDK in a directory with stricter permissions.
Ensure that the directory is not world-writable to meet the security requirements of DPDK (see the example below).
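As a sketch (assuming the build tree was moved to /opt/dpdk; adjust the path to your installation):
# Check whether the directory is world-writable
ls -ld /opt/dpdk
# Drop world-write permission if it is present
chmod o-w /opt/dpdk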
Cross-Port Action: Failure to Create Actions Template
Symptom: When attempting to configure two ports, where port 1 is set as the host port for port 0, an error occurs during the creation of the action template on port 0.
Reproduction steps:
Run testpmd:
/download/dpdk/install/bin/dpdk-testpmd -n 4 -a 0000:08:00.0,dv_flow_en=2,dv_xmeta_en=0 -a 0000:08:00.1,dv_flow_en=2,dv_xmeta_en=0 --iova-mode="va" -- --mbcache=512 -i --nb-cores=7 --rxq=8 --txq=8 --txd=2048 --rxd=2048
Stop all ports:
port stop all
Configure flows:
flow configure 0 queues_number 16 queues_size 256 counters_number 0 host_port 1 flags 2
flow configure 1 queues_number 16 queues_size 256 counters_number 8799
Start all ports:
port start all
Error output:
flow queue 1 indirect_action 5 create postpone false action_id 3 ingress action count / end
flow push 1 queue 5
flow pull 1 queue 5
flow actions_template 0 create actions_template_id 8 template shared_indirect 1 3 / end mask count / end
Actions template #8 destroyed
port_flow_complain(): Caught PMD error type 16 (specific action): cause: 0x7ffd5610a210, counters pool not initialized: Invalid argument
Resolution: To resolve this issue, configure the ports in the correct order, with the host port configured first.
Use the following updated sequence:
Stop all ports:
port stop all
Configure port 1 (host port):
flow configure 1 queues_number 16 queues_size 256 counters_number 8799
Configure port 0:
flow configure 0 queues_number 16 queues_size 256 counters_number 0 host_port 1 flags 2
Start all ports:
port start all
RSS Hashing Issue with GRE Traffic
Symptom: GRE traffic is not being RSS hashed to multiple cores when received on a ConnectX-6 interface. However, the same traffic is correctly hashed on an Intel i40e interface.
Resolution: To enable RSS hashing for GRE traffic over inner headers, create a flow rule that specifically matches tunneled packets. Flow rule example:
Pattern:
ETH / IPV4 / GRE / END
Actions:
RSS(level=2, types=…) / END
If you attempt to use RSS level 2 in a flow rule without including a tunnel header item, you will encounter the following error:
inner RSS is not supported for non-tunnel flows
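A hedged testpmd example of such a rule (the port number and the RSS types are illustrative; adjust them to your setup):
testpmd> flow create 0 ingress pattern eth / ipv4 / gre / end actions rss level 2 types ip udp tcp end / end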
Unable to Probe SF/VF Device with crypto-perf-test App on Arm
Symptom: An error occurs when attempting to probe an SF on startup using the crypto-perf-test application on an Arm host with the following command:
dpdk-test-crypto-perf -c 0x7ff -a auxiliary:mlx5_core.sf.1,class=crypto,algo=1 -- --ptest verify --aead-op decrypt --optype aead --aead-algo aes-gcm ....
Error output:
No crypto devices type mlx5_pci available
USER1: Failed to initialise requested crypto device type
When attempting to use the representor keyword to probe an SF or VF, the following error is received:
mlx5_common: Key "representor" is unknown for the provided classes.
Resolution: On Arm platforms:
Only VF/SF representors are available, not the actual PCIe devices
Probing with the representor keyword or using real PCIe device types is not supported
Ensure configurations are aligned with the platform's limitations to avoid these issues.
Failed to Create ESP Matcher on VFs in Template Mode
Symptom: An error occurs when attempting to create a pattern template on a VF using testpmd with the following commands:
dpdk-testpmd -n 4 -a 0000:08:00.2,...,dv_flow_en=2,dv_xmeta_en=0 -a 0000:08:00.4,...,dv_flow_en=2,... -- --mbcache=512 -i --nb-cores=7 ...
port stop all
flow configure 0 queues_number 7 queues_size 256 meters_number 0 counters_number 0 quotas_number 256
flow configure 1 queues_number 14 queues_size 256 meters_number 0 counters_number 7031 quotas_number 64
port start all
flow pattern_template 0 create ingress pattern_template_id 0 relaxed no template eth / ipv4 / esp / end
flow actions_template 0 create actions_template_id 0 template jump group 88 / end mask jump group 88 / end
flow template_table 0 create table_id 0 group 72 priority 0 ingress rules_number 64 pattern_template 0 actions_template 0
Error output:
mlx5_net: [mlx5dr_definer_conv_items_to_hl]: Failed processing item type: 23
mlx5_net: [mlx5dr_definer_calc_layout]: Failed to convert items to header layout
mlx5_net: [mlx5dr_definer_matcher_init]: Failed to calculate matcher definer layout
mlx5_net: [mlx5dr_matcher_bind_mt]: Failed to set matcher templates with match definers
mlx5_net: [mlx5dr_matcher_create]: Failed to initialise matcher: 95
Resolution: Currently, IPsec offload can only be supported on a single path: either a PF, a VF, or the e-switch. However:
DPDK does not yet support IPsec offload on VFs
IPsec is limited to configurations involving PFs or e-switches
As a result, configuring IPsec on VFs is not supported at this time.
Failure to Receive Hairpin Traffic (HWS) Between Two Physical Ports
Symptom: Traffic is not received on port 1 when running testpmd in hairpin mode with the following setup:
Port 0 – configured to create simple rules and send traffic.
Port 1 – monitored to measure received traffic. However, no traffic is observed.
Setup:
Devices – two BlueField-2 devices (DUT and TG) connected back-to-back via symmetrical interfaces (p0 and p1)
Firmware – configured for HWS
Steps:
Run DUT in hairpin mode:
dpdk-testpmd -c 0xff -n 4 -a 0000:03:00.0,dv_flow_en=2 -a 0000:03:00.1,dv_flow_en=2 ... --forward-mode=rxonly -i --hairpinq 1 --hairpin-mode=0x12
Configure DUT:
port stop all
flow configure 0 queues_number 4 queues_size 64
port start all
flow pattern_template 0 create pattern_template_id 0 ingress template eth / end
flow actions_template 0 create actions_template_id 0 template jump group 1 / end mask jump group 1 / end
flow template_table 0 create table_id 0 group 0 priority 0 ingress rules_number 64 pattern_template 0 actions_template 0
flow queue 0 create 0 template_table 0 pattern_template 0 actions_template 0 postpone no pattern eth / end actions jump group 1 / end
flow pull 0 queue 0
flow pattern_template 0 create pattern_template_id 2 ingress template eth / end
flow actions_template 0 create actions_template_id 2 template queue / end mask queue / end
flow template_table 0 create table_id 2 group 1 priority 0 ingress rules_number 64 pattern_template 2 actions_template 2
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone no pattern eth / end actions queue index 4 / end
flow pull 0 queue 0
start
Send traffic from TG on port 0:
/tmp/dpdk-hws/app/dpdk-testpmd -c 0xff -n 4 -a 0000:03:00.0,dv_flow_en=1 --socket-mem=2048 -- --port-numa-config=0 --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=4 --txq=4 --nb-cores=1 --forward-mode=txonly --no-lsc-interrupt -i
Measure traffic rate on TG interface p1:
mlnx_perf -i p1
Expected result: 2.5 Gb/s
Actual result: None
Resolution: The absence of traffic on port 1 is likely due to improper HWS configuration on port 1. Without correct HWS settings, default SQ miss rules are not inserted, preventing traffic reception.
Ensure that both ports (Port 0 and Port 1) are correctly configured with HWS queues. Update the configuration as follows:
port stop all
flow configure 0 queues_number 4 queues_size 64
flow configure 1 queues_number 4 queues_size 64
port start all
This ensures proper HWS queue setup and insertion of default SQ miss rules, allowing traffic to flow as expected.
DPDK-OVS Memory Allocation Error
Symptom: During initialization, the following error is encountered:
dpdk|ERR|EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
Resolution: Recent updates to OVS and DPDK have changed the default behavior of memory configuration arguments:
The EAL argument --socket-mem is no longer configured by default during start-up. If dpdk-socket-mem and dpdk-alloc-mem are not explicitly set, DPDK will use its default settings.
The EAL argument --socket-limit no longer defaults to the value of --socket-mem.
To avoid this error, update your memory configuration as follows:
Explicitly set the dpdk-socket-mem parameter to specify the desired memory allocation.
Set other_config:dpdk-socket-limit to the same value as other_config:dpdk-socket-mem to maintain the previous behavior and ensure memory-limiting consistency.
Example configuration:
other_config:dpdk-socket-mem=2048,0
other_config:dpdk-socket-limit=2048,0
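These values can be applied with ovs-vsctl, for example (the per-NUMA-node sizes are taken from the example above and should be adjusted to your system):
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="2048,0"
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="2048,0"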
High Latency in VDPA Ping Traffic Between VMs with 240 SFs
Symptom: Ping traffic between virtual machines (VMs) exhibits high latency when configured with 240 SFs.
Resolution: To reduce latency in this scenario, configure the system with event-mode=2.
event-mode=2, also known as Event Forward Mode, allows DPDK applications to offload DMA operations to a DMA adapter while preserving the correct order of ingress packets. This configuration is designed to optimize performance and improve latency in environments with high numbers of SFs.
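A hedged sketch of selecting this mode through the mlx5 vDPA devargs, assuming the driver parameter key is event_mode (the SF device name and the use of the dpdk-vdpa example application are also assumptions):
dpdk-vdpa -a auxiliary:mlx5_core.sf.2,class=vdpa,event_mode=2 -- --interactive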
testpmd Startup Failure in ConnectX-6 Dx KVM Setup
Symptom: When running testpmd with the parameter tx_pp=500, the application fails to start and exits with the following error:
dpdk-testpmd -n 4 -w 0000:00:07.0,l3_vxlan_en=1,tx_pp=500,dv_flow_en=1 -w ...
Error output:
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:00:07.0 (socket 0)
mlx5_net: WQE rate mode is required for packet pacing
mlx5_net: probe of PCI device 0000:00:07.0 aborted after encountering an error: No such device
Resolution: The issue occurs because the REAL_TIME_CLOCK_ENABLE parameter is not configured. This parameter is required for packet pacing functionality on NVIDIA ConnectX adapters.
Ensure that the REAL_TIME_CLOCK_ENABLE parameter in mlxconfig is set to 1. This activates the real-time timestamp format on NVIDIA ConnectX adapters, providing timestamps relative to the Unix epoch, which are necessary for packet pacing.
Configuration command:
mlxconfig -d <device_id> set REAL_TIME_CLOCK_ENABLE=1
Verify the parameter is enabled and re-run the testpmd application.
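For example, the setting can be checked with a query; note that mlxconfig changes generally require a firmware reset or reboot before they take effect:
mlxconfig -d <device_id> query | grep REAL_TIME_CLOCK_ENABLE
mlxfwreset -d <device_id> reset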
TCP Hardware Hairpin Connection Forwarding Packets to Host
Symptom: After passing through the Connection Tracking (CT) check, certain TCP packets exhibit the following behavior:
They do not have the RTE_FLOW_CONNTRACK_PKT_STATE_VALID or RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED flags set.
They do not have the RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED flag set either.
Despite this, these packets are being forwarded to the host.
All CT objects are created with liberal mode set to 1.
Resolution: To resolve this issue, adjust the configuration of the conntrack object by following one of these approaches:
Set max_win to 0 for both the original and reply directions
Configure max_win with the appropriate values for both directions, as recommended in the release notes and header files
OVS-DPDK LAG - Configuration Mismatch for dv_xmeta_en
Symptom: When adding vDPA ports to OVS from port1 and port2, an error is encountered in the OVS log.
Setup used:
dpdk-extra="-w 0000:86:00.0,representor=pf0vf[0-15],dv_xmeta_en=1,dv_flow_en=1,dv_esw_en=1 -w 0000:86:00.0,representor=pf1vf[0-15],dv_xmeta_en=1,dv_flow_en=1,dv_esw_en=1"
dpdk-init="true"
dpdk-socket-mem="8192,8192"
hw-offload="true"
Error printed in OVS log:
Jul 04 12:24:24 qa-r-vrt-123 ovs-vsctl[23091]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port br0-ovs vdpa16 -- set Interface vdpa16 type=dpdkvdpa options:vdpa-socket-path=/tmp/sock16 options:vdpa-accelerator-devargs=0000:86:08.2 options:dpdk-devargs=0000:86:00.0,representor=pf1vf[0]
Jul 04 12:24:24 qa-r-vrt-123 ovs-vswitchd[22545]: ovs|00570|dpdk|ERR|mlx5_net: "dv_xmeta_en" configuration mismatch for shared mlx5_bond_0 context
The issue does not occur if dv_xmeta_en=1 is removed during initialization.
Resolution: The problem arises because DPDK does not probe the same PCIe address with different devargs settings. Consequently:
The representors of PF1 are ignored
OVS treats these representors as new ports and probes them independently, resulting in a configuration mismatch
To resolve the issue, configure both PF representors using a single devargs entry (e.g., representor=pf[0-1]vf[0-15]). This configuration ensures that both PF representors are correctly identified and probed without causing a mismatch in the dv_xmeta_en configuration.
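A hedged example of the corrected dpdk-extra line, adapted from the setup above (PCI address and offsets unchanged from that setup):
dpdk-extra="-w 0000:86:00.0,representor=pf[0-1]vf[0-15],dv_xmeta_en=1,dv_flow_en=1,dv_esw_en=1"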
Issue Binding Hairpin Queues After Enabling Port Loopback
Symptom: When running testpmd in hairpin mode, the system fails to bind hairpin queues after enabling loopback mode on a port.
Configuration steps:
testpmd> port stop 1
testpmd> port config 1 loopback 1
testpmd> port start 1
Resolution: To resolve this issue, ensure that all ports are stopped before applying the loopback configuration. Use the following steps:
Stop all ports:
port stop all
Configure loopback mode on the desired port:
port config 1 loopback 1
Restart all ports:
port start all
By stopping all ports before applying the loopback configuration, the system can correctly bind hairpin queues.
Memory Allocation Failure
Symptom: When running testpmd with iova-mode=pa, initialization fails with the following errors:
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
Fail to start port 0: Cannot allocate memory
Fail to start port 1: Cannot allocate memory
Resolution: The memory fragmentation issue can be resolved by using iova-mode=va instead. This mode utilizes virtual addressing, which can handle memory fragmentation more effectively.
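For example (the PCI address is a placeholder and other arguments are omitted), the mode is selected with the EAL option --iova-mode:
dpdk-testpmd -n 4 --iova-mode=va -a 0000:04:00.0 -- -i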
VDPA Rx Packet Truncation with --enable-scatter
Symptom: Packets are truncated when configuring the maximum packet length with the following testpmd settings:
dpdk-testpmd -n 4 -w 0000:04:00.0,representor=[0,1],dv_xmeta_en=0,txq_inline=290,rx_vec_en=1,l3_vxlan_en=1,dv_flow_en=1,dv_esw_en=1 -- .. --enable-scatter ..
After changing the MTU and maximum packet length:
port stop all
port config all max-pkt-len 8192
port start all
port config mtu 0 8192
port config mtu 1 8192
port config mtu 2 8192
Resolution: The packets are truncated to the default size of 1518 bytes. This issue is typically caused by the guest VM's MTU settings rather than the host configuration.
To resolve this issue:
Set the MTU value using the host_mtu parameter when launching the VM:
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,mrg_rxbuf=on,host_mtu=9000
This sets the MTU for the virtio-net device to 9000 bytes.
Inside the guest VM, verify the MTU value using the ifconfig command to ensure it matches the specified value (e.g., 9000 bytes).
If a different MTU value is required, adjust both the --max-pkt-len parameter in the testpmd command on the host and the host_mtu parameter in the QEMU command for the guest to match the desired MTU.
Ring Memory Issue
Symptom: The following memory error is encountered:
RING: Cannot reserve memory
[13:00:57:290147][DOCA][ERR][UFLTR::Core:156]: DPI init failed
Resolution: This is a common memory issue when running the application on the host, often caused by insufficient memory (i.e., not enough huge pages allocated per worker thread).
Possible solutions:
Recommended – increase the amount of allocated huge pages. Refer to the relevant documentation for instructions on allocating huge pages.
Alternative solution – limit the number of cores used by the application. Use one of the following options:
-c <core-mask> – set the hexadecimal bitmask of the cores to run on
-l <core-list> – list of cores to run on
Example command:
./doca_<app_name> -a 3b:00.3 -a 3b:00.4 -l 0-64 -- -l 60
DOCA Apps Using DPDK in Parallel Issue
Symptom: When running two DOCA applications in parallel that use DPDK, the first application runs successfully, but the second fails with the following error:
Failed to start URL Filter with output: EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: RTE Version: 'MLNX_DPDK 20.11.4.0.3'
EAL: Detected shared linkage of DPDK
EAL: Cannot create lock on '/var/run/dpdk/rte/config'. Is another primary process running?
EAL: FATAL: Cannot init config
EAL: Cannot init config
[15:01:57:246339][DOCA][ERR][NUTILS]: EAL initialization failed
Resolution: The error occurs because the second application attempts to use /var/run/dpdk/rte/config, which is already in use by the first application.
To resolve this issue, run the second application with the DPDK EAL option --file-prefix <name>. This option specifies a unique file prefix for the second application to avoid conflicts.
The following example command starts the second application after starting the first application (without the --file-prefix option):
./doca_<app_name> --file-prefix second -a 0000:01:00.6,sft_en=1 -a 0000:01:00.7,sft_en=1 -v -c 0xff -- -l 60
Failure to Set Huge Pages
Symptom: A permission error is raised when trying to configure huge pages from an unprivileged user account.
The error:
$ sudo echo 600 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
-bash: /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages: Permission denied
Resolution: Using sudo with echo behaves differently than expected, because the output redirection is performed by the unprivileged shell rather than by the elevated command, so the write is denied. To configure huge pages correctly, use the following command instead:
$ echo '600'
| sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages