MLNX_DPDK
This page offers troubleshooting information for DPDK users and customers. It is advisable to review the following resources:
Command | Description
ibdev2netdev | Part of the OFED package. This command displays all associations between network devices and Remote Direct Memory Access (RDMA) adapter ports.
lspci | A Linux command that provides information about each PCI bus on your system.
ethtool | A Linux command used to query or control network driver and hardware settings.
 | Builds DPDK for MLX5.
 | Sets switchdev mode with 2 VFs.
Logging: For information on logging, refer to the DPDK Log Library Guide.
Trace: For details on tracing, refer to the DPDK Trace Library Guide.
Counters: For guidance on counters, refer to the Ethtool Counters Guide.
Compilation Debug Flags
Debug Flag | Description
-Dbuildtype=debug | Sets the build type as debug, which enables debugging information and disables optimizations.
RTE_LIBRTE_MLX5_DEBUG | Activates assertion checks and enables debug messages for MLX5.
Steering Dump Tool
This tool triggers the application to dump its specific data and triggers the hardware to dump the associated hardware data. For additional details, refer to mlx_steering_dump.
dpdk-proc-info Application
This application operates as a DPDK secondary process and can:
Retrieve port statistics
Reset port statistics
Print DPDK memory information
Display debug information for ports
For more information, refer to DPDK proc-info Guide.
Reproducing an Issue with testpmd Application
Use the testpmd application to test simplified scenarios. For guidance, refer to the testpmd User Guide.
DPDK Unit Test Tool
This tool performs DPDK unit testing utilizing the testpmd and Scapy applications. For more information, refer to DPDK Unit Test Tool.
Memory Errors
Issue: I encountered this memory error during the startup of testpmd. What does it indicate?
EAL: No free 2048 kB hugepages reported on node 0
EAL: FATAL: Cannot get hugepage information.
EAL: Cannot get hugepage information.
Resolution : No hugepages have been allocated. To configure hugepages, run:
sysctl vm.nr_hugepages=<#huge_pages>
Issue: I encountered this memory error during the startup of testpmd. What does it indicate?
mlx5_common: mlx5_common_utils.c:420: mlx5_hlist_create(): No memory for hash list mlx5_0_flow_groups creation
Resolution : The system may have run out of free hugepages. Check the availability of free hugepages by running:
grep -i hugepages /proc/meminfo
Compatibility Issues Between Firmware and Driver
Issue: The mlx5 driver is not loading. What might be the problem?
Resolution :
The issue may be due to a compatibility mismatch between the firmware and the driver. When this occurs, the driver will fail to load, and an error message will appear in the dmesg output. To address this, verify that the firmware version matches the driver version. Refer to the NVIDIA MLNX_OFED Documentation for details on supported firmware and driver versions.
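For example, one way to check the loaded driver and firmware versions (the interface name is a placeholder) is:
# Compare driver/version and firmware-version against the supported combinations
# listed in the MLNX_OFED documentation:
ethtool -i ens1f0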
Restarting the Driver After Removing a Physical Port
Issue: I removed a physical port from an OVS-DPDK bridge while offload was enabled, and now I am encountering issues. What should I do?
Resolution : When offload is enabled, removing a physical port from an OVS-DPDK bridge requires restarting the OVS service. Failure to do so can lead to incorrect datapath rule configurations. To resolve this, restart the openvswitch service after reattaching the physical port to a bridge per your desired topology.
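For example (the service name varies by distribution, e.g. openvswitch on RHEL-based systems or openvswitch-switch on Debian/Ubuntu):
systemctl restart openvswitch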
Limitations of the dec_ttl Feature
Issue: The dec_ttl feature in OVS-DPDK is not working. What could be the problem?
Resolution :
The dec_ttl feature is only supported on ConnectX-6 adapters and is not compatible with ConnectX-5 adapters. There is no workaround for this limitation.
Deadlock When Moving to switchdev Mode
Issue: I am experiencing a deadlock when moving to switchdev mode while deleting a namespace. How can I resolve this?
Resolution :
To avoid the deadlock, unload the mlx5_ib module before moving to switchdev mode.
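A minimal sketch, assuming the ConnectX device is at PCI address 0000:08:00.0 (placeholder):
# Unload the RDMA module first, then change the eswitch mode:
modprobe -r mlx5_ib
devlink dev eswitch set pci/0000:08:00.0 mode switchdev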
Unusable System After Unloading the mlx5_core Driver
Issue:
I am running my system from a network boot and using an NVIDIA ConnectX card to connect to network storage. When I unload the mlx5_core driver, the system becomes unresponsive. What should I do?
Resolution :
Unloading the mlx5_core driver (e.g., by running /etc/init.d/openibd restart) while the system is running from a network boot and connected to network storage via an NVIDIA ConnectX card causes system issues. To avoid this, do not unload the mlx5_core driver under these circumstances, as there is no available workaround.
Incompatibility Between RHEL 7.6alt and CentOS 7.6alt Kernels
Issue: I am trying to install MLNX_OFED on a system with the CentOS 7.6alt kernel, but some kernel modules built for RHEL 7.6alt do not load. How can I fix this?
Resolution : The kernel used in CentOS 7.6alt (for non-x86 architectures) differs from that in RHEL 7.6alt. As a result, MLNX_OFED kernel modules compiled for the RHEL 7.6alt kernel may not load on a CentOS 7.6alt system. To resolve this issue, build the kernel modules specifically for the CentOS 7.6alt kernel.
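For example, the MLNX_OFED installer can rebuild the kernel modules against the running kernel (run from the extracted package directory; the flag assumes a standard MLNX_OFED installation):
./mlnxofedinstall --add-kernel-support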
Cannot Add VF 0 Representor
Issue:
I am encountering an error when trying to add VF 0 representor: mlx5_pci port query failed: Input/output error. What should I do?
Resolution : To resolve this issue, ensure that the VF configuration is completed before starting the DPDK application.
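A minimal sketch, assuming the PF netdev is named ens1f0 and two VFs are required (both placeholders):
# Create the VFs (and apply any VF MAC/trust settings) before launching the DPDK application:
echo 2 > /sys/class/net/ens1f0/device/sriov_numvfs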
EAL Initialization Failure
EAL initialization failure is a common error that may appear while running various DPDK-related applications.
The error appears like this:
[DOCA][ERR][NUTILS]: EAL initialization failed
There may be many causes for this error. Some of them are as follows:
The application requires huge pages and none were allocated
The application requires root privileges to run and it was run without elevated privileges
The following solutions are respective to the possible causes listed above:
Allocate huge pages. For example, run (on the host or the DPU, depending on where you are running the application):
$ echo '2048' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ sudo mkdir /mnt/huge
$ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
Run the application using sudo (or as root):
sudo <run_command>
DPDK EAL Limitation with More than 128 Cores
Issue: I am running with 190 cores, but DPDK detects only 128:
dpdk/bin/dpdk-test-compress-perf -a 0000:11:00.0,class=compress -l 0,190 -- ...
EAL: Detected 128 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: invalid core list syntax
Resolution :
To address this issue, compile DPDK with the parameter -Dmax_lcores=256. This will enable DPDK to recognize the additional cores.
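For example, with a meson-based build (the build directory name is an assumption):
meson setup build -Dmax_lcores=256
ninja -C build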
Packets Dropped When Transferring to UDP Port 4789
Issue: I am running testpmd and created the following rules:
flow create 1 priority 1 transfer ingress group 0 pattern eth / vlan / ipv4 / udp dst is 4789 / vxlan / end actions jump group 820 / end
flow create 1 priority 0 transfer ingress group 820 pattern eth / vlan / ipv4 / udp dst spec 4789 dst mask 0xfff0 / vxlan / eth / vlan / ipv4 / end actions modify_field op set dst_type udp_port_dst src_type value src_value 0x168c9e8d4df8f width 2 / port_id id 0 / end
I am sending a packet that matches these rules, but the packet is dropped and not received on port 0 peer. What could be the issue?
Resolution : UDP port 4791 is reserved for RoCE (RDMA over Converged Ethernet) traffic. By default, RoCE is enabled on all mlx5 devices, and traffic to UDP port 4791 is treated as RoCE traffic. To forward traffic to this port for Ethernet use (without RDMA), you need to disable RoCE. You can do this by running:
echo <0|1> > /sys/devices/{pci-bus-address}/roce_enable
Refer to the MLNX_OFED Documentation for more details on disabling RoCE.
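As an alternative sketch, RoCE can also be disabled through the devlink parameter (the PCI address is a placeholder; the setting takes effect after the driver reloads):
devlink dev param set pci/0000:08:00.0 name enable_roce value false cmode driverinit
devlink dev reload pci/0000:08:00.0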
Ping Loss with 254 vDPA Devices
Issue: I am experiencing intermittent ping loss or errors with some interfaces. How can I reproduce this issue?
Reproduction Steps:
Create 127 VFs on each port.
Configure VF link aggregation (lag).
Start the vDPA example with 254 ports.
Launch 16 VMs, each with up to 16 interfaces.
Send pings from each vDPA interface to an external host.
Resolution :
Adding the runtime configuration event_core=x resolves this issue. The event_core parameter specifies the CPU core number for the timer thread, with the default being the EAL main lcore.
The event_core can be shared among different mlx5 vDPA devices. However, using this core for additional tasks may impact the performance and latency of the mlx5 vDPA devices.
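For example, a vDPA device probe with a dedicated timer-thread core might look like this (the dpdk-vdpa example binary, device address, and core number are assumptions to adapt to your setup):
dpdk-vdpa -a 0000:86:01.2,class=vdpa,event_core=2 -- -i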
No TX Traffic with testpmd and HWS dv_flow=2
Issue:
I am observing that packets are not being forwarded in forward mode. I started testpmd with the following command:
dpdk-testpmd -n 4 -a 08:00.0,representor=0,dv_flow_en=2 -- -i --forward-mode=txonly -a
Resolution : To address this issue, ensure that HWS queues are configured for all the ports (representor=0-1) and the PFs. This configuration is necessary to set the default flows that will be utilized.
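A minimal sketch of configuring HWS queues on both ports before inserting rules (queue numbers and sizes are examples):
port stop all
flow configure 0 queues_number 4 queues_size 64
flow configure 1 queues_number 4 queues_size 64
port start all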
OVS-DPDK: Duplicate Packets When Co-running with SPDK
Issue: Duplicate packets are observed when running OVS-DPDK with offload alongside SPDK, which also attaches to the same PF. What can be done to fix this?
Resolution :
To resolve this issue, set dv_esw_en=0 on the OVS-DPDK side. This disables E-Switch using Direct Rules, which is enabled by default if supported.
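For example, assuming the PF is already added to OVS as a DPDK port named pf0 (the port name and PCI address are placeholders), the devarg can be appended to the port's devargs:
ovs-vsctl set Interface pf0 options:dpdk-devargs="0000:08:00.0,dv_esw_en=0"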
Live Migration VM Stuck in PMD MAC Swap Mode
Issue: Running the following scenario leads to an endless loop:
Scenario:
Open 64 VFs.
Configure OVS-DPDK [VF].
Start a VM with 16 devices.
Initiate traffic between VMs in PMD MAC swap mode.
On Hyper-V, execute:
virsh migrate --live --unsafe --persistent --verbose qa-r-vrt-123-009-CentOS-8.2 qemu+ssh://qa-r-vrt-125/system tcp://22.22.22.125
Output:
Migration: [ 75%] // repeats in a loop
Resolution :
This is a known issue. To address it, add the auto-converge parameter for heavy vCPU/traffic loads, as live migration may not complete otherwise. Run:
virsh migrate ... --auto-converge --auto-converge-initial 60 --auto-converge-increment 20
Failure to Create Rule with Raw Encapsulation in Switchdev Mode
Issue: I encountered an error while trying to create a rule with encapsulation on port 0:
mlx5_net: [mlx5dr_action_create_reformat_root]: Failed to create dv_create_flow reformat
mlx5_net: [mlx5dr_action_create_reformat]: Failed to create root reformat action
Template table #0 destroyed
port_flow_complain(): Caught PMD error type 1 (cause unspecified): fail to create rte table: Operation not supported
Resolution : To resolve this issue, disable encapsulation with the following command:
echo none > /sys/class/net/<ethx>/compat/devlink/encap
If you need to offload tunnels in VFs/SFs, disable encapsulation in the FDB domain, as the NIC does not support it in both domains simultaneously.
Unable to Create Flow When Having L3 VXLAN With External Process
Issue: After detaching and reattaching ports, I encounter the following error when attempting to match on L3 VXLAN:
Detach Ports:
port stop all
device detach 0000:08:00.0
device detach 0000:08:00.1
Re-attach Ports:
mlx5 port attach 0000:08:00.0 socket=/var/run/external_ipc_socket
mlx5 port attach 0000:08:00.1 socket=/var/run/external_ipc_socket
port start all
Create a Rule:
testpmd> flow create 1 priority 2 ingress group 0 pattern eth dst is 00:16:3e:23:7c:0c has_vlan spec 1 has_vlan mask 1 / vlan vid is 1357 ... / ipv6 src is ::9247 ... / udp dst is 4790 / vxlan-gpe / end actions queue index 65000 / end
Got Error:
port_flow_complain(): Caught PMD error type 13 (specific pattern item): cause: 0x7ffd041c6be0, L3 VXLAN is not enabled by device parameter and/or not configured in firmware: Operation not supported
Resolution : The error occurs because the l3_vxlan_en parameter is not set when attaching the device. This parameter must be specified to enable L3 VXLAN and VXLAN-GPE flow creation.
To fix this issue:
Ensure that the l3_vxlan_en parameter is set to a nonzero value when attaching the device. This will allow L3 VXLAN and VXLAN-GPE flow creation.
Configure the firmware to support L3 VXLAN or VXLAN-GPE, as this is a prerequisite for handling this type of traffic. By default, this parameter is disabled.
Make sure these configurations are in place to support L3 VXLAN operations.
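For example, the same devargs string can be supplied when allow-listing the devices on the command line (addresses are placeholders); the equivalent devargs apply when attaching the ports from within testpmd:
dpdk-testpmd -n 4 -a 0000:08:00.0,l3_vxlan_en=1 -a 0000:08:00.1,l3_vxlan_en=1 -- -i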
Buffer Split - Failure to Configure max-pkt-len
Issue: When running testpmd and trying to change the max-pkt-len configuration, I encounter the following error:
dpdk-testpmd -n 4 -a 0000:00:07.0,dv_flow_en=1,... -a 0000:00:08.0,dv_flow_en=1,... -- --mbcache=512 -i --nb-cores=15 --txd=8192 --rxd=8192 --burst=64 --mbuf-size=177,430,417 --enable-scatter --tx-offloads=0x8000 --mask-event=intr_lsc
port stop all
port config all max-pkt-len 9216
port start all
Error:
Configuring Port 0 (socket 0):
mlx5_net: port 0 too many SGEs (33) needed to handle requested maximum packet size 9216, the maximum supported are 32
mlx5_net: port 0 unable to allocate queue index 0
Fail to configure port 0 rx queues
Resolution : To resolve this issue, start testpmd with rx-offloads=0x102000 to enable buffer split. Then, configure the receive packet size by using the set rxpkts (x[,y]*) command, where x[,y]* is a CSV list of values and a zero value indicates using the memory pool data buffer size. For example, use:
set rxpkts 49,430,417
Unable to Start testpmd with Shared Library
Issue: When running testpmd from the download folder, I encounter the following error:
/tmp/dpdk/build-meson/app/dpdk-testpmd -n 4 -w ...
Error:
EAL: Error, directory path /tmp is world-writable and insecure
EAL: FATAL: Cannot init plugins
EAL: Cannot init plugins
EAL: Error - exiting with code: 1
Cause: Cannot init EAL: Invalid argument
Resolution : The issue is due to improper permissions for the directory where DPDK is located. Rebuilding DPDK in a directory with stricter permissions should resolve this problem.
Cross-Port Action: Failure to Create Actions Template
Issue: While trying to configure two ports where port 1 is the host port for port 0, I encounter an error when creating the action template on port 0.
Testpmd reproduction:
/download/dpdk/install/bin/dpdk-testpmd -n 4 -a 0000:08:00.0,dv_flow_en=2,dv_xmeta_en=0 -a 0000:08:00.1,dv_flow_en=2,dv_xmeta_en=0 --iova-mode="va" -- --mbcache=512 -i --nb-cores=7 --rxq=8 --txq=8 --txd=2048 --rxd=2048
port stop all
flow configure 0 queues_number 16 queues_size 256 counters_number 0 host_port 1 flags 2
flow configure 1 queues_number 16 queues_size 256 counters_number 8799
port start all
Error:
flow queue 1 indirect_action 5 create postpone false action_id 3 ingress action count / end
flow push 1 queue 5
flow pull 1 queue 5
flow actions_template 0 create actions_template_id 8 template shared_indirect 1 3 / end mask count / end
Actions template #8 destroyed
port_flow_complain(): Caught PMD error type 16 (specific action): cause: 0x7ffd5610a210, counters pool not initialized: Invalid argument
Resolution : Configure the ports in the correct order, with the host port configured first. Update the configuration as follows:
port stop all
flow configure 1 queues_number 16 queues_size 256 counters_number 8799
flow configure 0 queues_number 16 queues_size 256 counters_number 0 host_port 1 flags 2
port start all
RSS Hashing Issue with GRE Traffic
Issue: GRE traffic is not being RSS hashed to multiple cores when received on a CX6 interface, although it works correctly with the same traffic on an Intel i40e interface.
Resolution : To enable RSS hashing for GRE traffic over inner headers, you need to create a flow rule that specifically matches tunneled packets. For example:
Pattern: ETH / IPV4 / GRE / END
Actions: RSS(level=2, types=…) / END
If you attempt to use RSS level 2 in a flow rule without including a tunnel header item, you will encounter an error: “inner RSS is not supported for non-tunnel flows.”
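A testpmd sketch of such a rule (the queue indices and RSS types are examples to adapt to your setup):
flow create 0 ingress pattern eth / ipv4 / gre / end actions rss level 2 types ipv4 ipv4-udp ipv4-tcp end queues 0 1 2 3 end / end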
Unable to Probe SF/VF Device with crypto-perf-test App on Arm
Issue: I encounter this error when attempting to probe an SF on startup on an Arm host, using the following command:
dpdk-test-crypto-perf -c 0x7ff -a auxiliary:mlx5_core.sf.1,class=crypto,algo=1 -- --ptest verify --aead-op decrypt --optype aead --aead-algo aes-gcm ...
Error:
No crypto devices type mlx5_pci available
USER1: Failed to initialise requested crypto device type
Additionally, when attempting to use the representor keyword to probe SF or VF, I receive:
mlx5_common: Key "representor" is unknown for the provided classes.
Resolution : On Arm platforms, only VF/SF representors are available, not the actual PCI devices. As a result, probing with the representor keyword or using real PCI device types is not supported on Arm.
Failed to Create ESP Matcher on VFs in Template Mode
Issue:
When attempting to create a pattern template on a VF using testpmd, I encountered the following error:
dpdk-testpmd -n 4 -a 0000:08:00.2,...,dv_flow_en=2,dv_xmeta_en=0 -a 0000:08:00.4,...,dv_flow_en=2,... -- --mbcache=512 -i --nb-cores=7 ...
port stop all
flow configure 0 queues_number 7 queues_size 256 meters_number 0 counters_number 0 quotas_number 256
flow configure 1 queues_number 14 queues_size 256 meters_number 0 counters_number 7031 quotas_number 64
port start all
flow pattern_template 0 create ingress pattern_template_id 0 relaxed no template eth / ipv4 / esp / end
flow actions_template 0 create actions_template_id 0 template jump group 88 / end mask jump group 88 / end
flow template_table 0 create table_id 0 group 72 priority 0 ingress rules_number 64 pattern_template 0 actions_template 0
mlx5_net: [mlx5dr_definer_conv_items_to_hl]: Failed processing item type: 23
mlx5_net: [mlx5dr_definer_calc_layout]: Failed to convert items to header layout
mlx5_net: [mlx5dr_definer_matcher_init]: Failed to calculate matcher definer layout
mlx5_net: [mlx5dr_matcher_bind_mt]: Failed to set matcher templates with match definers
mlx5_net: [mlx5dr_matcher_create]: Failed to initialise matcher: 95
Resolution : Currently, IPsec offload can only be supported on a single path—either on PF, VF, or E-Switch. DPDK does not yet support IPsec on VFs; it is limited to PFs and E-Switch configurations. Consequently, configuring IPsec on VFs is not supported at this time.
Failure to Receive Hairpin Traffic (HWS) Between Two Physical Ports
Issue: I am running testpmd in hairpin mode with the following setup: Port 0 is configured to create simple rules, and I am sending traffic on Port 0 while measuring the traffic on Port 1. However, I am not seeing any traffic on Port 1.
Setup:
Two BlueField-2 devices (DUT and TG) connected back-to-back via symmetrical interfaces: p0 and p1.
Firmware is configured for HWS.
Steps:
Run DUT in Hairpin Mode:
dpdk-testpmd -c 0xff -n 4 -a 0000:03:00.0,dv_flow_en=2 -a 0000:03:00.1,dv_flow_en=2 ... --forward-mode=rxonly -i --hairpinq 1 --hairpin-mode=0x12
Configure DUT:
port stop all
flow configure 0 queues_number 4 queues_size 64
port start all
flow pattern_template 0 create pattern_template_id 0 ingress template eth / end
flow actions_template 0 create actions_template_id 0 template jump group 1 / end mask jump group 1 / end
flow template_table 0 create table_id 0 group 0 priority 0 ingress rules_number 64 pattern_template 0 actions_template 0
flow queue 0 create 0 template_table 0 pattern_template 0 actions_template 0 postpone no pattern eth / end actions jump group 1 / end
flow pull 0 queue 0
flow pattern_template 0 create pattern_template_id 2 ingress template eth / end
flow actions_template 0 create actions_template_id 2 template queue / end mask queue / end
flow template_table 0 create table_id 2 group 1 priority 0 ingress rules_number 64 pattern_template 2 actions_template 2
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone no pattern eth / end actions queue index 4 / end
flow pull 0 queue 0
start
Send Traffic from TG on Port 0:
/tmp/dpdk-hws/app/dpdk-testpmd -c 0xff -n 4 -a 0000:03:00.0,dv_flow_en=1 --socket-mem=2048 -- --port-numa-config=0 --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=4 --txq=4 --nb-cores=1 --forward-mode=txonly --no-lsc-interrupt -i
Measure Traffic Rate on TG Interface p1:
mlnx_perf -i p1
Expected Result: 2.5 Gbps
Actual Result: None
Resolution : Ensure that both ports are configured with HWS queues:
port stop all
flow configure 0 queues_number 4 queues_size 64
flow configure 1 queues_number 4 queues_size 64
port start all
Without proper HWS configuration on Port 1, default SQ miss rules are not inserted, which likely causes the absence of traffic on Port 1.
DPDK-OVS Memory Allocation Error
Issue: I encountered this error during initialization:
dpdk|ERR|EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
Resolution : In recent updates to OVS and DPDK:
The EAL argument --socket-mem is no longer configured by default on start-up. If dpdk-socket-mem and dpdk-alloc-mem are not explicitly specified, DPDK will revert to its default settings.
The EAL argument --socket-limit no longer defaults to the value of --socket-mem. To maintain the previous memory-limiting behavior, you should set other_config:dpdk-socket-limit to the same value as other_config:dpdk-socket-mem.
Ensure that you provide the dpdk-socket-limit parameter, which can be set to match the dpdk-socket-mem value to avoid such errors.
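For example (the memory sizes are placeholders):
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="2048,2048"
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="2048,2048"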
High Latency in VDPA Ping Traffic Between VMs with 240 SFs
Issue: Ping traffic between VMs exhibits high latency when configured with 240 SFs.
Resolution :
To achieve better latency results in this scenario, configure the system with event-mode=2.
DPDK's event-mode=2, also known as Event Forward mode, allows DPDK applications to offload DMA operations to a DMA adapter while maintaining the correct order of ingress packets.
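As a sketch, the mode is passed as an mlx5 vDPA devarg, spelled event_mode in the PMD (the dpdk-vdpa example binary and device address are assumptions):
dpdk-vdpa -a 0000:86:01.2,class=vdpa,event_mode=2 -- -i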
testpmd Startup Failure in ConnectX-6 Dx KVM Setup
Issue: When running testpmd with the parameter tx_pp=500, the application exits with the following error:
dpdk-testpmd -n 4 -w 0000:00:07.0,l3_vxlan_en=1,tx_pp=500,dv_flow_en=1 -w ...
Result:
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:00:07.0 (socket 0)
mlx5_net: WQE rate mode is required for packet pacing
mlx5_net: probe of PCI device 0000:00:07.0 aborted after encountering an error: No such device
Resolution :
Ensure that the REAL_TIME_CLOCK_ENABLE parameter in mlxconfig is set to 1.
The REAL_TIME_CLOCK_ENABLE parameter activates the real-time timestamp format on Mellanox ConnectX network adapters, which provides timestamps relative to the Unix epoch and is required for packet pacing functionality.
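For example, using mlxconfig from the MFT tools (the MST device path is a placeholder; a firmware reset or reboot is required for the change to take effect):
mlxconfig -d /dev/mst/mt4125_pciconf0 set REAL_TIME_CLOCK_ENABLE=1
mlxconfig -d /dev/mst/mt4125_pciconf0 query | grep REAL_TIME_CLOCK_ENABLE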
TCP Hardware Hairpin Connection Forwarding Packets to Host
Issue:
Some TCP packets, after passing through the Connection Tracking (CT) check, do not have either the RTE_FLOW_CONNTRACK_PKT_STATE_VALID or RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED flags set. They also lack the RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED flag, yet I observe that these packets are being forwarded to the host. How can I determine why these packets are being forwarded to the host or why the CT check is failing? Note that all CT objects are created with liberal mode set to 1.
Resolution : To address this issue, adjust the configuration of the conntrack object in one of the following ways:
Set max_win to 0 for both the original and reply directions.
Configure max_win with the appropriate values for both directions, as recommended in the release notes and header files.
OVS-DPDK LAG - Configuration Mismatch for "dv_xmeta_en"
Issue: When configuring OVS with vDPA ports, the following setup was used:
dpdk-extra="-w 0000:86:00.0,representor=pf0vf[0-15],dv_xmeta_en=1,dv_flow_en=1,dv_esw_en=1 -w 0000:86:00.0,representor=pf1vf[0-15],dv_xmeta_en=1,dv_flow_en=1,dv_esw_en=1"
dpdk-init="true"
dpdk-socket-mem="8192,8192"
hw-offload="true"
When adding vDPA ports to OVS from port1 and port2, the following error was encountered in the OVS log:
Jul 04 12:24:24 qa-r-vrt-123 ovs-vsctl[23091]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port br0-ovs vdpa16 -- set Interface vdpa16 type=dpdkvdpa options:vdpa-socket-path=/tmp/sock16 options:vdpa-accelerator-devargs=0000:86:08.2 options:dpdk-devargs=0000:86:00.0,representor=pf1vf[0]
Jul 04 12:24:24 qa-r-vrt-123 ovs-vswitchd[22545]: ovs|00570|dpdk|ERR|mlx5_net: "dv_xmeta_en" configuration mismatch for shared mlx5_bond_0 context
The issue does not occur if dv_xmeta_en=1 is removed during initialization.
Resolution :
The problem arises because DPDK does not probe the same PCI address with different devargs settings. Consequently, the representors of PF1 are ignored, leading OVS to treat them as new ports and probe them accordingly.
To resolve this, configure both PF representors using a single devargs entry, like so: representor=pf[0-1]vf[0-15]. This way, both PF representors will be correctly identified and probed without causing a configuration mismatch.
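Based on the configuration above, a single-entry form might look like this (adapted from the original dpdk-extra line; verify the PCI address for your setup):
dpdk-extra="-w 0000:86:00.0,representor=pf[0-1]vf[0-15],dv_xmeta_en=1,dv_flow_en=1,dv_esw_en=1"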
Issue Binding Hairpin Queues After Enabling Port Loopback
Issue:
When running testpmd in hairpin mode, the system fails to bind hairpin queues after configuring the port for loopback mode. The configuration steps were as follows:
testpmd> port stop 1
testpmd> port config 1 loopback 1
testpmd> port start 1
Resolution : To resolve this issue, ensure that all ports are stopped before applying the loopback configuration. Follow these steps:
port stop all
port config 1 loopback 1
port start all
Memory Allocation Failure
Issue: When running testpmd with iova-mode=pa, initialization fails with the following errors:
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
Fail to start port 0: Cannot allocate memory
Fail to start port 1: Cannot allocate memory
Resolution : The memory fragmentation issue can be resolved by using iova-mode=va instead. This mode utilizes virtual addressing, which can handle memory fragmentation more effectively.
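For example (testpmd shown; the same EAL option applies to other DPDK applications):
dpdk-testpmd --iova-mode=va -n 4 -a 0000:08:00.0 -- -i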
VDPA Rx Packet Truncation with --enable-scatter
Issue:
Packets are being truncated when configuring the maximum packet length with the following testpmd settings:
dpdk-testpmd -n 4 -w 0000:04:00.0,representor=[0,1],dv_xmeta_en=0,txq_inline=290,rx_vec_en=1,l3_vxlan_en=1,dv_flow_en=1,dv_esw_en=1 -- .. --enable-scatter ..
After changing the MTU and maximum packet length:
port stop all
port config all max-pkt-len 8192
port start all
port config mtu 0 8192
port config mtu 1 8192
port config mtu 2 8192
Resolution : The issue is that packets are truncated to the default size of 1518 bytes. This problem typically stems from the guest VM's MTU settings rather than the host configuration.
To address this, update the MTU settings for the virtio-net device in the guest VM:
Set the MTU value using the host_mtu parameter when launching the VM:
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,mrg_rxbuf=on,host_mtu=9000
This sets the MTU for the virtio-net device to 9000 bytes.
Inside the guest VM, verify the MTU value with the ifconfig command to ensure it matches the specified value (9000 in this case).
If a different MTU value is required, adjust both the --max-pkt-len parameter in the testpmd command on the host and the host_mtu parameter in the QEMU command for the guest accordingly.
Ring Memory Issue
Issue : The following memory error is encountered:
RING: Cannot reserve memory
[13:00:57:290147][DOCA][ERR][UFLTR::Core:156]: DPI init failed
Resolution : This is a common memory issue when running an application on the host.
The most common cause for this error is lack of memory (i.e., not enough huge pages per worker thread).
Possible solutions:
Recommended: Increase the amount of allocated huge pages. Instructions for allocating huge pages can be found here.
Alternatively, one can also limit the number of cores used by the application:
-c <core-mask> – Set the hexadecimal bitmask of the cores to run on.
-l <core-list> – List of cores to run on.
For example:
./doca_<app_name> -a 3b:00.3 -a 3b:00.4 -l 0-64 -- -l 60
DOCA Apps Using DPDK in Parallel Issue
Issue : When running two DOCA apps in parallel that use DPDK, the first app runs but the second one fails.
The following error is received:
Failed to start URL Filter with output: EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: RTE Version: 'MLNX_DPDK 20.11.4.0.3'
EAL: Detected shared linkage of DPDK
EAL: Cannot create lock on '/var/run/dpdk/rte/config'. Is another primary process running?
EAL: FATAL: Cannot init config
EAL: Cannot init config
[15:01:57:246339][DOCA][ERR][NUTILS]: EAL initialization failed
Resolution :
The cause of the error is that the second application is trying to use /var/run/dpdk/rte/config while the first application is already using it.
To run two applications in parallel, the second application must be run with the DPDK EAL option --file-prefix <name>.
In this example, after running the first application (without adding the EAL option), run the second application with the EAL option:
./doca_<app_name> --file-prefix second -a 0000:01:00.6,sft_en=1 -a 0000:01:00.7,sft_en=1 -v -c 0xff -- -l 60
Failure to Set Huge Pages
Issue : When trying to configure the huge pages from an unprivileged user account, a permission error is raised.
Configuring the huge pages results in the following error:
$ sudo echo 600 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
-bash: /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages: Permission denied
Resolution:
Using sudo with echo works differently than users usually expect. The command should be as follows:
$ echo '600' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages