OVS-DOCA
This page provides troubleshooting guidance for common issues encountered with OVS-DOCA on NVIDIA BlueField DPUs, including rule offload verification, performance bottlenecks, crash diagnostics, and advanced metrics.
NVIDIA OVS is based on upstream Open vSwitch (version 2.17.8) and supports all standard upstream commands and tools.
| Command | Explanation |
| | Checks the status of the Open vSwitch service on RPM-based OSs. |
| systemctl restart openvswitch | Restarts the Open vSwitch service on RPM-based OSs. |
| | Checks the status of the Open vSwitch service on Debian-based OSs. |
| systemctl restart openvswitch-switch | Restarts the Open vSwitch service on Debian-based OSs. |
| | Prints a brief overview of the Open vSwitch database contents. |
| | Lists global information and settings of OVS, including DOCA/DPDK and OVS versions, and whether DOCA/DPDK mode is active. |
| | Prints a list of all interfaces with detailed information. |
| | Prints a list of all bridges with detailed information. |
| | Prints Linux driver and firmware errors. |
| ovs-ofctl dump-flows <bridge> | Prints all OpenFlow entries in the bridge's tables. |
| ovs-appctl dpctl/dump-flows -m | Prints all active datapath flows with counters and offload indications. |
| | Prints all connections, including 5-tuple info, state, and offload status. |
| ovs-appctl dpif-netdev/pmd-stats-show | Prints DOCA/DPDK PMD (software) counters. |
| ovs-appctl dpif-netdev/pmd-stats-clear | Resets DOCA/DPDK PMD (software) counters. |
| | Sets the datapath flow aging time to the specified number of milliseconds. |
| ovs-appctl dpctl/offload-stats-show | Prints DOCA/DPDK offload counters, including the number of offloaded datapath flows and connections. |
| | Disables hardware offload (requires a service restart). |
| ovs-metrics | Live monitor of software and hardware counters. |
| | Obtains additional device statistics. |
| | Dumps non-offloaded traffic from all representors controlled by the specified IB device. |
| ovs-appctl vlog/list | Prints current Open vSwitch logging levels. |
| ovs-appctl vlog/set | Controls Open vSwitch logging levels. Recommended settings for debugging DOCA/DPDK: ovs_doca:file:DBG dpdk_offload_doca:file:DBG dpif_netdev:file:DBG netdev_offload_dpdk:file:DBG |
| | Enables slowpath packet tracing in the Open vSwitch log. |
| ovs-appctl doca-pipe-group/dump | Dumps DOCA pipe groups, showing the created chains of masks for each group. Note: A pipe group is a chain of DOCA pipes, where a miss on one pipe leads to another, each with different masks and actions. Special predefined groups exist. |
| | Enables DPDK offload tracing (requires an OVS service restart). This setting dumps DPDK offloads directly for debugging purposes. |
| | If DPDK offload tracing is enabled, this command dumps DPDK offloads in DPDK RTE flow format, including dpctl flows and connection tracking offloads. |
Log Files
OVS logs are available on the BlueField Arm side at:
/var/log/openvswitch/ovs-vswitchd.log
Log levels can be independently configured for console, syslog, and file outputs. By default:
Console – OFF
Syslog – ERR
File – INFO
To view current log levels:
ovs-appctl vlog/list
To set logging for DOCA-related modules:
ovs-appctl vlog/set ovs_doca:file:DBG dpdk_offload_doca:file:DBG dpif_netdev:file:DBG netdev_offload_dpdk:file:DBG
Logging settings revert to default after the OVS service restarts.
OpenFlow Table Dump
Dump the current OpenFlow rules:
# ovs-ofctl dump-flows <bridge>
Each rule shows:
Packet and byte match counters
Rule duration
Example:
# ovs-ofctl dump-flows br-int
cookie=0x0, duration=65.630s, table=0, n_packets=4, n_bytes=234, arp actions=NORMAL
cookie=0x0, duration=65.622s, table=0, n_packets=20, n_bytes=1960, icmp actions=NORMAL
cookie=0x0, duration=65.605s, table=0, n_packets=0, n_bytes=0, ct_state=-trk,ip actions=ct(table=1,zone=5)
cookie=0x0, duration=65.562s, table=1, n_packets=0, n_bytes=0, ct_state=+new+trk,ip actions=ct(commit,zone=5),NORMAL
cookie=0x0, duration=65.554s, table=1, n_packets=0, n_bytes=0, ct_state=+est+trk,ct_zone=5,ip actions=NORMAL
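As a rough illustration of how the per-rule counters can be inspected programmatically, the following Python sketch parses text in the format of the example above and lists rules that have not matched any packet. The function names and the sample constant are hypothetical helpers, not part of OVS, and the parsing assumes each rule is printed on one line ending in " actions=...".

```python
import re

# Two rules copied from the example output above.
OPENFLOW_SAMPLE = """\
cookie=0x0, duration=65.630s, table=0, n_packets=4, n_bytes=234, arp actions=NORMAL
cookie=0x0, duration=65.605s, table=0, n_packets=0, n_bytes=0, ct_state=-trk,ip actions=ct(table=1,zone=5)
"""


def parse_openflow_rule(line):
    """Parse one dump-flows line into a dict, or return None if it is not a rule."""
    head, sep, actions = line.strip().partition(" actions=")
    if not sep:
        return None
    # Collect key=value pairs that precede the actions.
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", head))
    if "n_packets" not in fields or "n_bytes" not in fields:
        return None
    # The match is the last whitespace-separated token; counter fields end with ','.
    last = head.split()[-1]
    match = "" if last.endswith(",") else last
    return {
        "table": int(fields.get("table", "0")),
        "n_packets": int(fields["n_packets"]),
        "n_bytes": int(fields["n_bytes"]),
        "match": match,
        "actions": actions,
    }


def parse_openflow_dump(text):
    """Parse every rule line in the output of 'ovs-ofctl dump-flows <bridge>'."""
    rules = (parse_openflow_rule(line) for line in text.splitlines())
    return [rule for rule in rules if rule is not None]


def unused_rules(rules):
    """Return the rules whose packet counter is still zero."""
    return [rule for rule in rules if rule["n_packets"] == 0]


if __name__ == "__main__":
    for rule in unused_rules(parse_openflow_dump(OPENFLOW_SAMPLE)):
        print("no hits:", rule["match"], "->", rule["actions"])
```

A rule with zero hits is not necessarily a problem; it only means no packet has matched it since the rule was installed.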
DataPath Flow Dump
To view datapath flows:
# ovs-appctl dpctl/dump-flows -m
Each flow includes:
Match criteria
Applied actions
Offload status (e.g., dp:doca, offloaded:yes)
Packet and byte counts
Flow usage time
Datapath types: OVS, DOCA, DPDK, TC.
Example:
# ovs-appctl dpctl/dump-flows -m
flow-dump from pmd on cpu core: 21
ufid:c79d3e57-10eb-427f-a5d3-2785f0cbbac1, skb_priority(0/0),tunnel(tun_id=0x2a,src=7.7.7.8,dst=7.7.7.7,ttl=64/0,eth_src=10:70:fd:d9:0d:a4/00:00:00:00:00:00,eth_dst=10:70:fd:d9:0d:c8/00:00:00:00:00:00,type=gre/none,flags(-df+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(gre_sys),packet_type(ns=0,id=0),eth(src=c2:32:df:66:71:af,dst=e4:73:41:08:00:02),eth_type(0x0800),ipv4(src=1.1.1.8/0.0.0.0,dst=1.1.1.7/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=0/0,code=0/0), packets:1, bytes:144, used:1.488s, offloaded:yes, dp:doca, actions:pf0vf0, dp-extra-info:miniflow_bits(9,1)
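To make the offload indications easier to scan across many flows, a small Python sketch such as the one below can count flows per datapath type. It assumes each flow is printed on a single line containing "offloaded:<yes|no>" and "dp:<type>" fields, as in the example above; the helper names and the abbreviated sample lines are hypothetical.

```python
import re
from collections import Counter

# Abbreviated, made-up flow lines that keep only the trailing fields of interest.
DATAPATH_SAMPLE = """\
flow-dump from pmd on cpu core: 21
ufid:aaaa, recirc_id(0),in_port(gre_sys), packets:1, bytes:144, used:1.488s, offloaded:yes, dp:doca, actions:pf0vf0
ufid:bbbb, recirc_id(0),in_port(pf0vf0), packets:3, bytes:294, used:0.200s, offloaded:no, dp:ovs, actions:pf0vf1
"""


def flow_offload_info(line):
    """Return offload status and datapath type for one flow line, or None."""
    offloaded = re.search(r"\boffloaded:(\w+)", line)
    dp = re.search(r"\bdp:(\w+)", line)  # does not match 'dp-extra-info:' or 'dp_hash('
    if offloaded is None or dp is None:
        return None
    return {"offloaded": offloaded.group(1) == "yes", "dp": dp.group(1)}


def count_by_datapath(text):
    """Count flows per datapath type (for example doca, ovs)."""
    counts = Counter()
    for line in text.splitlines():
        info = flow_offload_info(line)
        if info is not None:
            counts[info["dp"]] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_by_datapath(DATAPATH_SAMPLE)))
```

In practice the input would be the captured output of `ovs-appctl dpctl/dump-flows -m` rather than the sample constant.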
PMD Counters
You can dump the OVS software processing counters using this command:
# ovs-appctl dpif-netdev/pmd-stats-show
The "packets received" counter shows the number of packets processed by software and can be used to monitor issues related to hardware offloads.
Example:
# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 21:
  packets received: 75
  packet recirculations: 14
  avg. datapath passes per packet: 1.19
  phwol hits: 5
  mfex opt hits: 0
  simple match hits: 0
  emc hits: 0
  smc hits: 0
  megaflow hits: 28
  avg. subtable lookups per megaflow hit: 1.82
  miss with success upcall: 56
  miss with failed upcall: 0
  avg. packets per output batch: 1.02
  idle cycles: 7405350461306 (100.00%)
  processing cycles: 16284620 (0.00%)
  avg cycles per packet: 98738223279.01 (7405366745926/75)
  avg processing cycles per packet: 217128.27 (16284620/75)
To reset these statistics, use this command:
# ovs-appctl dpif-netdev/pmd-stats-clear
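One way to use the "packets received" counter is to sample it twice while traffic is running and compare the values: if it keeps growing for traffic that should be offloaded, packets are still reaching software. The Python sketch below shows this under the assumption that the output contains a single PMD thread block in the format shown above; the function names are hypothetical.

```python
import re

# Matches "name: number" lines such as "packets received: 75".
# The thread header line contains '_' and digits before ':' and is skipped.
_COUNTER_RE = re.compile(r"^\s*([A-Za-z][A-Za-z. ]*?):\s*(\d+(?:\.\d+)?)")


def parse_pmd_stats(text):
    """Return {counter name: value} for one PMD thread block."""
    stats = {}
    for line in text.splitlines():
        m = _COUNTER_RE.match(line)
        if m:
            value = m.group(2)
            stats[m.group(1)] = float(value) if "." in value else int(value)
    return stats


def software_packet_delta(before, after):
    """Packets handled in software between two pmd-stats-show captures."""
    key = "packets received"
    return parse_pmd_stats(after)[key] - parse_pmd_stats(before)[key]


if __name__ == "__main__":
    first = "pmd thread numa_id 0 core_id 21:\n  packets received: 75\n"
    second = "pmd thread numa_id 0 core_id 21:\n  packets received: 90\n"
    print("software packets since first capture:", software_packet_delta(first, second))
```

With several PMD threads, the output would need to be split per thread before parsing; this sketch does not do that.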
Offload Counters
To check offload activity:
# ovs-appctl dpctl/offload-stats-show
Counters include:
Enqueued offloads – Pending rules in hardware
Inserted offloads – Active rules in hardware
CT uni-dir / bi-dir Connections – Active connection tracking entries
Example:
# ovs-appctl dpctl/offload-stats-show
HW Offload stats:
Total Enqueued offloads: 0
Total Inserted offloads: 42
Total CT uni-dir Connections: 0
Total CT bi-dir Connections: 1
Total Cumulative Average latency (us): 102761
Total Cumulative Latency stddev (us): 131560
Total Exponential Average latency (us): 125942
Total Exponential Latency stddev (us): 132435
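For scripted checks, the counters can be read into a dictionary. The following Python sketch parses text in the format of the example above and reports whether any offloads are still queued; the helper names are hypothetical and the "name: integer" line layout is assumed from the sample output.

```python
OFFLOAD_SAMPLE = """\
HW Offload stats:
   Total Enqueued offloads: 0
   Total Inserted offloads: 42
   Total CT uni-dir Connections: 0
   Total CT bi-dir Connections: 1
"""


def parse_offload_stats(text):
    """Return {counter name: int} from 'dpctl/offload-stats-show' output."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the header line, which has nothing after the colon.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def has_pending_offloads(stats):
    """True if rules are still waiting to be inserted into hardware."""
    return stats.get("Total Enqueued offloads", 0) > 0


if __name__ == "__main__":
    stats = parse_offload_stats(OFFLOAD_SAMPLE)
    print("inserted:", stats["Total Inserted offloads"])
    print("pending:", has_pending_offloads(stats))
```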
Metrics
The ovs-metrics tool provides live hardware and software counters.
# ovs-metrics
The ovs-metrics script requires the python3-doca-openvswitch package. To install it:
On Ubuntu:
sudo apt install python3-doca-openvswitch
On RHEL:
sudo yum install python3-doca-openvswitch
The ovs-metrics tool dumps the following information every second:
sw-pkts – total number of packets passed in software
sw-pps – packets per second in software over the last second
sw-conns – number of CT connections in software
sw-cps – new connections per second in software over the last second
hw-pkts – total number of packets passed in hardware
hw-pps – packets per second in hardware over the last second
hw-conns – number of CT connections in hardware
hw-cps – new connections per second in hardware over the last second
enqueued – number of rules pending hardware offload
hw-rules – number of offloaded rules in hardware (including infrastructure rules)
hw-rps – new hardware rules offloaded per second over the last second
DOCA Group Pipe Dump
To view DOCA pipe groups:
# ovs-appctl doca-pipe-group/dump
Each group shows:
Group ID (e.g., post-ct, post-meter)
Pipe structure and priority
Match conditions and forwarding type
Example:
esw_mgr_port_id=0, group_id=0x00000000
esw=0x7fd9be8fe048, group_id=0x00000000, priority=2, fwd.type=port, match.parser_meta.port_meta[4, changeable]=0xffffffff/0xffffffff, match.parser_meta.outer_ip_fragmented[1, changeable]=0xff/0xff, match.outer.eth.type[2, changeable]=0xffff/0xffff, match.outer.l3_type[4, specific]=0x02000000/0x02000000, empty_actions_mask
esw=0x7fd9be8fe048, group_id=0x00000000, priority=4, fwd.type=pipe, empty_match, empty_actions_mask
esw_mgr_port_id=0, group_id=0xfd000000 (post-ct)
esw=0x7fd9be8fe048, group_id=0xfd000000 (post-ct), priority=4, fwd.type=pipe, empty_match, empty_actions_mask
esw_mgr_port_id=0, group_id=0xff000000 (post-meter)
esw=0x7fd9be8fe048, group_id=0xff000000 (post-meter), priority=4, fwd.type=pipe, empty_match, empty_actions_mask
esw_mgr_port_id=0, group_id=0xf2000000 (sample-post-mirror)
esw=0x7fd9be8fe048, group_id=0xf2000000 (sample-post-mirror), priority=1, fwd.type=drop, match.outer.eth.type[2, changeable]=0xffff/0x8809, empty_actions_mask
esw=0x7fd9be8fe048, group_id=0xf2000000 (sample-post-mirror), priority=3, fwd.type=pipe, empty_match, actions.meta.pkt_meta[4, changeable]=0xffffffff/0x00f0ffff
esw_mgr_port_id=0, group_id=0xf1000000 (sample)
esw_mgr_port_id=0, group_id=0xf3000000 (miss)
esw_mgr_port_id=0, group_id=0xfb000000 (post-hash)
esw=0x7fd9be8fe048, group_id=0xfb000000 (post-hash), priority=4, fwd.type=pipe, empty_match, empty_actions_mask
This command displays the created groups, where each group is identified by a group ID and includes a list of DOCA flow pipes arranged in a chain (with misses leading from one pipe to the next) and sorted by priority. These pipe groups are shown in the order they were created. Special group IDs are labeled (e.g., post-hash). The dump also includes the forwarding type for each pipe and any header rewrite actions, if applicable.
To improve troubleshooting of OVS crashes, ensure that coredumpctl is installed and properly configured on your system. This utility automates the collection of core dumps, which can be analyzed using gdb to extract backtraces and other relevant diagnostic information.
Core dumps are especially useful for investigating rare or hard-to-reproduce crashes, because they provide a complete snapshot of the process state at the time of failure, which greatly aids root cause analysis.
On Ubuntu systems, install it with:
apt install systemd-coredump
Make sure your system is configured to collect and retain core dumps. You may need to update /etc/systemd/coredump.conf and verify that settings such as ulimit and kernel parameters permit core dump generation.
OVS is compiled with libunwind support, so if a crash occurs, a backtrace may also be logged directly in the relevant log file. For example, if ovs-vswitchd crashes, a backtrace should appear in /var/log/openvswitch/ovs-vswitchd.log.
To allow core dumps:
# ulimit -c unlimited
# sysctl -w fs.suid_dumpable=1
To analyze the core dump with symbols, install the debug info package:
RPM-based distributions:
# dnf install openvswitch-debuginfo
Debian-based distributions:
# apt install openvswitch-dbg
Failure to Start OVS
If OVS fails to start after enabling DOCA mode, it is often due to missing hugepage configuration.
Check the OVS log file at /var/log/openvswitch/ovs-vswitchd.log for additional details.
If hugepages are not configured, you may encounter the following error:
2024-03-13T14:59:26.806Z|00025|dpdk|ERR|EAL: Cannot get hugepage information.
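Before restarting OVS, it can help to confirm whether the kernel has any hugepages allocated at all. On Linux, /proc/meminfo reports this in the HugePages_Total field. The Python sketch below reads that field from meminfo-style text; the function name is hypothetical, and the number of hugepages actually required depends on the deployment and is not covered here.

```python
import re


def hugepages_total(meminfo_text):
    """Return the HugePages_Total value from /proc/meminfo-style text (0 if absent)."""
    m = re.search(r"^HugePages_Total:\s+(\d+)", meminfo_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


if __name__ == "__main__":
    # Read the live value on a Linux system.
    with open("/proc/meminfo") as f:
        total = hugepages_total(f.read())
    if total == 0:
        print("No hugepages are allocated; OVS in DOCA mode may fail to start.")
    else:
        print("HugePages_Total:", total)
```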
Failure to Add Port to Bridge
Port addition failures can result from several common misconfigurations.
DOCA/DPDK Not Initialized
You may see:
error: "could not open network device pf0 (Address family not supported by protocol)"
Resolution:
ovs-vsctl set o . other_config:doca-init=true
systemctl restart openvswitch-switch
eSwitch Manager (PF) Not Added
Error:
error: "could not add network device pf0vf0 to ofproto (Resource temporarily unavailable)"
Resolution: Add the PF (eSwitch manager) port to the OVS bridge before adding its associated VFs.
Missing datapath_type=netdev for DOCA/DPDK ports
Error:
error: "could not add network device eth2 to ofproto (Invalid argument)"
Explanation: When using DOCA/DPDK ports, the bridge must have datapath_type set to netdev.
Verify using:
ovs-vsctl get bridge <BR> datapath_type
Non-existent Port Specified
Error:
error: "rep1: could not set configuration (No such device)"
Resolution: Verify that the specified device exists and is visible to the system.
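The four cases above can be summarized as a lookup from the text in parentheses at the end of the error message to the likely cause. The Python sketch below is only a convenience for log triage: the table and function names are hypothetical, and the same errno text may have other causes that are not listed in this section.

```python
# Error text (as it appears in the messages above) mapped to the likely cause
# and resolution described in this section.
PORT_ADD_HINTS = [
    ("Address family not supported by protocol",
     "DOCA/DPDK not initialized: set other_config:doca-init=true and restart OVS"),
    ("Resource temporarily unavailable",
     "eSwitch manager (PF) not added: add the PF to the bridge before its VFs"),
    ("Invalid argument",
     "Bridge datapath_type is not netdev: check 'ovs-vsctl get bridge <BR> datapath_type'"),
    ("No such device",
     "Port does not exist: verify the device exists and is visible to the system"),
]


def diagnose_port_error(message):
    """Return a hint for a port-add error message, or None if it is not recognized."""
    for needle, hint in PORT_ADD_HINTS:
        if needle in message:
            return hint
    return None


if __name__ == "__main__":
    msg = 'could not add network device pf0vf0 to ofproto (Resource temporarily unavailable)'
    print(diagnose_port_error(msg))
```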
Traffic Failure
Failure to pass traffic between interfaces may be caused by the following issues:
Port not added successfully – Refer to Failure to Add Port to Bridge to ensure ports were added correctly.
Incorrect VF subnet configuration – If traffic is sent between VFs on different subnets, it will not be forwarded unless explicit OpenFlow rules are configured to permit inter-subnet routing.
Conflicting kernel routing table – Verify that the kernel's routing table does not contain overlapping routes. Each unique IP address should be associated with only one interface.
Missing VF representors on the OVS bridge – If a VF's representor is not attached to the bridge, traffic from that VF will not reach the OVS pipeline.
Tunnel misconfiguration:
Missing neighbor discovery between tunnel endpoints – For tunnel traffic to work, L3 connectivity between endpoints must be established.
Ensure the OVS bridge has the correct local tunnel IP.
Ensure the remote system has an interface configured with the corresponding remote tunnel IP.
Mismatched VNI configuration – Both systems must use the same VNI (VXLAN Network Identifier) for traffic to be correctly encapsulated and decapsulated.
Performance Degradation (No Offload)
If you experience performance degradation, it may indicate that OVS is not offloading flows to hardware as expected.
Verify Offload Status
Check whether hardware offload is enabled:
# ovs-vsctl get Open_vSwitch . other_config:hw-offload
If hw-offload = true – Fast Path is enabled (offload is enabled)
If hw-offload = false – Slow Path is used (offload is disabled)
Enable Hardware Offload
For RHEL/CentOS, run:
# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# systemctl restart openvswitch
# systemctl enable openvswitch
For Ubuntu/Debian:
# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# systemctl restart openvswitch-switch
Check Offload Status of Rules
To verify which flows are offloaded:
# ovs-appctl dpctl/dump-flows -m
If dp:ovs appears in the output, the flow was handled in software (offload failed).
Review the end of each flow entry or check the OVS logs to identify the reason for failure.
PMD (Poll Mode Driver) counters can also confirm if packets are being processed in software.
Consider ct-zone and mem-zone Usage
Performance issues may also arise due to resource exhaustion from connection tracking or memory zone limits.
OVS supports up to 65,535 ct-zones.
In DOCA basic pipe mode, each ct-zone may consume approximately 36 mem-zones.
If too many ct-zones are created, the system may run out of available mem-zones, which can impact offload and degrade performance.
Reaching Maximum Number of Memory Zones
Because each connection tracking (ct) zone requires additional mem-zones, users may reach the maximum number of DPDK mem-zones more easily, especially when configuring a large number of ct-zones. By default, the mem-zone limit is set to 2560.
Error in Logs
When the mem-zone limit is reached, the following error will appear in the logs:
2024-07-30T19:17:07.585Z|00002|dpdk(hw_offload4)|ERR|EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested memzone segments exceeds max memzone segments (2560 >= 2560)
Workaround
To resolve this issue, increase the number of mem-zones by setting the dpdk-max-memzones configuration parameter:
ovs-vsctl set o . other_config:dpdk-max-memzones=<desired_number>
Replace <desired_number> with the total number of mem-zones required for your configuration.
Example Scenario
You are configuring 500 ct-zones. Since each ct-zone requires approximately 36 mem-zones, you will need a total of:
500 ct-zones × 36 mem-zones/ct-zone = 18,000 mem-zones
It is recommended to reserve additional mem-zones for other pipeline components. For example, you can preserve the default 2560 mem-zones for general system use.
Total required mem-zones:
18,000 (for ct-zones) + 2,560 (reserved) = 20,560
Set the value:
ovs-vsctl set o . other_config:dpdk-max-memzones=20560
By adjusting the mem-zone limit accordingly, you can avoid allocation failures and performance degradation caused by resource exhaustion, especially in environments with large-scale connection tracking configurations.
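The sizing arithmetic from the example scenario can be captured in a short Python sketch. It uses the approximate figure of 36 mem-zones per ct-zone and the default limit of 2560 quoted above; the function names are hypothetical, and because the per-zone figure is approximate, the result should be treated as an estimate.

```python
MEMZONES_PER_CT_ZONE = 36   # approximate cost per ct-zone in DOCA basic pipe mode
DEFAULT_MAX_MEMZONES = 2560  # default dpdk-max-memzones limit


def required_memzones(ct_zones,
                      per_zone=MEMZONES_PER_CT_ZONE,
                      reserved=DEFAULT_MAX_MEMZONES):
    """Estimate the dpdk-max-memzones value for a given number of ct-zones."""
    return ct_zones * per_zone + reserved


def max_ct_zones(limit,
                 per_zone=MEMZONES_PER_CT_ZONE,
                 reserved=DEFAULT_MAX_MEMZONES):
    """Estimate how many ct-zones fit under a given mem-zone limit."""
    return max(0, (limit - reserved) // per_zone)


if __name__ == "__main__":
    value = required_memzones(500)
    print(f"ovs-vsctl set o . other_config:dpdk-max-memzones={value}")
```

For 500 ct-zones this reproduces the value 20560 used in the command above.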