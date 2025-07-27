NVIDIA BlueField Platform Software Troubleshooting Guide
OVS-DOCA

Preface

This page provides troubleshooting guidance for common issues encountered with OVS-DOCA on NVIDIA BlueField DPUs, including rule offload verification, performance bottlenecks, crash diagnostics, and advanced metrics.

NVIDIA OVS is based on upstream Open vSwitch (version 2.17.8) and supports all standard upstream commands and tools.

Recommended resources:

Command Cheat Sheet

  • systemctl status openvswitch – Checks the status of the Open vSwitch service on RPM-based OSs.

  • systemctl restart openvswitch – Restarts the Open vSwitch service on RPM-based OSs.

  • systemctl status openvswitch-switch – Checks the status of the Open vSwitch service on Debian-based OSs.

  • systemctl restart openvswitch-switch – Restarts the Open vSwitch service on Debian-based OSs.

  • ovs-vsctl show – Prints a brief overview of the Open vSwitch database contents.

  • ovs-vsctl list open_vswitch – Lists global OVS information and settings, including the DOCA/DPDK and OVS versions and whether DOCA/DPDK mode is active.

  • ovs-vsctl list interface – Prints all interfaces with detailed information.

  • ovs-vsctl list bridge – Prints all bridges with detailed information.

  • dmesg – Prints kernel messages, including Linux driver and firmware errors.

  • ovs-ofctl dump-flows <bridge> – Prints all OpenFlow entries in the bridge's tables.

  • ovs-appctl dpctl/dump-flows -m – Prints all active datapath flows with counters and offload indications.

  • ovs-appctl dpctl/dump-conntrack -m – Prints all connections, including 5-tuple information, state, and offload status.

  • ovs-appctl dpif-netdev/pmd-stats-show – Prints DOCA/DPDK PMD (software) counters.

  • ovs-appctl dpif-netdev/pmd-stats-clear – Resets DOCA/DPDK PMD (software) counters.

  • ovs-vsctl set Open_vSwitch . other_config:max-idle=<msec> – Sets the datapath flow aging time, in milliseconds.

  • ovs-appctl dpctl/offload-stats-show – Prints DOCA/DPDK offload counters, including the number of offloaded datapath flows and connections.

  • ovs-vsctl set Open_vSwitch . other_config:hw-offload=false – Disables hardware offload (requires a service restart).

  • ovs-metrics – Live monitor of software and hardware counters.

  • ethtool -S <PF> – Obtains additional device statistics.

  • tcpdump -i <ib_device> – Dumps non-offloaded traffic from all representors controlled by the specified IB device.

  • ovs-appctl vlog/list – Prints the current Open vSwitch logging levels.

  • ovs-appctl vlog/set <module:destination:level> – Controls Open vSwitch logging levels. Recommended settings for debugging DOCA/DPDK: dpif_netdev:file:DBG, netdev_offload_dpdk:file:DBG, ovs_doca:file:DBG, dpdk_offload_doca:file:DBG. Note: Logging levels revert to their defaults when the OVS service restarts.

  • ovs-appctl dpif-netdev/dump-packets – Enables slow-path packet tracing in the Open vSwitch log.

  • ovs-appctl doca-pipe-group/dump – Dumps DOCA pipe groups, showing the chain of masks created for each group. Note: A pipe group is a chain of DOCA pipes in which a miss on one pipe leads to the next, each pipe having different masks and actions. Some special groups are predefined.

  • ovs-vsctl set Open_vSwitch . other_config:dpdk-offload-trace=true – Enables DPDK offload tracing (requires an OVS service restart), which dumps DPDK offloads directly for debugging purposes.

  • ovs-appctl dpdk/dump-offloads – When DPDK offload tracing is enabled, dumps DPDK offloads, including dpctl flows and connection tracking offloads, in DPDK RTE flow format.
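When a problem has to be reported or compared over time, it can help to capture several of these commands in one pass. The sketch below is not an NVIDIA tool: `collect_ovs_diag` is a hypothetical bash function that runs a subset of the commands above and saves each output to a numbered file. Extend the list as needed.

```shell
# Hypothetical helper: run a subset of the cheat-sheet commands and save each
# command's output (stdout and stderr) to a numbered file in the given directory.
collect_ovs_diag() {
    local outdir=${1:-ovs-diag-$(date +%Y%m%d-%H%M%S)}
    local i=0 cmd
    mkdir -p "$outdir" || return 1
    for cmd in \
        "ovs-vsctl show" \
        "ovs-vsctl list open_vswitch" \
        "ovs-appctl dpctl/dump-flows -m" \
        "ovs-appctl dpctl/dump-conntrack -m" \
        "ovs-appctl dpif-netdev/pmd-stats-show" \
        "ovs-appctl dpctl/offload-stats-show"; do
        i=$((i + 1))
        # Record the command line, then its output; keep going if it fails.
        { echo "\$ $cmd"; $cmd; } > "$outdir/$i.txt" 2>&1
    done
    echo "$outdir"
}
```

For example, `collect_ovs_diag /tmp/ovs-diag` creates `/tmp/ovs-diag/1.txt` through `/tmp/ovs-diag/6.txt` and prints the directory name.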

Logging and Counters

Log Files

OVS logs are available on the BlueField Arm side at:

/var/log/openvswitch/ovs-vswitchd.log

Log levels can be independently configured for console, syslog, and file outputs. By default:

  • Console – OFF

  • Syslog – ERR

  • File – INFO

To view current log levels:

ovs-appctl vlog/list

To set logging for DOCA-related modules:

ovs-appctl vlog/set ovs_doca:file:DBG dpdk_offload_doca:file:DBG dpif_netdev:file:DBG netdev_offload_dpdk:file:DBG

Note

Logging settings revert to default after the OVS service restarts.
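Because these levels reset on restart, and DBG output can grow the log quickly, a small wrapper can make switching them on and off repeatable. A minimal sketch, assuming bash; `ovs_doca_debug` is a hypothetical name, and `off` restores INFO, the default file level listed above.

```shell
# Hypothetical helper: raise the DOCA/DPDK-related modules to DBG in the log
# file ("on"), or set them back to INFO, the default file level ("off").
ovs_doca_debug() {
    local level
    case "$1" in
        on)  level=DBG ;;
        off) level=INFO ;;
        *)   echo "usage: ovs_doca_debug on|off" >&2; return 2 ;;
    esac
    ovs-appctl vlog/set "ovs_doca:file:$level" "dpdk_offload_doca:file:$level" \
        "dpif_netdev:file:$level" "netdev_offload_dpdk:file:$level"
}
```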


OpenFlow Table Dump

Dump the current OpenFlow rules:

# ovs-ofctl dump-flows <bridge>

Each rule shows:

  • Packet and byte match counters

  • Rule duration

Example:

# ovs-ofctl dump-flows br-int 
 cookie=0x0, duration=65.630s, table=0, n_packets=4, n_bytes=234, arp actions=NORMAL
 cookie=0x0, duration=65.622s, table=0, n_packets=20, n_bytes=1960, icmp actions=NORMAL
 cookie=0x0, duration=65.605s, table=0, n_packets=0, n_bytes=0, ct_state=-trk,ip actions=ct(table=1,zone=5)
 cookie=0x0, duration=65.562s, table=1, n_packets=0, n_bytes=0, ct_state=+new+trk,ip actions=ct(commit,zone=5),NORMAL
 cookie=0x0, duration=65.554s, table=1, n_packets=0, n_bytes=0, ct_state=+est+trk,ct_zone=5,ip actions=NORMAL
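Rules whose counters stay at zero while test traffic is running show where an expected match is not being hit. A minimal sketch using `zero_hit_rules`, a hypothetical helper that filters the `ovs-ofctl dump-flows` output shown above:

```shell
# Hypothetical helper: print the OpenFlow rules of bridge $1 whose n_packets
# counter is 0, i.e. rules that have not matched any packet.
zero_hit_rules() {
    ovs-ofctl dump-flows "$1" | grep -E '(^|[ ,])n_packets=0,'
}
```

Applied to the `br-int` example above, it would print the three rules that show `n_packets=0`.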


DataPath Flow Dump

To view datapath flows:

# ovs-appctl dpctl/dump-flows -m

Each flow includes:

  • Match criteria

  • Applied actions

  • Offload status (e.g., dp:doca, offloaded: yes)

  • Packet and byte counts

  • Flow usage time

The dp field shows which datapath handles the flow: ovs (software), doca, dpdk, or tc.

Example:

# ovs-appctl dpctl/dump-flows -m
flow-dump from pmd on cpu core: 21
ufid:c79d3e57-10eb-427f-a5d3-2785f0cbbac1, skb_priority(0/0),tunnel(tun_id=0x2a,src=7.7.7.8,dst=7.7.7.7,ttl=64/0,eth_src=10:70:fd:d9:0d:a4/00:00:00:00:00:00,eth_dst=10:70:fd:d9:0d:c8/00:00:00:00:00:00,type=gre/none,flags(-df+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(gre_sys),packet_type(ns=0,id=0),eth(src=c2:32:df:66:71:af,dst=e4:73:41:08:00:02),eth_type(0x0800),ipv4(src=1.1.1.8/0.0.0.0,dst=1.1.1.7/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=0/0,code=0/0), packets:1, bytes:144, used:1.488s, offloaded:yes, dp:doca, actions:pf0vf0, dp-extra-info:miniflow_bits(9,1)
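With many flows, reading the `offloaded:` and `dp:` fields by eye is slow. The sketch below, a hypothetical awk-based helper named `count_flows_by_dp`, tallies flows per `dp:` value from the `dpctl/dump-flows -m` output on stdin. It assumes one flow per line, as in the example above.

```shell
# Hypothetical helper: count datapath flows per "dp:" value (e.g. doca, ovs)
# in "ovs-appctl dpctl/dump-flows -m" output read from stdin.
count_flows_by_dp() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^dp:/) {
                sub(/^dp:/, "", $i)   # keep only the datapath name
                sub(/,$/, "", $i)     # drop the trailing field separator
                n[$i]++
            }
    }
    END { for (k in n) print k, n[k] }' | sort
}
```

Usage: `ovs-appctl dpctl/dump-flows -m | count_flows_by_dp`, which for the example above prints `doca 1`.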


PMD Counters

You can dump the OVS software processing counters using this command:

# ovs-appctl dpif-netdev/pmd-stats-show

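The "packets received" counter shows the number of packets processed in software and can be used to monitor hardware offload issues: if it keeps increasing for traffic that is expected to be offloaded, that traffic is likely not being fully handled in hardware.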

Example:

# ovs-appctl dpif-netdev/pmd-stats-show 
pmd thread numa_id 0 core_id 21:
  packets received: 75
  packet recirculations: 14
  avg. datapath passes per packet: 1.19
  phwol hits: 5
  mfex opt hits: 0
  simple match hits: 0
  emc hits: 0
  smc hits: 0
  megaflow hits: 28
  avg. subtable lookups per megaflow hit: 1.82
  miss with success upcall: 56
  miss with failed upcall: 0
  avg. packets per output batch: 1.02
  idle cycles: 7405350461306 (100.00%)
  processing cycles: 16284620 (0.00%)
  avg cycles per packet: 98738223279.01 (7405366745926/75)
  avg processing cycles per packet: 217128.27 (16284620/75)

To reset these statistics, use this command:

# ovs-appctl dpif-netdev/pmd-stats-clear
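To see whether traffic is still reaching software, the counters can be cleared, sampled after a short interval, and summed across PMD threads. A minimal sketch; `sw_packets_received` is a hypothetical helper that reads `pmd-stats-show` output on stdin and adds up every "packets received" line.

```shell
# Hypothetical helper: sum all "packets received" counters in
# "ovs-appctl dpif-netdev/pmd-stats-show" output read from stdin.
sw_packets_received() {
    awk -F': *' '/^[[:space:]]*packets received:/ { sum += $2 } END { print sum + 0 }'
}

# Example: clear, let traffic run for 10 seconds, then report the total.
#   ovs-appctl dpif-netdev/pmd-stats-clear
#   sleep 10
#   ovs-appctl dpif-netdev/pmd-stats-show | sw_packets_received
```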


Offload Counters

To check offload activity:

# ovs-appctl dpctl/offload-stats-show

Counters include:

  • Enqueued offloads – Rules waiting to be offloaded to hardware

  • Inserted offloads – Rules currently offloaded to hardware

  • CT uni-dir / bi-dir Connections – Offloaded unidirectional and bidirectional connection tracking entries

Example:

# ovs-appctl dpctl/offload-stats-show
HW Offload stats:
     Total                 Enqueued offloads:       0
     Total                 Inserted offloads:      42
     Total            CT uni-dir Connections:       0
     Total             CT bi-dir Connections:       1
     Total   Cumulative Average latency (us):  102761
     Total    Cumulative Latency stddev (us):  131560
     Total  Exponential Average latency (us):  125942
     Total   Exponential Latency stddev (us):  132435
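Scripts and monitoring can read individual counters from this output. A minimal sketch, where `offload_stat` is a hypothetical helper that prints the value of one named counter; it assumes the `Name: value` layout shown above.

```shell
# Hypothetical helper: print the value of counter "$1" from
# "ovs-appctl dpctl/offload-stats-show" output read from stdin.
# Prints nothing if the counter is not present.
offload_stat() {
    awk -v key="$1" 'index($0, key ":") {
        n = split($0, f, ":")   # the value is after the last colon
        gsub(/ /, "", f[n])
        print f[n]
        exit
    }'
}
```

For instance, `ovs-appctl dpctl/offload-stats-show | offload_stat 'Enqueued offloads'` prints the pending count; a value that stays above zero across several samples suggests offload requests are queuing up.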


Metrics

The ovs-metrics tool provides live hardware and software counters.

# ovs-metrics

Note

The ovs-metrics script requires the python3-doca-openvswitch package. To install it:

  • On Ubuntu:

    sudo apt install python3-doca-openvswitch

  • On RHEL:

    sudo yum install python3-doca-openvswitch

The ovs-metrics tool dumps the following information every second:

  • sw-pkts – total number of packets processed in software

  • sw-pps – packets per second processed in software during the last second

  • sw-conns – number of CT connections in software

  • sw-cps – new connections per second in software during the last second

  • hw-pkts – total number of packets processed in hardware

  • hw-pps – packets per second processed in hardware during the last second

  • hw-conns – number of CT connections in hardware

  • hw-cps – new connections per second in hardware during the last second

  • enqueued – number of rules pending hardware offload

  • hw-rules – number of rules offloaded to hardware (including infrastructure rules)

  • hw-rps – new hardware rules offloaded per second during the last second

DOCA Group Pipe Dump

To view DOCA pipe groups:

# ovs-appctl doca-pipe-group/dump

Each group shows:

  • Group ID (e.g., post-ct, post-meter)

  • Pipe structure and priority

  • Match conditions and forwarding type

Example:

esw_mgr_port_id=0, group_id=0x00000000
  esw=0x7fd9be8fe048, group_id=0x00000000, priority=2, fwd.type=port, match.parser_meta.port_meta[4, changeable]=0xffffffff/0xffffffff, match.parser_meta.outer_ip_fragmented[1, changeable]=0xff/0xff, match.outer.eth.type[2, changeable]=0xffff/0xffff, match.outer.l3_type[4, specific]=0x02000000/0x02000000, empty_actions_mask
  esw=0x7fd9be8fe048, group_id=0x00000000, priority=4, fwd.type=pipe, empty_match, empty_actions_mask
esw_mgr_port_id=0, group_id=0xfd000000(post-ct)
  esw=0x7fd9be8fe048, group_id=0xfd000000(post-ct), priority=4, fwd.type=pipe, empty_match, empty_actions_mask
esw_mgr_port_id=0, group_id=0xff000000(post-meter)
  esw=0x7fd9be8fe048, group_id=0xff000000(post-meter), priority=4, fwd.type=pipe, empty_match, empty_actions_mask
esw_mgr_port_id=0, group_id=0xf2000000(sample-post-mirror)
  esw=0x7fd9be8fe048, group_id=0xf2000000(sample-post-mirror), priority=1, fwd.type=drop, match.outer.eth.type[2, changeable]=0xffff/0x8809, empty_actions_mask
  esw=0x7fd9be8fe048, group_id=0xf2000000(sample-post-mirror), priority=3, fwd.type=pipe, empty_match, actions.meta.pkt_meta[4, changeable]=0xffffffff/0x00f0ffff
esw_mgr_port_id=0, group_id=0xf1000000(sample)
esw_mgr_port_id=0, group_id=0xf3000000(miss)
esw_mgr_port_id=0, group_id=0xfb000000(post-hash)
  esw=0x7fd9be8fe048, group_id=0xfb000000(post-hash), priority=4, fwd.type=pipe, empty_match, empty_actions_mask

This command displays the created groups, where each group is identified by a group ID and includes a list of DOCA flow pipes arranged in a chain (with misses leading from one pipe to the next) and sorted by priority. These pipe groups are shown in the order they were created. Special group IDs are labeled (e.g., post-hash). The dump also includes the forwarding type for each pipe and any header rewrite actions, if applicable.

Debug Info Package

To improve troubleshooting of OVS crashes, make sure that systemd-coredump is installed and properly configured on your system. It collects core dumps automatically, and its coredumpctl utility lists and retrieves them so that they can be analyzed with gdb to extract backtraces and other relevant diagnostic information.

Core dumps are especially useful for investigating rare or hard-to-reproduce crashes because they capture a complete snapshot of the process state at the time of failure, which greatly aids root cause analysis.

On Ubuntu systems, install systemd-coredump with:

apt install systemd-coredump

Note

Make sure your system is configured to collect and retain core dumps. You may need to update /etc/systemd/coredump.conf and verify that settings such as ulimit and kernel parameters permit core dump generation.

OVS is compiled with libunwind support, so if a crash occurs, a backtrace may also be logged directly in the relevant log file. For example, if ovs-vswitchd crashes, a backtrace should appear in /var/log/openvswitch/ovs-vswitchd.log.

To allow core dumps:

# ulimit -c unlimited
# sysctl -w fs.suid_dumpable=1
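`ulimit -c` applies only to the shell it runs in and the processes started from it. If ovs-vswitchd is started by systemd, its core size limit comes from the service unit instead. A sketch assuming the daemon runs as `ovs-vswitchd.service` (check the actual unit name with `systemctl status`); `allow_core_dumps` and the drop-in file name `core-limit.conf` are hypothetical choices, while `LimitCORE=` is a standard systemd directive.

```shell
# Hypothetical helper: write a systemd drop-in that lifts the core size limit
# for a unit. Assumption: ovs-vswitchd runs as ovs-vswitchd.service.
allow_core_dumps() {
    local unit=${1:-ovs-vswitchd.service}
    local dir="${2:-/etc/systemd/system}/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitCORE=infinity\n' > "$dir/core-limit.conf"
}

# Usage (as root), then restart OVS:
#   allow_core_dumps && systemctl daemon-reload
```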

To analyze the core dump with symbols, install the debug info package:

  • RPM-based distributions:

    # dnf install openvswitch-debuginfo

  • Debian-based distributions:

    # apt install openvswitch-dbg
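Once a crash has been collected and the debug info package is installed, a backtrace can be extracted non-interactively. A sketch with a hypothetical helper `ovs_core_backtrace`, assuming systemd-coredump captured the crash and that the binary is `/usr/sbin/ovs-vswitchd` (verify the path on your system); `coredumpctl list ovs-vswitchd` shows which dumps are available.

```shell
# Hypothetical helper: save the most recent ovs-vswitchd core dump to a file
# and print the backtraces of all threads with gdb.
# Assumption: the binary is installed at /usr/sbin/ovs-vswitchd.
ovs_core_backtrace() {
    local core=${1:-/tmp/ovs-vswitchd.core}
    coredumpctl -o "$core" dump ovs-vswitchd || return 1
    gdb -batch -ex 'thread apply all bt' /usr/sbin/ovs-vswitchd "$core"
}
```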

Scenarios

Failure to Start OVS

If OVS fails to start after enabling DOCA mode, it is often due to missing hugepage configuration.

Check the OVS log file at /var/log/openvswitch/ovs-vswitchd.log for additional details.

If hugepages are not configured, you may encounter the following error:

2024-03-13T14:59:26.806Z|00025|dpdk|ERR|EAL: Cannot get hugepage information.
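A quick way to confirm whether hugepages are configured before restarting OVS is to read `HugePages_Total` from `/proc/meminfo`. A minimal sketch; `check_hugepages` is a hypothetical helper that takes an optional meminfo-style file, which also makes it easy to test.

```shell
# Hypothetical helper: report the hugepage pool size from a meminfo-style file
# (default /proc/meminfo). Returns non-zero when no hugepages are configured.
check_hugepages() {
    local total
    total=$(awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}")
    echo "HugePages_Total: ${total:-0}"
    [ "${total:-0}" -gt 0 ]
}
```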


Failure to Add Port to Bridge

Port addition failures can result from several common misconfigurations.

DOCA/DPDK Not Initialized

  • You may see:

    error: "could not open network device pf0 (Address family not supported by protocol)"

  • Resolution:

    ovs-vsctl set o . other_config:doca-init=true
    systemctl restart openvswitch-switch

eSwitch Manager (PF) Not Added

  • Error:

    error: "could not add network device pf0vf0 to ofproto (Resource temporarily unavailable)"

  • Resolution: Add the PF (eSwitch manager) port to the OVS bridge before adding its associated VFs.
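The required ordering can be captured in a small helper so that VFs are never added before their PF. A sketch assuming bash; `add_pf_then_vfs` is hypothetical, `--may-exist` is a standard ovs-vsctl option that makes re-adding an existing port a no-op, and any port type or option settings your deployment requires are intentionally left out.

```shell
# Hypothetical helper: add the PF (eSwitch manager) port first, then its VF
# representors. Stops at the first failure so no VF is added without its PF.
add_pf_then_vfs() {
    local br=$1 pf=$2
    shift 2
    ovs-vsctl --may-exist add-port "$br" "$pf" || return 1
    local vf
    for vf in "$@"; do
        ovs-vsctl --may-exist add-port "$br" "$vf" || return 1
    done
}

# Usage: add_pf_then_vfs br0 pf0 pf0vf0 pf0vf1
```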

Missing datapath_type=netdev for DOCA/DPDK ports

  • Error:

    error: "could not add network device eth2 to ofproto (Invalid argument)"

  • Explanation: When using DOCA/DPDK ports, the bridge must have datapath_type set to netdev.

  • Verify using:

    ovs-vsctl get bridge <BR> datapath_type
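The same check can be scripted. A minimal sketch; `bridge_is_netdev` is a hypothetical helper, and quotes are stripped because ovs-vsctl prints an empty string value as `""`.

```shell
# Hypothetical helper: succeed only if bridge $1 uses the netdev datapath,
# as required for DOCA/DPDK ports.
bridge_is_netdev() {
    local dp
    # An unset datapath_type is printed as "", so remove the quotes.
    dp=$(ovs-vsctl get bridge "$1" datapath_type | tr -d '"')
    [ "$dp" = netdev ]
}

# One possible fix if the check fails (upstream ovs-vsctl syntax):
#   ovs-vsctl set bridge <BR> datapath_type=netdev
```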

Non-existent Port Specified

  • Error:

    error: "rep1: could not set configuration (No such device)"

  • Resolution: Verify that the specified device exists and is visible to the system.
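Before adding a port, its name can be checked against the kernel's list of network devices. A minimal sketch; `dev_exists` is a hypothetical helper that looks for the name under `/sys/class/net`, and `ls /sys/class/net` lists the candidates (for example the pf0 and pf0vfN representors).

```shell
# Hypothetical helper: succeed if network device $1 is visible to the kernel.
dev_exists() {
    [ -n "$1" ] && [ -e "/sys/class/net/$1" ]
}

# Usage: dev_exists rep1 || echo "rep1: no such device"
```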

Traffic Failure

Failure to pass traffic between interfaces may be caused by the following issues:

  • Port not added successfully – Refer to Failure to Add Port to Bridge to ensure ports were added correctly.

  • Incorrect VF subnet configuration – If traffic is sent between VFs on different subnets, it will not be forwarded unless explicit OpenFlow rules are configured to permit inter-subnet routing.

  • Conflicting kernel routing table – Verify that the kernel's routing table does not contain overlapping routes. Each unique IP address should be associated with only one interface.

  • Missing VF representors on the OVS bridge – If a VF's representor is not attached to the bridge, traffic from that VF will not reach the OVS pipeline.

  • Tunnel misconfiguration:

    • Missing neighbor discovery between tunnel endpoints – For tunnel traffic to work, L3 connectivity between endpoints must be established.

      1. Ensure the OVS bridge has the correct local tunnel IP.

      2. Ensure the remote system has an interface configured with the corresponding remote tunnel IP.

    • Mismatched VNI configuration – Both systems must use the same VNI (VXLAN Network Identifier) for traffic to be correctly encapsulated and decapsulated.
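For the routing checks above, duplicate addresses can be spotted from `ip -o -4 addr show`, and `ip route get <destination>` shows which interface the kernel would actually use. A minimal sketch; `duplicate_ipv4` is a hypothetical helper that prints each IPv4 address configured on more than one interface.

```shell
# Hypothetical helper: read "ip -o -4 addr show" output from stdin and print
# each IPv4 address that appears on more than one interface (once).
duplicate_ipv4() {
    # Field 4 is "address/prefix"; count by address only.
    awk '{ split($4, a, "/"); if (seen[a[1]]++ == 1) print a[1] }'
}

# Usage: ip -o -4 addr show | duplicate_ipv4
```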

Performance Degradation (No Offload)

If you experience performance degradation, it may indicate that OVS is not offloading flows to hardware as expected.

Verify Offload Status

Check whether hardware offload is enabled:

# ovs-vsctl get Open_vSwitch . other_config:hw-offload

  • If the value is "true", hardware offload (fast path) is enabled.

  • If the value is "false", hardware offload is disabled and traffic is processed in software (slow path).
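The value can also be read in a script. A minimal sketch; `hw_offload_state` is a hypothetical helper. It uses `--if-exists` so that an unset key prints nothing instead of an error, and it strips quotes because ovs-vsctl may print the value as `"true"`. An unset key means the upstream default, which is false.

```shell
# Hypothetical helper: print the configured hw-offload value, or "unset".
hw_offload_state() {
    local v
    v=$(ovs-vsctl --if-exists get Open_vSwitch . other_config:hw-offload | tr -d '"')
    echo "${v:-unset}"
}
```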

Enable Hardware Offload

  • For RHEL/CentOS, run:

    # ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    # systemctl restart openvswitch
    # systemctl enable openvswitch

  • For Ubuntu/Debian:

    # ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    # systemctl restart openvswitch-switch

Check Offload Status of Rules

To verify which flows are offloaded:

# ovs-appctl dpctl/dump-flows -m

  • If a flow shows dp:ovs, it is handled in software (it was not offloaded to hardware).

  • Review the end of each flow entry or check the OVS logs to identify the reason for failure.

  • PMD (Poll Mode Driver) counters can also confirm if packets are being processed in software.
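To review only the flows that stayed in software, the dump can be filtered on `dp:ovs`. A minimal sketch; `sw_flows` is a hypothetical helper reading the `dpctl/dump-flows -m` output on stdin.

```shell
# Hypothetical helper: print only the datapath flows handled in software
# (dp:ovs) from "ovs-appctl dpctl/dump-flows -m" output read from stdin.
sw_flows() {
    grep -E '(^|[ ,])dp:ovs(,|$)'
}

# Usage: ovs-appctl dpctl/dump-flows -m | sw_flows
```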

Consider ct-zone and mem-zone Usage

Performance issues may also arise due to resource exhaustion from connection tracking or memory zone limits.

  • OVS supports up to 65,535 ct-zones.

  • In DOCA basic pipe mode, each ct-zone may consume approximately 36 mem-zones.

  • If too many ct-zones are created, the system may run out of available mem-zones, which can impact offload and degrade performance.

Reaching Maximum Number of Memory Zones

Because each connection tracking (ct) zone requires many mem-zones (approximately 36 per ct-zone in DOCA basic pipe mode), configuring a large number of ct-zones can exhaust the DPDK mem-zone limit. By default, this limit is 2560.

Error in Logs

When the mem-zone limit is reached, the following error will appear in the logs:

2024-07-30T19:17:07.585Z|00002|dpdk(hw_offload4)|ERR|EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested memzone segments exceeds max memzone segments (2560 >= 2560)
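To confirm that this limit is the cause, the log can be searched for the message above. A minimal sketch; `memzone_errors` is a hypothetical helper that counts matching lines in the default log file or in a file passed as an argument.

```shell
# Hypothetical helper: count mem-zone exhaustion errors in the OVS log
# (default: /var/log/openvswitch/ovs-vswitchd.log).
memzone_errors() {
    grep -c 'exceeds max memzone segments' "${1:-/var/log/openvswitch/ovs-vswitchd.log}"
}
```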


Workaround

To resolve this issue, increase the number of mem-zones by setting the dpdk-max-memzones configuration parameter:

ovs-vsctl set o . other_config:dpdk-max-memzones=<desired_number>

Replace <desired_number> with the total number of mem-zones required for your configuration.

Example Scenario

You are configuring 500 ct-zones. Since each ct-zone requires approximately 36 mem-zones, you will need a total of:

500 ct-zones × 36 mem-zones/ct-zone = 18,000 mem-zones

It is recommended to reserve additional mem-zones for other pipeline components. For example, you can preserve the default 2560 mem-zones for general system use.

Total required mem-zones:

18,000 (for ct-zones) + 2,560 (reserved) = 20,560

Set the value:

ovs-vsctl set o . other_config:dpdk-max-memzones=20560

Adjusting the mem-zone limit accordingly avoids allocation failures and the performance degradation caused by resource exhaustion, especially in environments with large-scale connection tracking configurations.
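The sizing arithmetic above can be wrapped in a helper for other ct-zone counts. A minimal sketch; `memzones_for` is a hypothetical helper, and the 36 mem-zones per ct-zone and the 2560 reserve are the approximate figures from this section, so treat the result as an estimate.

```shell
# Hypothetical helper: estimate dpdk-max-memzones as
# ct_zones * per_zone (default 36) + reserve (default 2560).
memzones_for() {
    local ct_zones=$1 per_zone=${2:-36} reserve=${3:-2560}
    echo $(( ct_zones * per_zone + reserve ))
}

# Usage:
#   ovs-vsctl set o . other_config:dpdk-max-memzones="$(memzones_for 500)"
```

With 500 ct-zones, `memzones_for 500` prints 20560, matching the worked example above.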
