DOCA Documentation v2.8.0
DOCA 2.8.0

OVS-DOCA Hardware Acceleration

OVS-DOCA is designed on top of NVIDIA's networking API to preserve the same OpenFlow, CLI, and data interfaces (e.g., vdpa, VF passthrough), as well as datapath offloading APIs, also known as OVS-DPDK and OVS-Kernel. While all OVS flavors make use of flow offloads for hardware acceleration, due to its architecture and use of DOCA libraries, the OVS-DOCA mode provides the most efficient performance and feature set among them, making the most out of NVIDA NICs and DPUs.

ovs-doca-arch-version-1-modificationdate-1718287678800-api-v2.png

The following subsections provide the necessary steps to launch/deploy OVS DOCA.

To configure OVS DOCA HW offloads:

  1. Unbind the VFs:

    Copy
    Copied!
                

    echo 0000:04:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind echo 0000:04:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind

    Note

    VMs with attached VFs must be powered off to be able to unbind the VFs.

  2. Change the e-switch mode from legacy to switchdev on the PF device (make sure all VFs are unbound):

    Copy
    Copied!
                

    echo switchdev > /sys/class/net/enp4s0f0/compat/devlink/mode

    Note

    This command also creates the VF representor netdevices in the host OS.

    To revert to SR-IOV legacy mode:

    Copy
    Copied!
                

    echo legacy > /sys/class/net/enp4s0f0/compat/devlink/mode

  3. Bind the VFs:

    Copy
    Copied!
                

    echo 0000:04:00.2 > /sys/bus/pci/drivers/mlx5_core/bind echo 0000:04:00.3 > /sys/bus/pci/drivers/mlx5_core/bind

  4. Configure huge pages:

    Copy
    Copied!
                

    mkdir -p /hugepages mount -t hugetlbfs hugetlbfs /hugepages echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

  5. Run the Open vSwitch service:

    Copy
    Copied!
                

    systemctl start openvswitch

  6. Enable DOCA mode and hardware offload (disabled by default):

    Copy
    Copied!
                

    ovs-vsctl --no-wait set Open_vSwitch . other_config:doca-init=true ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

  7. Restart the Open vSwitch service.

    Copy
    Copied!
                

    systemctl restart openvswitch

    Info

    This step is required for HW offload changes to take effect.

  8. Create OVS-DOCA bridge:

    Copy
    Copied!
                

    ovs-vsctl --no-wait add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev

  9. Add PF to OVS:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs enp4s0f0 -- set Interface enp4s0f0 type=dpdk

  10. Add representor to OVS:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs enp4s0f0_0 -- set Interface enp4s0f0_0 type=dpdk

    Info

    The legacy option to add DPDK ports without using a related netdev by providing dpdk-devargs still exists:

    1. Add a PF port:

      Copy
      Copied!
                  

      ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:88:00.0

    2. Add a VF representor port:

      Copy
      Copied!
                  

      ovs-vsctl add-port br0-ovs representor -- set Interface representor type=dpdk options:dpdk-devargs=0000:88:00.0,representor=[0]

    3. Add a SF representor port:

      Copy
      Copied!
                  

      ovs-vsctl add-port br0-ovs representor -- set Interface representor type=dpdk options:dpdk-devargs=0000:88:00.0,representor=sf[0]

    4. Add a BlueField host PF representor port:

      Copy
      Copied!
                  

      ovs-vsctl add-port br0-ovs hpf -- set Interface hpf type=dpdk options:dpdk-devargs=0000:88:00.0,representor=[65535]

  11. Optional configuration:

    1. To set port MTU, run:

      Copy
      Copied!
                  

      ovs-vsctl set interface enp4s0f0 mtu_request=9000

      Note

      OVS restart is required for changes to take effect .

    2. To set VF/SF MAC, run:

      Copy
      Copied!
                  

      ovs-vsctl add-port br0-ovs enp4s0f0 -- set Interface enp4s0f0 type=dpdk options:dpdk-vf-mac=00:11:22:33:44:55

      Note

      Unbinding and rebinding the VFs/SFs is required for the change to take effect .

OVS-DOCA shares most of its structure with OVS-DPDK. To benefit from the DOCA offload design, some of the behavior of userland datapath and ports are however modified.

Eswitch Dependency

Configured in switchdev mode, the physical port and all supported functions share a single general domain to execute the offloaded flows, the eswitch.

All ports on the same eswitch are dependent on its physical function. If this main physical function is deactivated (e.g., removed from OVS or its link set down), dependent ports are disabled as well.

Pre-allocated Offload Tables

To offer the highest insertion speed, DOCA offloads pre-allocate offload structures (entries and containers).

When starting the vSwitch daemon, offloads are thus configured with sensible defaults. If different numbers of offloads are required, configuration entries specific to OVS-DOCA are available and are described in the next section.

Unsupported CT-CT-NAT

The special ct-ct-nat mode that can be configured in OVS-kernel and OVS-DPDK is not supported by OVS-DOCA.

The following configuration is particularly useful or specific to OVS-DOCA mode.

Info

The full list of OVS vSwitch configuration is documented in man ovs-vswitchd.conf.db.

other_config

The following table provides other_config configurations which are global to the vSwitch (non-exhaustive list, check manpage for more):

Configuration

Description

other_config:doca-init

  • Optional string, either true or false

  • Set this value to true to enable DOCA Flow HW offload

  • The default value is false. Changing this value requires restarting the daemon.

  • This is only relevant for userspace datapath

other_config:hw-offload-ct-size

  • Optional string, containing an integer, at least 0

  • Only for the DOCA offload provider on netdev datapath

  • Configure the usable amount of connection tracking (CT) offload entries

  • The default value is 250000. Changing this value requires restarting the daemon.

  • Setting a value of 0 disables CT offload

  • Changing this configuration affects the OVS memory usage as CT tables are allocated on OVS start

  • Maximum number of supported connections is 2M

    Warning

    Setting this parameter to more than 2M might result in failures.

other_config:hw-offload-ct-ipv6-enabled

  • Optional string, either true or false

  • Only for the DOCA offload provider on netdev datapath

  • Set this value to true to enable IPv6 CT offload

  • The default value is false. Changing this value requires restarting the daemon.

  • Changing this configuration affects the OVS memory usage as CT tables are allocated on OVS start

other_config:doca-congestion-threshold

  • Optional string, containing an integer, in range 30 to 90

  • The occupancy rate of DOCA offload structures that triggers a resize, as a percentage

  • Default to 80, but only relevant if other_config:doca-init is true. Changing this value requires restarting the daemon.

other_config:ctl-pipe-size

  • Optional string, containing an integer

  • The initial size of DOCA control pipes

  • Default to 0, which is DOCA's internal default value

other_config:ctl-pipe-infra-size

  • Optional string, containing an integer

  • The initial size of infrastructure DOCA control pipes: root, post-hash, post-ct, post-meter, split, miss.

  • Default to 0, which fallbacks to other_config:ctl-pipe-size

other_config:pmd-quiet-idle

  • Optional string, either true or false

  • Allow the PMD threads to go into quiescent mode when idling. If no packets are received or waiting to be processed and sent, enter a continuous quiescent period. End this period as soon as a packet is received.

  • This option is disabled by default

other_config:pmd-maxsleep

  • Optional string, containing an integer, in range 0 to 10,000

  • Specifies the maximum sleep time in microseconds per iteration for a PMD thread which has received zero or a small amount of packets from the Rx queues it is polling.

  • The actual sleep time requested is based on the load of the Rx queues that the PMD polls and may be less than the maximum value

  • The default value is 0 microseconds, which means that the PMD does not sleep regardless of the load from the Rx queues that it polls

  • To avoid requesting very small sleeps (e.g., less than 10 µs) the value is rounded up to the nearest 10 µs

  • The maximum value is 10000 microseconds.

other_config:dpdk-max-memzones

  • Optional string, containing an integer

  • Specifies the maximum number of memzones that can be created in DPDK

  • The default is empty, keeping DPDK’s default. Changing this value requires restarting the daemon.

other_config:pmd-cpu-mask

With PMD multi-threading support, OVS creates one PMD thread for each NUMA node by default if there is at least one DPDK interface added to OVS from that NUMA node. However, in cases where there are multiple ports/rxqs producing traffic, performance can be improved by creating multiple PMD threads running on separate cores. These PMD threads can share the workload by each being responsible for different ports/rxqs. Assignment of ports/rxqs to PMD threads is done automatically.

A set bit in the mask means a PMD thread is created and pinned to the corresponding CPU core. For example, to run PMD threads on cores 1 and 2, run:

Copy
Copied!
            

$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6


netdev-dpdk

The following table provides netdev-dpdk configurations which only userland (DOCA or DPDK) netdevs support (non-exhaustive list, check manpage for more):

Configuration

Description

options:iface-name

  • Specifies the interface name of the port

  • Providing this option accelerates processing the port reconfiguration by querying the sysfs to check if the interface exists before DPDK attempts to probe the port


vSwitch in userspace rather than kernel-based Open vSwitch requires an additional bridge. The purpose of this bridge is to allow use of the kernel network stack for routing and ARP resolution.

The datapath must look up the routing table and ARP table to prepare the tunnel header and transmit data to the output port.

VXLAN encapsulation/decapsulation offload configuration is done with:

  • PF on 0000:03:00.0 PCIe and MAC 98:03:9b:cc:21:e8

  • Local IP 56.56.67.1 – the br-phy interface is configured to this IP

  • Remote IP 56.56.68.1

To configure OVS DOCA VXLAN:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone other_config:hwaddr=98:03:9b:cc:21:e8

  2. Attach PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy enp4s0f0 -- set Interface enp4s0f0 type=dpdk

  3. Configure IP to the bridge:

    Copy
    Copied!
                

    ip addr add 56.56.67.1/24 dev br-phy

  4. Create a br-ovs bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br-ovs -- set Bridge br-ovs datapath_type=netdev -- br-set-external-id br-ovs bridge-id br-ovs -- set bridge br-ovs fail-mode=standalone

  5. Attach representor to br-ovs:

    Copy
    Copied!
                

    ovs-vsctl add-port br-ovs enp4s0f0_0 -- set Interface enp4s0f0_0 type=dpdk

  6. Add a port for the VXLAN tunnel:

    Copy
    Copied!
                

    ovs-vsctl add-port ovs-sriov vxlan0 -- set interface vxlan0 type=vxlan options:local_ip=56.56.67.1 options:remote_ip=56.56.68.1 options:key=45 options:dst_port=4789

VXLAN GBP Extension

The VXLAN group-based policy (GBP) model outlines an application-focused policy framework that specifies connectivity requirements for applications, independent of the network's physical layout.

Setting GBP extension for a VXLAN port allows for matching on and setting a GBP ID per flow. To enable GBP extension when the port vxlan0 is first added:

Copy
Copied!
            

ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:key=30 options:remote_ip=10.0.30.1 options:exts=gbp

It is also possible to enable GBP extension for an existing VXLAN port:

Copy
Copied!
            

ovs-vsctl set interface vxlan1 options:exts=gbp

This approach has a limitation that it does not take effect until after the OVS vswitchd service is restarted. In cases where there are multiple VXLAN ports, they must all share the same GBP extension configuration in their port options. A mixed configuration with some VXLAN ports having the GBP extension enabled and others disabled is not supported.

When GBP extension is enabled, the following OpenFlow rules which match on a GBP ID 32 or set a GBP ID 64 in the actions, can be offloaded:

Copy
Copied!
            

ovs-ofctl add-flow br-int table=0,priority=100,in_port=vxlan0,tun_gbp_id=32 actions=output:pf0vf0 ovs-ofctl add-flow br-int table=0,priority=100,in_port=pf0vf0 actions=load:64->NXM_NX_TUN_GBP_ID[],output:vxlan0


Connection tracking enables stateful packet processing by keeping a record of currently open connections.

OVS flows utilizing connection tracking can be accelerated using advanced NICs by offloading established connections.

To view offload statistics, run:

Copy
Copied!
            

ovs-appctl dpctl/offload-stats-show

To configure OVS-DOCA SR-IOV VF LAG:

  1. Enable SR-IOV on the NICs:

    Copy
    Copied!
                

    // It is recommended to query the parameters first to determine if a change is needed, to save potentially unnecessary reboot. mst start mlxconfig -d <mst device> -y set PF_NUM_OF_VF_VALID=0 SRIOV_EN=1 NUM_OF_VFS=8

    Note

    If configuration did change, perform a BlueField system reboot for the mlxconfig settings to take effect.

  2. Allocate the desired number of VFs per port:

    Copy
    Copied!
                

    echo $n > /sys/class/net/<net name>/device/sriov_numvfs

  3. Unbind all VFs:

    Copy
    Copied!
                

    echo <VF PCI> >/sys/bus/pci/drivers/mlx5_core/unbind

  4. Change both NICs' mode to SwitchDev:

    Copy
    Copied!
                

    devlink dev eswitch set pci/<PCI> mode switchdev

  5. Create Linux bonding using kernel modules:

    Copy
    Copied!
                

    modprobe bonding mode=<desired mode>

    Note

    Other bonding parameters can be added here. The supported bond modes are Active-Backup, XOR, and LACP.

  6. Bring all PFs and VFs down:

    Copy
    Copied!
                

    ip link set <PF/VF> down

  7. Attach both PFs to the bond:

    Copy
    Copied!
                

    ip link set <PF> master bond0

  8. Bring PFs and bond link up:

    Copy
    Copied!
                

    ip link set <PF0> up ip link set <PF1> up ip link set bond0 up

  9. Add the bond interface to the bridge as type=dpdk:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy bond0 -- set Interface bond0 type=dpdk options:dpdk-lsc-interrupt=true

    Info

    The legacy option to work with VF-LAG in OVS-DPDK is to add the bond master (PF) interface to the bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy p0 -- set Interface p0 type=dpdk options:dpdk-devargs=<PF0-PCI>,dv_flow_en=2,dv_xmeta_en=4 options:dpdk-lsc-interrupt=true

  10. Add representor of PF0 or PF1 to a bridge :

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy enp4s0f0_0 -- set Interface enp4s0f0_0 type=dpdk

    Or:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy enp4s0f1_0 -- set Interface enp4s0f1_0 type=dpdk

    Info

    The legacy option to add DPDK ports:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy rep$N -- set Interface rep$N type=dpdk options:dpdk-devargs=<PF0-PCI>,representor=pf0vf$N,dv_flow_en=2,dv_xmeta_en=4

    Or:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy rep$N -- set Interface rep$N type=dpdk options:dpdk-devargs=<PF0-PCI>,representor=pf1vf$N,dv_flow_en=2,dv_xmeta_en=4

Multiport eswitch mode allows adding rules on a VF representor with an action, forwarding the packet to the physical port of the physical function. This can be used to implement failover or to forward packets based on external information such as the cost of the route.

  1. To configure multiport eswitch mode , the nvconig parameter LAG_RESOURCE_ALLOCATION=1 must be set in the BlueField Arm OS, according to the following instructions:

    Copy
    Copied!
                

    mst start mlxconfig -d /dev/mst/mt*conf0 -y s LAG_RESOURCE_ALLOCATION=1

  2. Perform a BlueField system reboot for the mlxconfig settings to take effect.

  3. After the driver loads, and before moving to switchdev mode, configure multiport eswitch for each PF where p0 and p1 represent the netdevices for the PFs:

    Copy
    Copied!
                

    devlink dev param set pci/0000:03:00.0 name esw_multiport value 1 cmode runtime devlink dev param set pci/0000:03:00.1 name esw_multiport value 1 cmode runtime

    Info

    The mode becomes operational after entering switchdev mode on both PFs.

  4. This mode can be activated by default in BlueField by adding the following line into /etc/mellanox/mlnx-bf.conf:

    Copy
    Copied!
                

    ENABLE_ESWITCH_MULTIPORT="yes"

While in this mode, the second port is not an eswitch manager, and should be add to OVS using this command:

Copy
Copied!
            

ovs-vsctl add-port br-phy enp4s0f1 -- set interface enp4s0f1 type=dpdk

VFs for the second port can be added using this command:

Copy
Copied!
            

ovs-vsctl add-port br-phy enp4s0f1_0 -- set interface enp4s0f1_0 type=dpdk

Info

The legacy option to add DPDK ports:

Copy
Copied!
            

ovs-vsctl add-port br-phy p1 -- set interface p1 type=dpdk options:dpdk-devargs="0000:08:00.0,dv_xmeta_en=4,dv_flow_en=2,representor=pf1

VFs for the second port can be added using this command:

Copy
Copied!
            

ovs-vsctl add-port br-phy p1vf0 -- set interface p1 type=dpdk options:dpdk-devargs="0000:08:00.0,dv_xmeta_en=4,dv_flow_en=2,representor=pf1vf0

Geneve tunneling offload support includes matching on extension header.

Note

OVS-DOCA Geneve option limitations:

  • Only 1 Geneve option is supported

  • Max option len is 7

  • To change the Geneve option currently being matched and encapsulated, users must remove all ports or restart OVS and configure the new option

  • Matching on Geneve options can work with FLEX_PARSER profile 0 (the default profile). Working with FLEX_PARSER profile 8 is also supported as well. To configure it, run:

    Copy
    Copied!
                

    mst start mlxconfig -d <mst device> s FLEX_PARSER_PROFILE_ENABLE=8

    Note

    Perform a BlueField system reboot for the mlxconfig settings to take effect.

To configure OVS-DOCA Geneve encapsulation/decapsulation:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone

  2. Attach a PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy enp4s0f0 -- set Interface enp4s0f0 type=dpdk

  3. Configure an IP to the bridge:

    Copy
    Copied!
                

    ifconfig br-phy <$local_ip_1> up

  4. Create a br-int bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-int enp4s0f0_0 -- set Interface enp4s0f0_0 type=dpdk

  5. Attach a representor to br-int:

    Copy
    Copied!
                

    ovs-vsctl add-port br-int rep$x -- set Interface rep$x type=dpdk options:dpdk-devargs=<PF PCI>,representor=[$x],dv_flow_en=2,dv_xmeta_en=4

  6. Add a port for the Geneve tunnel:

    Copy
    Copied!
                

    ovs-vsctl add-port br-int geneve0 -- set interface geneve0 type=geneve options:key=<VNI> options:remote_ip=<$remote_ip_1> options:local_ip=<$local_ip_1>

To configure OVS-DOCA GRE encapsulation/decapsulation:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone

  2. Attach a PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy enp4s0f0 -- set Interface enp4s0f0 type=dpdk

  3. Configure an IP to the bridge:

    Copy
    Copied!
                

    ifconfig br-phy <$local_ip_1> up

  4. Create a br-int bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev -- br-set-external-id br-int bridge-id br-int -- set bridge br-int fail-mode=standalone

  5. Attach a representor to br-int:

    Copy
    Copied!
                

    ovs-vsctl add-port br-int enp4s0f0_0 -- set Interface enp4s0f0_0 type=dpdk

Add a port for the Geneve tunnel:

Copy
Copied!
            

ovs-vsctl add-port br-int gre0 -- set interface gre0 type=gre options:key=<VNI> options:remote_ip=<$remote_ip_1> options:local_ip=<$local_ip_1>

Slow path rate limiting allows controlling the rate of traffic that bypasses hardware offload rules and is subsequently processed by software.

To configure slow path rate limiting:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone

  2. Attach a PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy pf0 -- set Interface pf0 type=dpdk

  3. Rate limit pf0vf0 to 10Kpps with 6K burst size:

    Copy
    Copied!
                

    ovs-vsctl set interface pf0 options:sw-meter=pps:10k:6k

  4. Restart OVS:

    Copy
    Copied!
                

    systemctl restart openvswitch-switch.service

A dry-run option is also supported to allow testing different software meter configurations in a production environment. This allows gathering statistics without impacting the actual traffic flow. These statistics can then be analyzed to determine appropriate rate limiting thresholds. When the dry-run option is enabled, traffic is not dropped or rate-limited, allowing normal operations to continue without disruption. However, the system simulates the rate limiting process and increment counters as though packets are being dropped.

To enable slow path rate limiting dry-run:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone

  2. Attach a PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy pf0 -- set Interface pf0 type=dpdk

  3. Rate limit pf0vf0 to 10Kpps with 6K burst size:

    Copy
    Copied!
                

    ovs-vsctl set interface pf0 options:sw-meter=pps:10k:6k

  4. Set the sw-meter-dry-run option:

    Copy
    Copied!
                

    ovs-vsctl set interface pf0vf0 options:sw-meter-dry-run=true

  5. Restart OVS:

    Copy
    Copied!
                

    systemctl restart openvswitch-switch.service

Hairpin allows forwarding packets from wire to wire.

To configure hairpin :

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone

  2. Attach a PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy pf0 -- set Interface pf0 type=dpdk

  3. Add hairpin OpenFlow rule:

    Copy
    Copied!
                

    ovs-ofctl add-flow br-phy"in_port=pf0,ip,actions=in_port"

OVS-DOCA supports OpenFlow meter action as covered in this document in section "OpenFlow Meters". In addition, OVS-DOCA supports chaining multiple meter actions together in a single datapth rule.

The following is an example configuration of such OpenFlow rules:

Copy
Copied!
            

ovs-ofctl add-flow br-phy -O OpenFlow13 "table=0,priority=1,in_port=pf0vf0_r,ip actions=meter=1,resubmit(,1)" ovs-ofctl add-flow br-phy -O OpenFlow13 "table=1,priority=1,in_port=pf0vf0_r,ip actions=meter=2,normal"

Meter actions are applied sequentially, first using meter ID 1 and then using meter ID 2.

Use case examples for such a configuration:

  • Rate limiting the same logical flow with different meter types—bytes per second and packets per second

  • Metering a group of flows. As meter IDs can be used by multiple flows, it is possible to re-use meter ID 2 from this example with other logical flows; thus, making sure that their cumulative bandwidth is limited by the meter.

OVS supports group configuration. The "select" type executes one bucket in the group, balancing across the buckets according to their weights. To select a bucket, for each live bucket, OVS hashes flow data with the bucket ID and multiplies that by the bucket weight to obtain a "score". The bucket with the highest score is selected.

Info

For more details, refer to the ovs-ofctl man.

For example:

  • ovs-ofctl add-group br-int 'group_id=1,type=select,bucket=<port1>'

  • ovs-ofctl add-flow br-int in_port=<port0>,actions=group=1

Limitations:

  • Offloads are supported on IP traffic only (IPv4 or IPv6)

The sFlow standard outlines a method for capturing traffic data in switched or routed networks. It employs sampling technology to gather statistics from the device, making it suitable for high-speed networks.

With a predetermined sampling rate, one out of every N packets is captured. While this sampling method does not yield completely accurate results, it does offer acceptable accuracy.

To activate sampling for 0.2% of all traffic traversing an OVS bridge named br-int, run:

Copy
Copied!
            

ovs-vsctl -- --id=@sflow create sflow agent=lo target=127.0.0.1:6343 header=96 sampling=512 -- set bridge br-int sflow=@sflow

With this sFlow configuration on the bridge, captured packets are mirrored to an sFlow collector application that listens on the default sFlow port, 6343, on localhost.

Info

sFlow collector applications fall outside the scope of this guide.

It is possible to set the sampling rate to 1 while configuring sFlow on a bridge, which effectively mirrors all traffic to the sFlow collector.

  • Only one insertion thread is supported (n-offload-threads=1)

  • Only 250K connection are offloadable by default (can be configured)

    Note

    The maximum number of supported connections is 2M.

  • Only 8 CT zones are supported by CT offload

  • When using two PFs with 127 VFs each and adding their representors to OVS bridge, the user must configure dpdk-memzones:

    Copy
    Copied!
                

    ovs-vsctl set o . other_config:dpdk-max-memzones=6500 restart ovs

  • In an OVS topology that includes both physical and internal bridges, sFlow offloads are only supported on the internal bridge when employing a VXLAN tunnel. Utilizing sFlow on the physical bridge leads to only partial offload of flows in this scenario.

Additional debugging information can be enabled in the vSwitch log file using the dbg log level:

Copy
Copied!
            

( topics='netdev|ofproto|ofp|odp|doca' IFS=$'\n'; for topic in $(ovs-appctl vlog/list | grep -E "$topics" | cut -d' ' -f1) do printf "$topic:file:dbg " done ) | xargs ovs-appctl vlog/set

The listed topics are relevant to DOCA offload operations.

Coverage counters specific to the DOCA offload provider have been added. The following command should be used to check them:

Copy
Copied!
            

ovs-appctl coverage/show # Print the current non-zero coverage counters

The following table provides the meaning behind these DOCA-specific counters:

Counter

Description

doca_async_queue_full

The asynchronous offload insertion queue was full while the daemon attempted to insert a new offload.

The queue will have been flushed and insertion attempted again.

This is not a fatal error but is the sign of a slowed down hardware.

doca_async_queue_blocked

The asynchronous offload insertion queue has remained full even after several attempts to flush its currently enqueued requests.

While not a fatal error, it should never happen during normal offload operations and should be considered a bug.

doca_async_add_failed

An asynchronous insertion failed specifically due to its asynchronous nature. This is not expected to happen and should be considered a bug.

doca_pipe_resize

The number of time a DOCA pipe has been resized. This is normal and expected as DOCA pipes receives more entries.

doca_pipe_resize_over_10_ms

A DOCA pipe resize took longer than 10ms to complete. It can happen infrequently.

If a sudden drop in insertion rate is measured, this counter could help identify the root cause.

To build OVS-DOCA from provided sources and pre-installed DOCA with the same version packages, run:

Copy
Copied!
            

$ ./boot.sh $ ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --with-dpdk=static --with-doca=static $ make -j 10 $ make install

A helper build script is bundled with OVS-DOCA sources that can be used as follows:

Copy
Copied!
            

$ ./build.sh --install-ovs

Megaflows aggregate multiple microflows into a single flow entry, reduce the load on the flow table, and improve packet processing efficiency. Scaling megaflows in OVS is crucial for optimizing network performance and ensuring efficient handling of high traffic volumes. By default, OVS-DOCA can handle up to 200k megaflows.

To effectively manage and scale megaflows, several key configurations in the other_config section of OVS can be adjusted:

  • The flow-limit parameter sets the maximum number of flows that can be stored in the flow table, helping to control memory usage and prevent overflow.

  • The max-revalidator parameter defines the longest duration (in milliseconds) that re-validator threads will wait before initiating flow revalidation. It is crucial to understand that this represents the upper limit, and the actual timeout employed by OVS is the lesser of the max-idle and max-revalidator values. Modifying this parameter is generally not recommended without a thorough understanding of its effects. For systems with less powerful CPUs, setting a higher max-revalidator value is suggested to compensate for reduced computational capacity and ensure revalidation completes.

Fine-tuning these settings can improve the scalability and performance of an OVS deployment, allowing it to manage a greater number of megaflows efficiently.

  • To set flow-limit (default is 200k):

    Copy
    Copied!
                

    $ ovs-vsctl set o . other_config:flow-limit=<desired_value>

  • To set max-revalidator (default is 250ms).

    Copy
    Copied!
                

    $ ovs-vsctl set o . other_config:max-revalidator=<desired_value>

© Copyright 2024, NVIDIA. Last updated on Aug 21, 2024.