OVS-DPDK Hardware Offloads

1.0

To configure OVS-DPDK HW offloads:

  1. Unbind the VFs:

    Copy
    Copied!
                

    echo 0000:04:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind echo 0000:04:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind

    Note

    VMs with attached VFs must be powered off to be able to unbind the VFs.

  2. Change the e-switch mode from legacy to switchdev on the PF device (make sure all VFs are unbound). This also creates the VF representor netdevices in the host OS.

    Copy
    Copied!
                

    echo switchdev > /sys/class/net/enp4s0f0/compat/devlink/mode

    To revert to SR-IOV legacy mode:

    Copy
    Copied!
                

    echo legacy > /sys/class/net/enp4s0f0/compat/devlink/mode

    Note

    This command removes the VF representor netdevices.

  3. Bind the VFs:

    Copy
    Copied!
                

    echo 0000:04:00.2 > /sys/bus/pci/drivers/mlx5_core/bind echo 0000:04:00.3 > /sys/bus/pci/drivers/mlx5_core/bind

  4. Run the OVS service:

    Copy
    Copied!
                

    systemctl start openvswitch

  5. Enable hardware offload (disabled by default):

    Copy
    Copied!
                

    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

  6. Configure the DPDK whitelist:

    Copy
    Copied!
                

    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="-a 0000:01:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=1"

    Where representor=[0-N].

  7. Restart the OVS service:

    Copy
    Copied!
                

    systemctl restart openvswitch

    Info

    This step is required for the hardware offload changes to take effect.

  8. Create OVS-DPDK bridge:

    Copy
    Copied!
                

    ovs-vsctl --no-wait add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev

  9. Add PF to OVS:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:88:00.0

  10. Add representor to OVS:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs representor -- set Interface representor type=dpdk options:dpdk-devargs=0000:88:00.0,representor=[0]

    Where representor=[0-N].

vSwitch in userspace requires an additional bridge. The purpose of this bridge is to allow use of the kernel network stack for routing and ARP resolution.

The datapath must look up the routing table and ARP table to prepare the tunnel header and transmit data to the output port.

Configuring VXLAN Encap/Decap Offloads

Note

The configuration is done with:

  • PF on 0000:03:00.0 PCIe and MAC 98:03:9b:cc:21:e8

  • Local IP 56.56.67.1 – br-phy interface is configured to this IP

  • Remote IP 56.56.68.1

To configure OVS-DPDK VXLAN:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone other_config:hwaddr=98:03:9b:cc:21:e8

  2. Attach PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy p0 -- set Interface p0 type=dpdk options:dpdk-devargs=0000:03:00.0

  3. Configure IP to the bridge:

    Copy
    Copied!
                

    ip addr add 56.56.67.1/24 dev br-phy

  4. Create a br-ovs bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br-ovs -- set Bridge br-ovs datapath_type=netdev -- br-set-external-id br-ovs bridge-id br-ovs -- set bridge br-ovs fail-mode=standalone

  5. Attach representor to br-ovs:

    Copy
    Copied!
                

    ovs-vsctl add-port br-ovs pf0vf0 -- set Interface pf0vf0 type=dpdk options:dpdk-devargs=0000:03:00.0,representor=[0]

  6. Add a port for the VXLAN tunnel:

    Copy
    Copied!
                

    ovs-vsctl add-port ovs-sriov vxlan0 -- set interface vxlan0 type=vxlan options:local_ip=56.56.67.1 options:remote_ip=56.56.68.1 options:key=45 options:dst_port=4789

CT enables stateful packet processing by keeping a record of currently open connections. OVS flows using CT can be accelerated using advanced NICs by offloading established connections.

To view offloaded connections, run:

Copy
Copied!
            

ovs-appctl dpctl/offload-stats-show

To configure OVS-DPDK SR-IOV VF LAG:

  1. Enable SR-IOV in the NIC firmware:

    Copy
    Copied!
                

    // It is recommended to query the parameters first to determine if change is needed, to save unnecessary reboot mst start mlxconfig -d <mst device> -y set PF_NUM_OF_VF_VALID=0 SRIOV_EN=1 NUM_OF_VFS=8

    If configuration changes were made, unless the NIC is BlueField DPU Mode, perform a warm reboot of the Server OS. Otherwise, please perform BlueField System-Level Reset.

  2. Allocate the desired number of VFs per port:

    Copy
    Copied!
                

    echo $n > /sys/class/net/<net name>/device/sriov_numvfs

  3. Unbind all VFs:

    Copy
    Copied!
                

    echo <VF PCI> >/sys/bus/pci/drivers/mlx5_core/unbind

  4. Change both devices' mode to switchdev:

    Copy
    Copied!
                

    devlink dev eswitch set pci/<PCI> mode switchdev

  5. Create Linux bonding using kernel modules:

    Copy
    Copied!
                

    modprobe bonding mode=<desired mode>

    Info

    Other bonding parameters can be added here. The supported bond modes are: Active-backup, XOR and LACP.

  6. Bring all PFs and VFs down:

    Copy
    Copied!
                

    ip link set <PF/VF> down

  7. Attach both PFs to the bond:

    Copy
    Copied!
                

    ip link set <PF> master bond0

  8. To use VF-LAG with OVS-DPDK, add the bond master (PF) to the bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy p0 -- set Interface p0 type=dpdk options:dpdk-devargs=0000:03:00.0 options:dpdk-lsc-interrupt=true

  9. Add representor $N of PF0 or PF1 to a bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy rep$N -- set Interface rep$N type=dpdk options:dpdk-devargs=<PF0 PCI>,representor=pf0vf$N

    Or:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy rep$N -- set Interface rep$N type=dpdk options:dpdk-devargs=<PF0 PCI>,representor=pf1vf$N

Note

Hardware vDPA is enabled by default. If your hardware does not support vDPA, the driver will fall back to Software vDPA.

To check which vDPA mode is activated on your driver, run: ovs-ofctl -O OpenFlow14 dump-ports br0-ovs and look for hw-mode flag.

Note

This feature has not been accepted to the OVS-DPDK upstream yet, making its API subject to change.

In user space, there are two main approaches for communicating with a guest (VM), either through SR-IOV or virtio.

PHY ports (SR-IOV) allow working with port representor, which is attached to the OVS and a matching VF is given with pass-through to the guest. HW rules can process packets from up-link and direct them to the VF without going through SW (OVS). Therefore, using SR-IOV achieves the best performance.

However, SR-IOV architecture requires the guest to use a driver specific to the underlying HW. Specific HW driver has two main drawbacks:

  • Breaks virtualization in some sense (guest is aware of the HW). It can also limit the type of images supported.

  • Gives less natural support for live migration.

Using a virtio port solves both problems, however, it reduces performance and causes loss of some functionalities, such as, for some HW offloads, working directly with virtio. The netdev type dpdkvdpa solves this conflict as it is similar to the regular DPDK netdev yet introduces several additional functionalities.

dpdkvdpa translates between the PHY port to the virtio port. It takes packets from the Rx queue and sends them to the suitable Tx queue, and allows transfer of packets from the virtio guest (VM) to a VF and vice-versa, benefitting from both SR-IOV and virtio.

To add a vDPA port:

Copy
Copied!
            

ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \ options:vdpa-socket-path=<sock path> \ options:vdpa-accelerator-devargs=<vf pci id> \ options:dpdk-devargs=<pf pci id>,representor=[id] \ options: vdpa-max-queues =<num queues> \ options: vdpa-sw=<true/false>

Note

vdpa-max-queues is an optional field. When the user wants to configure 32 vDPA ports, the maximum queues number is limited to 8.

vDPA Configuration in OVS-DPDK Mode

Prior to configuring vDPA in OVS-DPDK mode, perform the following:

  1. Generate the VF:

    Copy
    Copied!
                

    echo 0 > /sys/class/net/enp175s0f0/device/sriov_numvfs echo 4 > /sys/class/net/enp175s0f0/device/sriov_numvfs

  2. Unbind each VF:

    Copy
    Copied!
                

    echo <pci> > /sys/bus/pci/drivers/mlx5_core/unbind

  3. Switch to switchdev mode:

    Copy
    Copied!
                

    echo switchdev >> /sys/class/net/enp175s0f0/compat/devlink/mode

  4. Bind each VF:

    Copy
    Copied!
                

    echo <pci> > /sys/bus/pci/drivers/mlx5_core/bind

  5. Initialize OVS:

    Copy
    Copied!
                

    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true

To configure vDPA in OVS-DPDK mode:

  1. OVS configuration:

    Copy
    Copied!
                

    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="-a 0000:01:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=1" /usr/share/openvswitch/scripts/ovs-ctl restart

  2. Create OVS-DPDK bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:01:00.0

  3. Create vDPA port as part of the OVS-DPDK bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 options:vdpa-accelerator-devargs=0000:01:00.2 options:dpdk-devargs=0000:01:00.0,representor=[0] options: vdpa-max-queues=8

To configure vDPA in OVS-DPDK mode on BlueField DPUs, set the bridge with the software or hardware vDPA port:

  • To create the OVS-DPDK bridge on the Arm side:

    Copy
    Copied!
                

    ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:af:00.0 ovs-vsctl add-port br0-ovs rep-- set Interface rep type=dpdk options:dpdk-devargs=0000:af:00.0,representor=[0]

  • To create the OVS-DPDK bridge on the host side:

    Copy
    Copied!
                

    ovs-vsctl add-br br1-ovs -- set bridge br1-ovs datapath_type=netdev protocols=OpenFlow14 ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 options:vdpa-accelerator-devargs=0000:af:00.2

    Note

    To configure SW vDPA, add options:vdpa-sw=true to the command.

Software vDPA Configuration in OVS-Kernel Mode

Software vDPA can also be used in configurations where hardware offload is done through TC and not DPDK.

  1. OVS configuration:

    Copy
    Copied!
                

    ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="-a 0000:01:00.0,representor=[0],dv_flow_en=1,dv_esw_en=0,idv_xmeta_en=0,isolated_mode=1" /usr/share/openvswitch/scripts/ovs-ctl restart

  2. Create OVS-DPDK bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev

  3. Create vDPA port as part of the OVS-DPDK bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 options:vdpa-accelerator-devargs=0000:01:00.2 options:dpdk-devargs=0000:01:00.0,representor=[0] options: vdpa-max-queues=8

  4. Create Kernel bridge:

    Copy
    Copied!
                

    ovs-vsctl add-br br-kernel

  5. Add representors to Kernel bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-kernel enp1s0f0_0 ovs-vsctl add-port br-kernel enp1s0f0

To configure MTU/jumbo frames:

  1. Verify that the Kernel version on the VM is 4.14 or above:

    Copy
    Copied!
                

    cat /etc/redhat-release

  2. Set the MTU on both physical interfaces in the host:

    Copy
    Copied!
                

    ifconfig ens4f0 mtu 9216

  3. Send a large size packet and verify that it is sent and received correctly:

    Copy
    Copied!
                

    tcpdump -i ens4f0 -nev icmp & ping 11.100.126.1 -s 9188 -M do -c 1

  4. Enable host_mtu in XML and add the following values:

    Copy
    Copied!
                

    host_mtu=9216,csum=on,guest_csum=on,host_tso4=on,host_tso6=on

    Example:

    Copy
    Copied!
                

    <qemu:commandline> <qemu:arg value='-chardev'/> <qemu:arg value='socket,id=charnet1,path=/tmp/sock0,server'/> <qemu:arg value='-netdev'/> <qemu:arg value='vhost-user,chardev=charnet1,queues=16,id=hostnet1'/> <qemu:arg value='-device'/> <qemu:arg value='virtio-net-pci,mq=on,vectors=34,netdev=hostnet1,id=net1,mac=00:21:21:24:02:01,bus=pci.0,addr=0xC,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,host_mtu=9216,csum=on,guest_csum=on,host_tso4=on,host_tso6=on'/> </qemu:commandline>

  5. Add the mtu_request=9216 option to the OVS ports inside the container and restart the OVS:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:c4:00.0 mtu_request=9216

    Or:

    Copy
    Copied!
                

    ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/tmp/sock0 options:vdpa-accelerator-devargs=0000:c4:00.2 options:dpdk-devargs=0000:c4:00.0,representor=[0] mtu_request=9216 /usr/share/openvswitch/scripts/ovs-ctl restart

  6. Start the VM and configure the MTU on the VM:

    Copy
    Copied!
                

    ifconfig eth0 11.100.124.2/16 up ifconfig eth0 mtu 9216 ping 11.100.126.1 -s 9188 -M do -c1

Note

This feature is supported at beta level.

OVS offload rules are based on a multi-table architecture. E2E cache enables merging the multi-table flow matches and actions into one joint flow.

This improves CT performance by using a single-table when an exact match is detected.

To set the E2E cache size (default is 4k):

Copy
Copied!
            

ovs-vsctl set open_vswitch . other_config:e2e-size=<size> systemctl restart openvswitch

To enable E2E cache (disabled by default):

Copy
Copied!
            

ovs-vsctl set open_vswitch . other_config:e2e-enable=true systemctl restart openvswitch

To run E2E cache statistics:

Copy
Copied!
            

ovs-appctl dpctl/dump-e2e-stats

To run E2E cache flows:

Copy
Copied!
            

ovs-appctl dpctl/dump-e2e-flows

Geneve tunneling offload support includes matching on extension header.

To configure OVS-DPDK Geneve encap/decap:

  1. Create a br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone

  2. Attach PF interface to br-phy bridge:

    Copy
    Copied!
                

    ovs-vsctl add-port br-phy pf -- set Interface pf type=dpdk options:dpdk-devargs=<PF PCI>

  3. Configure IP to the bridge:

    Copy
    Copied!
                

    ifconfig br-phy <$local_ip_1> up

  4. Create a br-int bridge:

    Copy
    Copied!
                

    ovs-vsctl --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev -- br-set-external-id br-int bridge-id br-int -- set bridge br-int fail-mode=standalone

  5. Attach representor to br-int:

    Copy
    Copied!
                

    ovs-vsctl add-port br-int rep$x -- set Interface rep$x type=dpdk options:dpdk-devargs=<PF PCI>,representor=[$x]

  6. Add a port for the Geneve tunnel:

    Copy
    Copied!
                

    ovs-vsctl add-port br-int geneve0 -- set interface geneve0 type=geneve options:key=<VNI> options:remote_ip=<$remote_ip_1> options:local_ip=<$local_ip_1>

OVS-DPDK supports parallel insertion and deletion of offloads (flow and CT). While multiple threads are supported (only one is used by default).

To configure multiple threads:

Copy
Copied!
            

ovs-vsctl set Open_vSwitch . other_config:n-offload-threads=3 systemctl restart openvswitch

Note

Refer to the OVS user manual for more information.

sFlow

sFlow allows monitoring traffic sent between two VMs on the same host using an sFlow collector.

To sample all traffic over the OVS bridge, run the following:

Copy
Copied!
            

# ovs-vsctl -- --id=@sflow create sflow agent=\"$SFLOW_AGENT\" \ target=\"$SFLOW_TARGET:$SFLOW_HEADER\" \ header=$SFLOW_HEADER \ sampling=$SFLOW_SAMPLING polling=10 \ -- set bridge sflow=@sflow

Parameter

Description

SFLOW_AGENT

Indicates that the sFlow agent should send traffic from SFLOW_AGENT's IP address

SFLOW_TARGET

Remote IP address of the sFlow collector

SFLOW_PORT

Remote IP destination port of the sFlow collector

SFLOW_HEADER

Size of packet header to sample (in bytes)

SFLOW_SAMPLING

Sample rate

To clear the sFlow configuration, run:

Copy
Copied!
            

# ovs-vsctl clear bridge br-vxlan mirrors

Note

Currently sFlow for OVS-DPDK is supported without CT.


To enable ct-ct-nat offloads in OVS-DPDK (disabled by default), run:

Copy
Copied!
            

ovs-vsctl set open_vswitch . other_config:ct-action-on-nat-conns=true

If disabled, ct-ct-nat configurations are not fully offloaded, improving connection offloading rate for other cases (ct and ct-nat).

If enabled, ct-ct-nat configurations are fully offloaded but ct and ct-nat offloading would be slower to create.

OpenFlow meters in OVS are implemented according to RFC 2697 (Single Rate Three Color Marker—srTCM).

  • The srTCM meters an IP packet stream and marks its packets either green, yellow, or red. The color is decided on a Committed Information Rate (CIR) and two associated burst sizes, Committed Burst Size (CBS), and Excess Burst Size (EBS).

  • A packet is marked green if it does not exceed the CBS, yellow if it exceeds the CBS but not the EBS, and red otherwise.

  • The volume of green packets should never be smaller than the CIR.

To configure a meter in OVS:

  1. Create a meter over a certain bridge, run:

    Copy
    Copied!
                

    ovs-ofctl -O openflow13 add-meter $bridge meter=$id,$pktps/$kbps,band=type=drop,rate=$rate,[burst,burst_size=$burst_size]

    Parameters:

    Parameter

    Description

    bridge

    Name of the bridge on which the meter should be applied.

    id

    Unique meter ID (32 bits) to be used as an identifier for the meter.

    pktps/kbps

    Indication if the meter should work according to packets or kilobits per second.

    rate

    Rate of pktps/kbps of allowed data transmission.

    burst

    If set, enables burst support for meter bands through the burst_size parameter.

    burst_size

    If burst is specified for the meter entry, configures the maximum burst allowed for the band in kilobits/packets, depending on whether kbps or pktps has been specified. If unspecified, the switch is free to select some reasonable value depending on its configuration. Currently, if burst is not specified, the burst_size parameter is set the same as rate.

  2. Add the meter to a certain OpenFlow rule. For example:

    Copy
    Copied!
                

    ovs-ofctl -O openflow13 add-flow $bridge "table=0,actions=meter:$id,normal"

  3. View the meter statistics:

    Copy
    Copied!
                

    ovs-ofctl -O openflow13 meter-stats $bridge meter=$id

  4. For more information, refer to official OVS documentation.

© Copyright 2024, NVIDIA. Last updated on May 7, 2024.