OVS-DPDK Hardware Offloads
To configure OVS-DPDK HW offloads:
Unbind the VFs:
echo 0000:04:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:04:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
Note: VMs with attached VFs must be powered off to be able to unbind the VFs.
Change the e-switch mode from legacy to switchdev on the PF device (make sure all VFs are unbound). This also creates the VF representor netdevices in the host OS.
echo switchdev > /sys/class/net/enp4s0f0/compat/devlink/mode
To revert to SR-IOV legacy mode:
echo legacy > /sys/class/net/enp4s0f0/compat/devlink/mode
Note: This command removes the VF representor netdevices.
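To confirm that the mode change took effect, the same sysfs file can be read back; a minimal check, assuming the PF enp4s0f0 from the example above:
# The mode file should now read "switchdev"
cat /sys/class/net/enp4s0f0/compat/devlink/mode
# The VF representor netdevices should now be listed in the host OS
ip link show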
Bind the VFs:
echo 0000:04:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
echo 0000:04:00.3 > /sys/bus/pci/drivers/mlx5_core/bind
Run the OVS service:
systemctl start openvswitch
Enable hardware offload (disabled by default):
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
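Before restarting OVS, it can be worth verifying that both options were recorded in the OVS database; for example:
# Each command should print "true"
ovs-vsctl get Open_vSwitch . other_config:dpdk-init
ovs-vsctl get Open_vSwitch . other_config:hw-offload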
Configure the DPDK whitelist:
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="-a 0000:01:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=1"
Where representor=[0-N].
Restart the OVS service:
systemctl restart openvswitch
Info: This step is required for the hardware offload changes to take effect.
Create OVS-DPDK bridge:
ovs-vsctl --no-wait add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
Add PF to OVS:
ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:88:00.0
Add representor to OVS:
ovs-vsctl add-port br0-ovs representor -- set Interface representor type=dpdk options:dpdk-devargs=0000:88:00.0,representor=[0]
Where representor=[0-N].
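Once traffic is running, the flows that were actually offloaded to hardware can be distinguished from those handled in software; a quick check using standard OVS tooling:
# Flows offloaded to the NIC
ovs-appctl dpctl/dump-flows type=offloaded
# Flows still processed in software by the OVS-DPDK datapath
ovs-appctl dpctl/dump-flows type=ovs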
Running the vSwitch in userspace requires an additional bridge. The purpose of this bridge is to allow use of the kernel network stack for routing and ARP resolution.
The datapath must look up the routing table and ARP table to prepare the tunnel header and transmit data to the output port.
Configuring VXLAN Encap/Decap Offloads
The configuration is done with:
PF on PCIe address 0000:03:00.0 with MAC 98:03:9b:cc:21:e8
Local IP 56.56.67.1 – br-phy interface is configured to this IP
Remote IP 56.56.68.1
To configure OVS-DPDK VXLAN:
Create a br-phy bridge:
ovs-vsctl add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone other_config:hwaddr=98:03:9b:cc:21:e8
Attach the PF interface to the br-phy bridge:
ovs-vsctl add-port br-phy p0 -- set Interface p0 type=dpdk options:dpdk-devargs=0000:03:00.0
Configure an IP address on the bridge:
ip addr add 56.56.67.1/24 dev br-phy
Create a br-ovs bridge:
ovs-vsctl add-br br-ovs -- set Bridge br-ovs datapath_type=netdev -- br-set-external-id br-ovs bridge-id br-ovs -- set bridge br-ovs fail-mode=standalone
Attach representor to br-ovs:
ovs-vsctl add-port br-ovs pf0vf0 -- set Interface pf0vf0 type=dpdk options:dpdk-devargs=0000:03:00.0,representor=[0]
Add a port for the VXLAN tunnel:
ovs-vsctl add-port br-ovs vxlan0 -- set interface vxlan0 type=vxlan options:local_ip=56.56.67.1 options:remote_ip=56.56.68.1 options:key=45 options:dst_port=4789
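To sanity-check the tunnel, the remote VTEP can be pinged through br-phy and the resulting datapath flows inspected; a sketch using the addresses from this example:
# The remote tunnel endpoint should answer through br-phy
ping 56.56.68.1 -c 3
# After VM-to-VM traffic starts, offloaded flows should show VXLAN tunnel actions
ovs-appctl dpctl/dump-flows type=offloaded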
Connection Tracking Offload
CT (connection tracking) enables stateful packet processing by keeping a record of currently open connections. OVS flows using CT can be accelerated using advanced NICs by offloading established connections.
To view offloaded connections, run:
ovs-appctl dpctl/offload-stats-show
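Offloading applies to flows that use the OVS connection-tracking pipeline. As an illustrative sketch (the bridge name br0-ovs and the two-table layout are assumptions, not part of the procedure above), a pipeline that commits new connections and forwards established ones could look like:
# Send untracked IP traffic through conntrack and resubmit to table 1
ovs-ofctl add-flow br0-ovs "table=0,ip,ct_state=-trk,actions=ct(table=1)"
# Commit new connections, then forward normally
ovs-ofctl add-flow br0-ovs "table=1,ip,ct_state=+trk+new,actions=ct(commit),normal"
# Established connections are the offload candidates
ovs-ofctl add-flow br0-ovs "table=1,ip,ct_state=+trk+est,actions=normal"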
SR-IOV VF LAG
To configure OVS-DPDK SR-IOV VF LAG:
Enable SR-IOV in the NIC firmware:
# It is recommended to query the parameters first to determine whether a change is needed, to avoid an unnecessary reboot
mst start
mlxconfig -d <mst device> -y set PF_NUM_OF_VF_VALID=0 SRIOV_EN=1 NUM_OF_VFS=8
If configuration changes were made, perform a warm reboot of the server OS, unless the NIC is in BlueField DPU mode, in which case perform a BlueField system-level reset.
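To check the current values before setting anything (as recommended above), an illustrative query, where the MST device name (/dev/mst/mt4121_pciconf0 here) depends on your NIC:
mst start
# Check whether SR-IOV is already enabled with the desired VF count
mlxconfig -d /dev/mst/mt4121_pciconf0 query | grep -E 'SRIOV_EN|NUM_OF_VFS'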
Allocate the desired number of VFs per port:
echo $n > /sys/class/net/<net name>/device/sriov_numvfs
Unbind all VFs:
echo <VF PCI> >/sys/bus/pci/drivers/mlx5_core/unbind
Change both devices' mode to switchdev:
devlink dev eswitch set pci/<PCI> mode switchdev
Create Linux bonding using kernel modules:
modprobe bonding mode=<desired mode>
Info: Other bonding parameters can be added here. The supported bond modes are active-backup, XOR, and LACP.
Bring all PFs and VFs down:
ip link set <PF/VF> down
Attach both PFs to the bond:
ip link set <PF> master bond0
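The bond state can then be verified through the kernel bonding interface before moving on; for example:
# Both PFs should appear as slaves, with the expected bonding mode
cat /proc/net/bonding/bond0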
To use VF-LAG with OVS-DPDK, add the bond master (PF) to the bridge:
ovs-vsctl add-port br-phy p0 -- set Interface p0 type=dpdk options:dpdk-devargs=0000:03:00.0 options:dpdk-lsc-interrupt=true
Add representor $N of PF0 or PF1 to a bridge:
ovs-vsctl add-port br-phy rep$N -- set Interface rep$N type=dpdk options:dpdk-devargs=<PF0 PCI>,representor=pf0vf$N
Or:
ovs-vsctl add-port br-phy rep$N -- set Interface rep$N type=dpdk options:dpdk-devargs=<PF0 PCI>,representor=pf1vf$N
VirtIO Acceleration through Hardware vDPA
Hardware vDPA is enabled by default. If your hardware does not support vDPA, the driver falls back to Software vDPA.
To check which vDPA mode is active on your driver, run ovs-ofctl -O OpenFlow14 dump-ports br0-ovs and look for the hw-mode flag.
Note: This feature has not been accepted upstream in OVS-DPDK yet, so its API is subject to change.
In user space, there are two main approaches for communicating with a guest (VM): SR-IOV or virtio.
PHY ports (SR-IOV) allow working with a port representor, which is attached to the OVS, while a matching VF is passed through to the guest. Hardware rules can process packets from the uplink and direct them to the VF without going through software (OVS), so SR-IOV achieves the best performance.
However, the SR-IOV architecture requires the guest to use a driver specific to the underlying hardware. A hardware-specific driver has two main drawbacks:
It breaks virtualization in some sense (the guest is aware of the hardware) and can limit the type of images supported.
It gives less natural support for live migration.
Using a virtio port solves both problems; however, it reduces performance and loses some functionality, such as hardware offloads that require working directly with virtio. The netdev type dpdkvdpa resolves this conflict: it is similar to the regular DPDK netdev but introduces several additional functionalities.
dpdkvdpa translates between the PHY port and the virtio port. It takes packets from the Rx queue and sends them to the suitable Tx queue, allowing transfer of packets from the virtio guest (VM) to a VF and vice versa, thereby benefitting from both SR-IOV and virtio.
To add a vDPA port:
ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
options:vdpa-socket-path=<sock path> \
options:vdpa-accelerator-devargs=<vf pci id> \
options:dpdk-devargs=<pf pci id>,representor=[id] \
options:vdpa-max-queues=<num queues> \
options:vdpa-sw=<true/false>
vdpa-max-queues is an optional field. When the user wants to configure 32 vDPA ports, the maximum number of queues is limited to 8.
vDPA Configuration in OVS-DPDK Mode
Prior to configuring vDPA in OVS-DPDK mode, perform the following:
Generate the VFs:
echo 0 > /sys/class/net/enp175s0f0/device/sriov_numvfs
echo 4 > /sys/class/net/enp175s0f0/device/sriov_numvfs
Unbind each VF:
echo <pci> > /sys/bus/pci/drivers/mlx5_core/unbind
Switch to switchdev mode:
echo switchdev > /sys/class/net/enp175s0f0/compat/devlink/mode
Bind each VF:
echo <pci> > /sys/bus/pci/drivers/mlx5_core/bind
Initialize OVS:
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true
To configure vDPA in OVS-DPDK mode:
OVS configuration:
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="-a 0000:01:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=1"
/usr/share/openvswitch/scripts/ovs-ctl restart
Create the OVS-DPDK bridge:
ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:01:00.0
Create vDPA port as part of the OVS-DPDK bridge:
ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 options:vdpa-accelerator-devargs=0000:01:00.2 options:dpdk-devargs=0000:01:00.0,representor=[0] options:vdpa-max-queues=8
To configure vDPA in OVS-DPDK mode on BlueField DPUs, set the bridge with the software or hardware vDPA port:
To create the OVS-DPDK bridge on the Arm side:
ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:af:00.0
ovs-vsctl add-port br0-ovs rep -- set Interface rep type=dpdk options:dpdk-devargs=0000:af:00.0,representor=[0]
To create the OVS-DPDK bridge on the host side:
ovs-vsctl add-br br1-ovs -- set bridge br1-ovs datapath_type=netdev protocols=OpenFlow14
ovs-vsctl add-port br1-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 options:vdpa-accelerator-devargs=0000:af:00.2
Note: To configure SW vDPA, add options:vdpa-sw=true to the command.
Software vDPA Configuration in OVS-Kernel Mode
Software vDPA can also be used in configurations where hardware offload is done through TC and not DPDK.
OVS configuration:
ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="-a 0000:01:00.0,representor=[0],dv_flow_en=1,dv_esw_en=0,dv_xmeta_en=0,isolated_mode=1"
/usr/share/openvswitch/scripts/ovs-ctl restart
Create the OVS-DPDK bridge:
ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
Create vDPA port as part of the OVS-DPDK bridge:
ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 options:vdpa-accelerator-devargs=0000:01:00.2 options:dpdk-devargs=0000:01:00.0,representor=[0] options:vdpa-max-queues=8
Create Kernel bridge:
ovs-vsctl add-br br-kernel
Add representors to Kernel bridge:
ovs-vsctl add-port br-kernel enp1s0f0_0
ovs-vsctl add-port br-kernel enp1s0f0
To configure MTU/jumbo frames:
Verify that the Kernel version on the VM is 4.14 or above:
cat /etc/redhat-release
Set the MTU on both physical interfaces in the host:
ifconfig ens4f0 mtu 9216
Send a large packet and verify that it is sent and received correctly:
tcpdump -i ens4f0 -nev icmp &
ping 11.100.126.1 -s 9188 -M do -c 1
Enable host_mtu in the XML, and add the following values:
host_mtu=9216,csum=on,guest_csum=on,host_tso4=on,host_tso6=on
Example:
<qemu:commandline>
  <qemu:arg value='-chardev'/>
  <qemu:arg value='socket,id=charnet1,path=/tmp/sock0,server'/>
  <qemu:arg value='-netdev'/>
  <qemu:arg value='vhost-user,chardev=charnet1,queues=16,id=hostnet1'/>
  <qemu:arg value='-device'/>
  <qemu:arg value='virtio-net-pci,mq=on,vectors=34,netdev=hostnet1,id=net1,mac=00:21:21:24:02:01,bus=pci.0,addr=0xC,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,host_mtu=9216,csum=on,guest_csum=on,host_tso4=on,host_tso6=on'/>
</qemu:commandline>
Add the mtu_request=9216 option to the OVS ports inside the container and restart OVS:
ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options:dpdk-devargs=0000:c4:00.0 mtu_request=9216
Or:
ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa options:vdpa-socket-path=/tmp/sock0 options:vdpa-accelerator-devargs=0000:c4:00.2 options:dpdk-devargs=0000:c4:00.0,representor=[0] mtu_request=9216
/usr/share/openvswitch/scripts/ovs-ctl restart
Start the VM and configure the MTU on the VM:
ifconfig eth0 11.100.124.2/16 up
ifconfig eth0 mtu 9216
ping 11.100.126.1 -s 9188 -M do -c 1
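If the large ping fails, it can help to confirm that the MTU actually took effect on the guest interface; for example:
# The mtu field should read 9216
ip link show eth0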
E2E Cache
This feature is supported at beta level.
OVS offload rules are based on a multi-table architecture. The E2E cache enables merging the multi-table flow matches and actions into one joint flow.
This improves CT performance by using a single table when an exact match is detected.
To set the E2E cache size (default is 4k):
ovs-vsctl set open_vswitch . other_config:e2e-size=<size>
systemctl restart openvswitch
To enable E2E cache (disabled by default):
ovs-vsctl set open_vswitch . other_config:e2e-enable=true
systemctl restart openvswitch
To run E2E cache statistics:
ovs-appctl dpctl/dump-e2e-stats
To run E2E cache flows:
ovs-appctl dpctl/dump-e2e-flows
Geneve Encap/Decap Offloads
Geneve tunneling offload support includes matching on the extension header.
To configure OVS-DPDK Geneve encap/decap:
Create a br-phy bridge:
ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev -- br-set-external-id br-phy bridge-id br-phy -- set bridge br-phy fail-mode=standalone
Attach PF interface to br-phy bridge:
ovs-vsctl add-port br-phy pf -- set Interface pf type=dpdk options:dpdk-devargs=<PF PCI>
Configure an IP address on the bridge:
ifconfig br-phy <$local_ip_1> up
Create a br-int bridge:
ovs-vsctl --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev -- br-set-external-id br-int bridge-id br-int -- set bridge br-int fail-mode=standalone
Attach the representor to br-int:
ovs-vsctl add-port br-int rep$x -- set Interface rep$x type=dpdk options:dpdk-devargs=<PF PCI>,representor=[$x]
Add a port for the Geneve tunnel:
ovs-vsctl add-port br-int geneve0 -- set interface geneve0 type=geneve options:key=<VNI> options:remote_ip=<$remote_ip_1> options:local_ip=<$local_ip_1>
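As with VXLAN, the tunnel can be sanity-checked by pinging the remote endpoint and dumping the offloaded flows; a sketch using the placeholders above:
# The remote tunnel endpoint should be reachable through br-phy
ping <$remote_ip_1> -c 3
# Offloaded flows should show Geneve tunnel matches/actions (tun_id, options)
ovs-appctl dpctl/dump-flows type=offloaded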
Parallel Offloads
OVS-DPDK supports parallel insertion and deletion of offloads (flow and CT). Multiple threads are supported, but only one is used by default.
To configure multiple threads:
ovs-vsctl set Open_vSwitch . other_config:n-offload-threads=3
systemctl restart openvswitch
Refer to the OVS user manual for more information.
sFlow
sFlow allows monitoring traffic sent between two VMs on the same host using an sFlow collector.
To sample all traffic over the OVS bridge, run the following:
# ovs-vsctl -- --id=@sflow create sflow agent=\"$SFLOW_AGENT\" \
    target=\"$SFLOW_TARGET:$SFLOW_PORT\" \
    header=$SFLOW_HEADER \
    sampling=$SFLOW_SAMPLING polling=10 \
    -- set bridge br-vxlan sflow=@sflow
Parameter | Description
SFLOW_AGENT | The sFlow agent sends traffic from SFLOW_AGENT's IP address
SFLOW_TARGET | Remote IP address of the sFlow collector
SFLOW_PORT | Remote destination port of the sFlow collector
SFLOW_HEADER | Size of the packet header to sample (in bytes)
SFLOW_SAMPLING | Sample rate
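For illustration, a filled-in invocation of the command above; the agent interface, collector address, and sampling values here are hypothetical:
SFLOW_AGENT=eth0           # interface whose IP the agent reports
SFLOW_TARGET=10.0.0.100    # sFlow collector address (hypothetical)
SFLOW_PORT=6343            # default sFlow collector port
SFLOW_HEADER=128           # bytes of each sampled packet header
SFLOW_SAMPLING=64          # sample 1 out of every 64 packets
ovs-vsctl -- --id=@sflow create sflow agent=\"$SFLOW_AGENT\" \
    target=\"$SFLOW_TARGET:$SFLOW_PORT\" \
    header=$SFLOW_HEADER \
    sampling=$SFLOW_SAMPLING polling=10 \
    -- set bridge br-vxlan sflow=@sflow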
To clear the sFlow configuration, run:
# ovs-vsctl clear bridge br-vxlan sflow
Currently sFlow for OVS-DPDK is supported without CT.
ct-ct-nat Offloads
To enable ct-ct-nat offloads in OVS-DPDK (disabled by default), run:
ovs-vsctl set open_vswitch . other_config:ct-action-on-nat-conns=true
If disabled, ct-ct-nat configurations are not fully offloaded, which improves the connection offloading rate for the other cases (ct and ct-nat).
If enabled, ct-ct-nat configurations are fully offloaded, but ct and ct-nat offloads are slower to create.
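For reference, a ct-nat pipeline that this knob affects might look like the following sketch (bridge name, table layout, and NAT address are hypothetical):
# Untracked IP traffic passes through conntrack with NAT applied
ovs-ofctl add-flow br0-ovs "table=0,ip,ct_state=-trk,actions=ct(table=1,nat)"
# New connections are committed with source NAT to 5.5.5.5
ovs-ofctl add-flow br0-ovs "table=1,ip,ct_state=+trk+new,actions=ct(commit,nat(src=5.5.5.5)),normal"
# Established connections are forwarded directly
ovs-ofctl add-flow br0-ovs "table=1,ip,ct_state=+trk+est,actions=normal"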
OpenFlow Meters
OpenFlow meters in OVS are implemented according to RFC 2697 (Single Rate Three Color Marker, srTCM).
The srTCM meters an IP packet stream and marks its packets green, yellow, or red. The color is decided based on a Committed Information Rate (CIR) and two associated burst sizes: the Committed Burst Size (CBS) and the Excess Burst Size (EBS).
A packet is marked green if it does not exceed the CBS, yellow if it exceeds the CBS but not the EBS, and red otherwise.
The volume of green packets should never be smaller than the CIR.
To configure a meter in OVS:
Create a meter over a certain bridge:
ovs-ofctl -O openflow13 add-meter $bridge meter=$id,$pktps/$kbps,band=type=drop,rate=$rate,[burst,burst_size=$burst_size]
Parameters:
Parameter | Description
bridge | Name of the bridge on which the meter should be applied.
id | Unique meter ID (32 bits) used as an identifier for the meter.
pktps/kbps | Whether the meter operates on packets or on kilobits per second.
rate | Allowed rate of data transmission, in pktps or kbps.
burst | If set, enables burst support for meter bands through the burst_size parameter.
burst_size | If burst is specified for the meter entry, configures the maximum burst allowed for the band, in kilobits or packets depending on whether kbps or pktps was specified. If unspecified, the switch is free to select some reasonable value depending on its configuration. Currently, if burst is not specified, burst_size is set equal to rate.
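For example, following the template above, a drop-band meter limiting traffic to 10000 kbps with a 2048 kb burst might be created as follows (bridge name and values illustrative; some OVS versions spell the band keyword bands= rather than band=):
ovs-ofctl -O openflow13 add-meter br0-ovs meter=1,kbps,band=type=drop,rate=10000,burst,burst_size=2048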
Add the meter to a certain OpenFlow rule. For example:
ovs-ofctl -O openflow13 add-flow $bridge
"table=0,actions=meter:$id,normal"
View the meter statistics:
ovs-ofctl -O openflow13 meter-stats $bridge meter=$id
For more information, refer to the official OVS documentation.