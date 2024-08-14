Virtio VF PCIe devices can be attached to the guest VM using the vhost acceleration software stack. This makes live migration of guest VMs possible.

This section provides the steps to enable VM live migration using virtio VF PCIe devices along with the vhost acceleration software.

Minimum hypervisor kernel version – Linux kernel 5.7 (for VFIO SR-IOV support)
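A quick way to confirm that the hypervisor meets the 5.7 minimum is to compare the major.minor part of uname -r. The kernel_at_least function below is a hypothetical helper, a minimal sketch that only looks at the version number: a distribution kernel numbered below 5.7 may carry backported VFIO changes, and this check would still reject it.

```shell
#!/usr/bin/env bash
# kernel_at_least RELEASE MAJOR MINOR
# Hypothetical helper: succeeds (status 0) if RELEASE, e.g. "5.15.0-91-generic",
# is at least MAJOR.MINOR; returns 1 if older, 2 if RELEASE cannot be parsed.
kernel_at_least() {
    local release=$1 want_major=$2 want_minor=$3
    local re='^([0-9]+)\.([0-9]+)'
    if [[ ! $release =~ $re ]]; then
        echo "unrecognized kernel release: $release" >&2
        return 2
    fi
    # Only major.minor matter for this check; 10# avoids octal parsing.
    local major=$((10#${BASH_REMATCH[1]})) minor=$((10#${BASH_REMATCH[2]}))
    (( major > want_major || (major == want_major && minor >= want_minor) ))
}
```

On the hypervisor, for example: kernel_at_least "$(uname -r)" 5 7 && echo "kernel is new enough for VFIO SR-IOV".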

To use high availability (the additional vfe-vhostd-ha service, which keeps the datapath running if vfe-vhostd crashes), this kernel patch must be applied.

The vhost acceleration software stack is built on open-source, BSD-licensed DPDK.

To install the vhost acceleration software:

Clone the software source code:

```
[host]# git clone https://github.com/Mellanox/dpdk-vhost-vfe
```

Info: The latest release tag is vfe-24.07-rc2.

Build the software from the top of the cloned source tree:

```
[host]# apt-get install libev-dev -y
[host]# apt-get install libev-libevent-dev -y
[host]# apt-get install uuid-dev -y
[host]# apt-get install libnuma-dev -y
[host]# meson build --debug -Denable_drivers=vdpa/virtio,common/virtio,common/virtio_mi,common/virtio_ha
[host]# ninja -C build install
```
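The clone and build steps can be collected into one function. This is a minimal sketch, assuming bash and a root shell (apt-get and ninja install need it); run and DRY_RUN are hypothetical helpers, not part of dpdk-vhost-vfe, and with DRY_RUN=1 the commands are only printed. The cd into the clone directory is an assumption the steps above leave implicit, since meson must be run from the top of the source tree.

```shell
#!/usr/bin/env bash
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_vhost_vfe [DIR]: clone, install build dependencies, then configure,
# build and install with meson/ninja. The body runs in a subshell so the cd
# does not change the caller's working directory.
build_vhost_vfe() (
    src=${1:-dpdk-vhost-vfe}   # git's default clone directory name
    run git clone https://github.com/Mellanox/dpdk-vhost-vfe "$src" || exit
    run apt-get install -y libev-dev libev-libevent-dev uuid-dev libnuma-dev || exit
    run cd "$src" || exit
    run meson build --debug \
        -Denable_drivers=vdpa/virtio,common/virtio,common/virtio_mi,common/virtio_ha || exit
    run ninja -C build install
)
```

DRY_RUN=1 build_vhost_vfe lists the commands in order without touching the system; build_vhost_vfe alone performs them.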

To install QEMU:

Info: Upstream QEMU later than 8.1 can be used instead of the NVIDIA QEMU built below.

Clone the NVIDIA QEMU sources:

```
[host]# git clone https://github.com/Mellanox/qemu -b stable-8.1-presetup
```

Info: The latest release tag is vfe-0.6.

From the top of the cloned qemu source tree, build NVIDIA QEMU:

```
[host]# mkdir bin
[host]# cd bin
[host]# ../configure --target-list=x86_64-softmmu --enable-kvm
[host]# make -j24
```
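Before building NVIDIA QEMU, it may be worth checking whether an installed upstream QEMU already satisfies the "later than 8.1" condition. The qemu_newer_than helper below is hypothetical: it reads the "QEMU emulator version X.Y.Z" banner printed by qemu-system-x86_64 --version and, as an assumption about what "later than 8.1" means, accepts only 8.2 and newer.

```shell
#!/usr/bin/env bash
# qemu_newer_than BANNER MAJOR MINOR
# Hypothetical helper: succeeds if the version found in BANNER (the first line
# of `qemu-system-x86_64 --version`) has a major.minor strictly greater than
# MAJOR.MINOR; returns 1 if not, 2 if no version can be found.
qemu_newer_than() {
    local banner=$1 want_major=$2 want_minor=$3
    local re='version[[:space:]]+([0-9]+)\.([0-9]+)'
    if [[ ! $banner =~ $re ]]; then
        echo "cannot find a QEMU version in: $banner" >&2
        return 2
    fi
    local major=$((10#${BASH_REMATCH[1]})) minor=$((10#${BASH_REMATCH[2]}))
    (( major > want_major || (major == want_major && minor > want_minor) ))
}
```

For example: qemu_newer_than "$(qemu-system-x86_64 --version | head -n 1)" 8 1 && echo "upstream QEMU can be used".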

Configure 1G huge pages:

```
[host]# mkdir /dev/hugepages1G
[host]# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
[host]# echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
[host]# echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
```

Enable qemu:commandline in the VM XML by adding the xmlns:qemu namespace:

```
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
```

Assign a memory amount and use a 1 GiB page size for huge pages in the VM XML:

```
<memory unit='GiB'>4</memory>
<currentMemory unit='GiB'>4</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1' unit='GiB'/>
  </hugepages>
</memoryBacking>
```

Set the memory access for the CPUs to be shared, so that the vhost-user backend can map guest memory:

```
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>Skylake-Server-IBRS</model>
  <numa>
    <cell id='0' cpus='0-1' memory='4' unit='GiB' memAccess='shared'/>
  </numa>
</cpu>
```

Add a virtio-net interface in the VM XML:

```
<qemu:commandline>
  <qemu:arg value='-chardev'/>
  <qemu:arg value='socket,id=char0,path=/tmp/vhost-net0,server=on'/>
  <qemu:arg value='-netdev'/>
  <qemu:arg value='type=vhost-user,id=vhost1,chardev=char0,queues=4'/>
  <qemu:arg value='-device'/>
  <qemu:arg value='virtio-net-pci,netdev=vhost1,mac=00:00:00:00:33:00,vectors=10,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,mq=on,disable-legacy=on,disable-modern=off'/>
</qemu:commandline>
```
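The qemu:commandline block ties together values that reappear during provisioning: the vhost-user socket path (passed to vfe-vhost-cli with -v), the MAC address (set with virtnet modify on the BlueField), and the queue count. A minimal sketch, assuming bash, that generates the block from those values so they are defined once. gen_qemu_netdev_xml is a hypothetical helper; its vectors value follows the common virtio-net multiqueue rule vectors = 2 * queues + 2 (one MSI-X vector per RX queue and per TX queue, plus one for configuration changes and one for the control queue), which reproduces vectors=10 for queues=4. The chardev and netdev ids are arbitrary labels that only need to match each other.

```shell
#!/usr/bin/env bash
# gen_qemu_netdev_xml SOCKET QUEUES MAC [ID]
# Hypothetical helper: prints a <qemu:commandline> block for one vhost-user
# virtio-net device. ID only distinguishes devices when more than one is added.
gen_qemu_netdev_xml() {
    local sock=$1 queues=$2 mac=$3 id=${4:-0}
    local re='^[1-9][0-9]*$'
    if [[ ! $queues =~ $re ]]; then
        echo "queues must be a positive integer: $queues" >&2
        return 2
    fi
    # One MSI-X vector per RX and per TX queue, plus config and control queue.
    local vectors=$((2 * queues + 2))
    cat <<EOF
<qemu:commandline>
  <qemu:arg value='-chardev'/>
  <qemu:arg value='socket,id=char${id},path=${sock},server=on'/>
  <qemu:arg value='-netdev'/>
  <qemu:arg value='type=vhost-user,id=vhost${id},chardev=char${id},queues=${queues}'/>
  <qemu:arg value='-device'/>
  <qemu:arg value='virtio-net-pci,netdev=vhost${id},mac=${mac},vectors=${vectors},page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,mq=on,disable-legacy=on,disable-modern=off'/>
</qemu:commandline>
EOF
}
```

gen_qemu_netdev_xml /tmp/vhost-net0 4 00:00:00:00:33:00 prints the same arguments as the example above, except that the netdev id is vhost0 instead of vhost1.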

Bind the virtio PF devices to the vfio-pci driver:

```
[host]# modprobe -a vfio vfio_pci
[host]# echo 1 > /sys/module/vfio_pci/parameters/enable_sriov
[host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/vfio-pci/new_id
[host]# echo 0x1af4 0x1042 > /sys/bus/pci/drivers/vfio-pci/new_id
[host]# echo <pf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
[host]# echo <vf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
[host]# echo <pf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
[host]# echo <vf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
[host]# lspci -vvv -s <pf_bdf> | grep "Kernel driver"
Kernel driver in use: vfio-pci
[host]# lspci -vvv -s <vf_bdf> | grep "Kernel driver"
Kernel driver in use: vfio-pci
```

Info: Example of the <pf_bdf> or <vf_bdf> format: 0000:af:00.3

Enable SR-IOV and create one or more VFs:

```
[host]# echo 1 > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs
[host]# lspci | grep Virtio
0000:af:00.1 Ethernet controller: Red Hat, Inc. Virtio network device
0000:af:00.3 Ethernet controller: Red Hat, Inc. Virtio network device
```

Add a VF representor to the OVS bridge on the BlueField:

```
[dpu]# virtnet query -p 0 -v 0 | grep sf_rep_net_device
"sf_rep_net_device": "en3f0pf0sf3000",
[dpu]# ovs-vsctl add-port ovsbr1 en3f0pf0sf3000
```

Run the vhost acceleration software service by starting vfe-vhostd:

```
[host]# systemctl start vfe-vhostd
```

Info: The service log can be viewed by running the following:

```
[host]# journalctl -u vfe-vhostd
```

Provision the virtio-net PF and VF:

```
[host]# /usr/local/bin/vfe-vhost-cli mgmtpf -a <pf_bdf>
# Wait for virtio-net-controller to finish handling the PF FLR
# On the BlueField, change the VF MAC address or other device options
[dpu]# virtnet modify -p 0 -v 0 device -m 00:00:00:00:33:00
# Add the VF to vfe-dpdk
[host]# /usr/local/bin/vfe-vhost-cli vf -a <vf_bdf> -v /tmp/vhost-net0
```

00:00:00:00:33:00 is the virtual MAC address used in the VM XML.

Note: If SR-IOV is disabled and re-enabled, the VFs must be re-provisioned.
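The unbind/bind sequence for the PF and VF can be wrapped in a function that validates the BDF first and skips a device already bound to vfio-pci. A minimal sketch, assuming bash and that the modprobe and new_id steps have already run; is_bdf, bind_to_vfio and the SYSFS_ROOT override (used only so the function can be tried against a scratch directory, defaulting to /sys) are hypothetical.

```shell
#!/usr/bin/env bash
# is_bdf STRING: succeeds for a full PCI address such as 0000:af:00.3
# (4-digit domain, bus, device 00-1f, function 0-7).
is_bdf() {
    local re='^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]$'
    [[ $1 =~ $re ]]
}

# bind_to_vfio BDF: detach BDF from its current driver (virtio-pci in the
# steps above) and bind it to vfio-pci. SYSFS_ROOT defaults to /sys.
bind_to_vfio() {
    local bdf=$1 root=${SYSFS_ROOT:-/sys} dev drv
    is_bdf "$bdf" || { echo "not a full BDF: $bdf" >&2; return 2; }
    dev=$root/bus/pci/devices/$bdf
    [[ -e $dev ]] || { echo "no such PCI device: $bdf" >&2; return 1; }
    if [[ -e $dev/driver ]]; then
        # The driver link points at .../bus/pci/drivers/<name>.
        drv=$(basename "$(readlink -f "$dev/driver")")
        [[ $drv == vfio-pci ]] && return 0    # already bound, nothing to do
        echo "$bdf" > "$root/bus/pci/drivers/$drv/unbind" || return
    fi
    echo "$bdf" > "$root/bus/pci/drivers/vfio-pci/bind"
}
```

As root, bind_to_vfio <pf_bdf> followed by lspci -vvv -s <pf_bdf> | grep "Kernel driver" should report vfio-pci, as in the transcript above.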

Running the vfe-vhostd-ha service allows the datapath to persist should vfe-vhostd crash:

Prepare two identical hosts and provision the virtio device to DPDK on both. Boot the VM on one server, then live-migrate it to the other:

```
[host]# virsh migrate --verbose --live --persistent <vm_name> qemu+ssh://<dest_node_ip_addr>/system --unsafe
```
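The migration command can be wrapped so that, once virsh migrate returns, the destination libvirtd is asked for the domain state, which should be "running". A minimal sketch, assuming bash on the source host with SSH access to the destination; migrate_vm and the RUN hook (set RUN=echo to print the commands instead of executing them) are hypothetical, while the virsh flags, including --unsafe, are copied from the command above.

```shell
#!/usr/bin/env bash
# migrate_vm VM_NAME DEST_HOST
# Hypothetical wrapper around the virsh command above. Set RUN=echo to print
# the commands instead of running them.
migrate_vm() {
    local vm=$1 dest=$2
    if [[ -z $vm || -z $dest ]]; then
        echo "usage: migrate_vm <vm_name> <dest_node_ip_addr>" >&2
        return 2
    fi
    local uri="qemu+ssh://${dest}/system"
    # ${RUN:-} is deliberately unquoted so that an empty RUN disappears.
    ${RUN:-} virsh migrate --verbose --live --persistent "$vm" "$uri" --unsafe || return
    # Ask the destination libvirtd for the domain state; expect "running".
    ${RUN:-} virsh -c "$uri" domstate "$vm"
}
```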

When finished with the virtio devices, use the following commands to remove them from DPDK:

