NVIDIA BlueField Virtio-net v24.07

Live Migration

Virtio VF PCIe devices can be attached to a guest VM using the vhost acceleration software stack, which makes it possible to live-migrate guest VMs.

Figure: Virtio VF PCIe devices for vhost acceleration

This section provides the steps to enable VM live migration using virtio VF PCIe devices along with vhost acceleration software.

Figure: vDPA over virtio full emulation design

Prerequisites

  • Minimum hypervisor kernel version – Linux kernel 5.7 (for VFIO SR-IOV support)

  • To use high availability (the additional vfe-vhostd-ha service, which keeps the datapath alive if vfe-vhostd crashes), this kernel patch must be applied.
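
Before proceeding, it may be worth confirming that the hypervisor meets these requirements. A minimal check, assuming the vfio-pci module is available on the host (the enable_sriov parameter is only present on kernels with VFIO SR-IOV support):

  # Kernel must be 5.7 or later for VFIO SR-IOV support
  [host]# uname -r

  # vfio-pci should expose the enable_sriov parameter used later in this guide
  [host]# modprobe vfio vfio_pci
  [host]# ls /sys/module/vfio_pci/parameters/enable_sriov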

Install vHost Acceleration Software Stack

The vhost acceleration software stack is built on open-source, BSD-licensed DPDK.

  • To install vhost acceleration software:

    1. Clone the software source code:

      [host]# git clone https://github.com/Mellanox/dpdk-vhost-vfe

      Info

      The latest release tag is vfe-24.07-rc2.

    2. Build software:

      [host]# apt-get install libev-dev -y
      [host]# apt-get install libev-libevent-dev -y
      [host]# apt-get install uuid-dev -y
      [host]# apt-get install libnuma-dev -y
      [host]# meson build --debug -Denable_drivers=vdpa/virtio,common/virtio,common/virtio_mi,common/virtio_ha
      [host]# ninja -C build install

  • To install QEMU:

    Info

    Upstream QEMU later than version 8.1 can be used, or the NVIDIA QEMU built in the following steps.

    1. Clone NVIDIA QEMU sources.

      [host]# git clone https://github.com/Mellanox/qemu -b stable-8.1-presetup

      Info

      The latest release tag is vfe-0.6.

    2. Build NVIDIA QEMU.

      [host]# mkdir bin
      [host]# cd bin
      [host]# ../configure --target-list=x86_64-softmmu --enable-kvm
      [host]# make -j24
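
If specific releases are needed, the release tags noted in the Info callouts above can be checked out before building. After the QEMU build completes, a quick sanity check of the resulting binary is also possible. The paths below are assumptions: they assume both repositories were cloned into the current directory, and the location of the emulator inside the build directory can differ between QEMU versions.

  # Optionally pin the sources to the release tags mentioned above before building
  [host]# git -C dpdk-vhost-vfe checkout vfe-24.07-rc2
  [host]# git -C qemu checkout vfe-0.6

  # Verify the freshly built emulator reports version 8.1 or later
  [host]# ./qemu/bin/qemu-system-x86_64 --version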

Configure vHost on Hypervisor

    1. Configure 1G huge pages:

      [host]# mkdir /dev/hugepages1G
      [host]# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
      [host]# echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
      [host]# echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

    2. Enable qemu:commandline in VM XML by adding the xmlns:qemu option:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

    3. Assign a memory amount and use 1GB page size for huge pages in VM XML:

      <memory unit='GiB'>4</memory>
      <currentMemory unit='GiB'>4</currentMemory>
      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>

    4. Set the memory access for the CPUs to be shared:

      <cpu mode='custom' match='exact' check='partial'>
        <model fallback='allow'>Skylake-Server-IBRS</model>
        <numa>
          <cell id='0' cpus='0-1' memory='4' unit='GiB' memAccess='shared'/>
        </numa>
      </cpu>

    5. Add a virtio-net interface in VM XML:

      <qemu:commandline>
        <qemu:arg value='-chardev'/>
        <qemu:arg value='socket,id=char0,path=/tmp/vhost-net0,server=on'/>
        <qemu:arg value='-netdev'/>
        <qemu:arg value='type=vhost-user,id=vhost1,chardev=char0,queues=4'/>
        <qemu:arg value='-device'/>
        <qemu:arg value='virtio-net-pci,netdev=vhost1,mac=00:00:00:00:33:00,vectors=10,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,mq=on,disable-legacy=on,disable-modern=off'/>
      </qemu:commandline>
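
After these edits, it is worth confirming that the 1G huge pages were actually reserved, since the VM cannot use hugepage backing without them; the vhost-user socket path in the XML (/tmp/vhost-net0 in this example) must also match the path passed to vfe-vhost-cli when provisioning the VF later. A minimal check:

  # Each NUMA node should report the number of 1G pages configured above
  [host]# cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages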

Run vHost Acceleration Service

  1. Bind the virtio PF and VF devices to the vfio-pci driver:

    [host]# modprobe vfio vfio_pci
    [host]# echo 1 > /sys/module/vfio_pci/parameters/enable_sriov

    [host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/vfio-pci/new_id
    [host]# echo 0x1af4 0x1042 > /sys/bus/pci/drivers/vfio-pci/new_id
    [host]# echo <pf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <vf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <pf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
    [host]# echo <vf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind

    [host]# lspci -vvv -s <pf_bdf> | grep "Kernel driver"
    Kernel driver in use: vfio-pci
    [host]# lspci -vvv -s <vf_bdf> | grep "Kernel driver"
    Kernel driver in use: vfio-pci

    Info

    Example of <pf_bdf> or <vf_bdf> format: 0000:af:00.3

  2. Enable SR-IOV and create one or more VFs:

    [host]# echo 1 > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs
    [host]# lspci | grep Virtio
    0000:af:00.1 Ethernet controller: Red Hat, Inc. Virtio network device
    0000:af:00.3 Ethernet controller: Red Hat, Inc. Virtio network device

  3. Add a VF representor to the OVS bridge on the BlueField:

    [dpu]# virtnet query -p 0 -v 0 | grep sf_rep_net_device
    "sf_rep_net_device": "en3f0pf0sf3000",
    [dpu]# ovs-vsctl add-port ovsbr1 en3f0pf0sf3000

  4. Run the vhost acceleration software service:

    Start the vfe-vhostd service:

    [host]# systemctl start vfe-vhostd

    Info

    A log of the service can be viewed by running the following:

    [host]# journalctl -u vfe-vhostd

  5. Provision the virtio-net PF and VF:

    [host]# /usr/local/bin/vfe-vhost-cli mgmtpf -a <pf_bdf>
    # Wait for the virtio-net-controller to finish handling the PF FLR

    # On the BlueField, change the VF MAC address or other device options
    [dpu]# virtnet modify -p 0 -v 0 device -m 00:00:00:00:33:00

    # Add the VF to vfe-dpdk
    [host]# /usr/local/bin/vfe-vhost-cli vf -a <vf_bdf> -v /tmp/vhost-net0

    Note

    If SR-IOV is disabled and re-enabled, the VFs must be re-provisioned. The address 00:00:00:00:33:00 is the virtual MAC address used in the VM XML.
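
At this point, the vhost-user socket for the VF should exist on the host, and the VF representor should be attached to the OVS bridge on the BlueField. A quick sanity check, using the socket path and bridge name from the examples above:

# The socket created for the VF by the vhost acceleration service
[host]# ls -l /tmp/vhost-net0

# On the BlueField, confirm that the representor is a port of the bridge
[dpu]# ovs-vsctl list-ports ovsbr1 | grep en3f0pf0sf3000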

Start the VM

[host]# virsh start <vm_name>
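
Once the VM is running, the vhost-user backed virtio-net device should be visible inside the guest. A minimal check, where [guest]# denotes a shell inside the VM (interface naming differs per guest OS):

[host]# virsh list --state-running
[guest]# lspci | grep -i virtio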


HA Service

Running the vfe-vhostd-ha service allows the datapath to persist should vfe-vhostd crash:

[host]# systemctl start vfe-vhostd-ha
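
The HA service can be monitored in the same way as vfe-vhostd:

[host]# systemctl status vfe-vhostd-ha
[host]# journalctl -u vfe-vhostd-ha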


Simple Live Migration

  1. Prepare two identical hosts and provision the virtio devices to DPDK on both.

  2. Boot the VM on one server, then live-migrate it to the other:

    [host]# virsh migrate --verbose --live --persistent <vm_name> qemu+ssh://<dest_node_ip_addr>/system --unsafe
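
Migration progress can be watched from the source host while the command runs, and the VM should appear on the destination once it completes. A minimal sketch using standard virsh commands:

# On the source host, while the migration is in flight
[host]# virsh domjobinfo <vm_name>

# On the destination host, after the migration finishes
[host]# virsh list --state-running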

Remove Device

When finished with the virtio devices, use the following commands to remove them from DPDK:

[host]# /usr/local/bin/vfe-vhost-cli vf -r <vf_bdf>
[host]# /usr/local/bin/vfe-vhost-cli mgmtpf -r <pf_bdf>
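
If the devices are to be handed back to the kernel virtio-net driver afterwards, they can be rebound through the same sysfs interface used earlier. A minimal sketch, only needed when the devices will be used outside of DPDK again:

[host]# echo <vf_bdf> > /sys/bus/pci/drivers/vfio-pci/unbind
[host]# echo <pf_bdf> > /sys/bus/pci/drivers/vfio-pci/unbind
[host]# echo <vf_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
[host]# echo <pf_bdf> > /sys/bus/pci/drivers/virtio-pci/bind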


© Copyright 2024, NVIDIA. Last updated on Aug 14, 2024.