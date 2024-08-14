NVIDIA BlueField Virtio-net v24.07
Live Migration

Live Migration Using vHost Acceleration Software Stack

Virtio VF PCIe devices can be attached to the guest VM using the vhost acceleration software stack. This enables performing live migration of guest VMs.

[Figure: Virtio VF PCIe devices for vhost acceleration]

This section provides the steps to enable VM live migration using virtio VF PCIe devices along with vhost acceleration software.

[Figure: vDPA over virtio full emulation design]

Prerequisites

  • Minimum hypervisor kernel version – Linux kernel 5.7 (for VFIO SR-IOV support)

  • To use high availability (the additional vfe-vhostd-ha service, which keeps the datapath running if vfe-vhostd crashes), this kernel patch must be applied.
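
The kernel prerequisite can be checked before going further. The following is a minimal sketch, assuming a version-aware `sort -V` is available (GNU coreutils provides it); `version_ge` and `kernel_ok` are hypothetical helper names, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch: compare the running kernel against the 5.7 minimum listed above.
# version_ge and kernel_ok are illustrative names, not NVIDIA tooling.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if the minimum sorts first, $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Strip the distribution suffix ("5.15.0-91-generic" -> "5.15.0"), then compare.
kernel_ok() {
    version_ge "${1%%-*}" "5.7"
}

if kernel_ok "$(uname -r)"; then
    echo "kernel $(uname -r): meets the 5.7 minimum"
else
    echo "kernel $(uname -r): older than 5.7"
fi
```

This only checks the version number. It does not detect whether the high-availability kernel patch has been applied.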

Install vHost Acceleration Software Stack

The vhost acceleration software stack is built on open-source, BSD-licensed DPDK.

  • To install vhost acceleration software:

    1. Clone the software source code:

      [host]# git clone https://github.com/Mellanox/dpdk-vhost-vfe

      Info

      The latest release tag is vfe-24.07-rc2.

    2. Build software:

      [host]# cd dpdk-vhost-vfe
      [host]# apt-get install libev-dev -y
      [host]# apt-get install libev-libevent-dev -y
      [host]# apt-get install uuid-dev -y
      [host]# apt-get install libnuma-dev -y
      [host]# meson build --debug -Denable_drivers=vdpa/virtio,common/virtio,common/virtio_mi,common/virtio_ha
      [host]# ninja -C build install

  • To install QEMU:

    Info

    Either upstream QEMU later than 8.1 or the following NVIDIA QEMU can be used.

    1. Clone NVIDIA QEMU sources.

      [host]# git clone https://github.com/Mellanox/qemu -b stable-8.1-presetup

      Info

      The latest release tag is vfe-0.6.

    2. Build NVIDIA QEMU.

      [host]# cd qemu
      [host]# mkdir bin
      [host]# cd bin
      [host]# ../configure --target-list=x86_64-softmmu --enable-kvm
      [host]# make -j24
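
Both builds above fail early if a build tool is missing. The following is a minimal sketch of a pre-flight check, assuming the tools are expected on `PATH`; `missing_tools` is a hypothetical helper name, and the tool list (git, meson, ninja, make) is inferred from the commands shown in this section.

```shell
#!/bin/sh
# Sketch: list build tools used in this section that are not on PATH.
# missing_tools is an illustrative name, not part of the vhost software.

# Print each given command name that cannot be found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools git meson ninja make)
if [ -n "$missing" ]; then
    echo "missing build tools:"
    echo "$missing"
else
    echo "all build tools found"
fi
```

To build a specific release rather than the branch head, the tags named above (vfe-24.07-rc2 and vfe-0.6) can be checked out with `git checkout <tag>` inside the respective clone.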

Configure vHost on Hypervisor

    1. Configure 1G huge pages:

      [host]# mkdir /dev/hugepages1G
      [host]# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
      [host]# echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
      [host]# echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

    2. Enable qemu:commandline in VM XML by adding the xmlns:qemu option:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

    3. Assign a memory amount and use 1GB page size for huge pages in VM XML:

      <memory unit='GiB'>4</memory>
      <currentMemory unit='GiB'>4</currentMemory>
      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>

    4. Set the memory access for the CPUs to be shared:

      <cpu mode='custom' match='exact' check='partial'>
        <model fallback='allow'>Skylake-Server-IBRS</model>
        <numa>
          <cell id='0' cpus='0-1' memory='4' unit='GiB' memAccess='shared'/>
        </numa>
      </cpu>

    5. Add a virtio-net interface in VM XML:

      <qemu:commandline>
        <qemu:arg value='-chardev'/>
        <qemu:arg value='socket,id=char0,path=/tmp/vhost-net0,server=on'/>
        <qemu:arg value='-netdev'/>
        <qemu:arg value='type=vhost-user,id=vhost1,chardev=char0,queues=4'/>
        <qemu:arg value='-device'/>
        <qemu:arg value='virtio-net-pci,netdev=vhost1,mac=00:00:00:00:33:00,vectors=10,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,mq=on,disable-legacy=on,disable-modern=off'/>
      </qemu:commandline>
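
Writes to nr_hugepages can allocate fewer pages than requested when memory is fragmented, so it is worth reading the counters back. The following is a minimal sketch that reports the 1G page count per NUMA node; `hugepages_per_node` and `nodes_have_pages` are hypothetical helper names, and the optional sysfs root argument exists only to make the functions easy to exercise against a test directory.

```shell
#!/bin/sh
# Sketch: read back the 1G huge page counters configured in step 1.
# Function names are illustrative, not part of any NVIDIA tool.

# Print "<node> <count>" for every NUMA node under the given sysfs root.
hugepages_per_node() {
    root=${1:-/sys/devices/system/node}
    for f in "$root"/node*/hugepages/hugepages-1048576kB/nr_hugepages; do
        [ -r "$f" ] || continue
        node=${f#"$root"/}      # drop the root prefix
        node=${node%%/*}        # keep only "nodeN"
        printf '%s %s\n' "$node" "$(cat "$f")"
    done
}

# Succeed only if at least one node exists and every node has >= $1 pages.
nodes_have_pages() {
    hugepages_per_node "$2" |
        awk -v want="$1" '$2 < want { bad = 1 } END { exit (NR == 0 || bad) }'
}

hugepages_per_node
```

For example, `nodes_have_pages 16` succeeds only if every node reports the 16 pages requested in step 1. The 4 GiB guest defined in the XML above consumes four of them.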

Run vHost Acceleration Service

  1. Bind the virtio PF devices to the vfio-pci driver:

    [host]# modprobe -a vfio vfio_pci
    [host]# echo 1 > /sys/module/vfio_pci/parameters/enable_sriov
    [host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/vfio-pci/new_id
    [host]# echo 0x1af4 0x1042 > /sys/bus/pci/drivers/vfio-pci/new_id
    [host]# echo <pf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <vf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <pf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
    [host]# echo <vf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
    [host]# lspci -vvv -s <pf_bdf> | grep "Kernel driver"
    Kernel driver in use: vfio-pci
    [host]# lspci -vvv -s <vf_bdf> | grep "Kernel driver"
    Kernel driver in use: vfio-pci

    Info

    Example of <pf_bdf> or <vf_bdf> format: 0000:af:00.3

  2. Enable SR-IOV and create the VF(s):

    [host]# echo 1 > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs
    [host]# lspci | grep Virtio
    0000:af:00.1 Ethernet controller: Red Hat, Inc. Virtio network device
    0000:af:00.3 Ethernet controller: Red Hat, Inc. Virtio network device

  3. Add a VF representor to the OVS bridge on the BlueField:

    [dpu]# virtnet query -p 0 -v 0 | grep sf_rep_net_device
    "sf_rep_net_device": "en3f0pf0sf3000",
    [dpu]# ovs-vsctl add-port ovsbr1 en3f0pf0sf3000

  4. Run the vhost acceleration software service:

    Start the vfe-vhostd service:

    [host]# systemctl start vfe-vhostd

    Info

    A log of the service can be viewed by running the following:

    [host]# journalctl -u vfe-vhostd

  5. Provision the virtio-net PF and VF:

    [host]# /usr/local/bin/vfe-vhost-cli mgmtpf -a <pf_bdf>
    # Wait for virtio-net-controller to finish handling the PF FLR

    # On the BlueField, change the VF MAC address or other device options
    [dpu]# virtnet modify -p 0 -v 0 device -m 00:00:00:00:33:00

    # Add the VF to vfe-dpdk
    [host]# /usr/local/bin/vfe-vhost-cli vf -a <vf_bdf> -v /tmp/vhost-net0

    Note

    If SR-IOV is disabled and re-enabled, the VFs must be re-provisioned. 00:00:00:00:33:00 is the virtual MAC address used in the VM XML.
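
Before provisioning, the driver binding from step 1 can be confirmed through sysfs instead of parsing lspci output. The following is a minimal sketch; `is_bdf` and `bound_driver` are hypothetical helper names, and the optional second argument to `bound_driver` (an alternate devices directory) is there only for testing.

```shell
#!/bin/sh
# Sketch: validate a PCI address and report which kernel driver owns it.
# Function names are illustrative, not part of vfe-vhost-cli.

# Succeed if $1 looks like a full PCI address such as 0000:af:00.3.
is_bdf() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Print the driver bound to device $1, or "none" if it is unbound.
bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Report every address given on the command line.
for bdf in "$@"; do
    if is_bdf "$bdf"; then
        echo "$bdf: $(bound_driver "$bdf")"
    else
        echo "$bdf: not a valid PCI address" >&2
    fi
done
```

Run with the PF and VF addresses as arguments. Both should report vfio-pci before `vfe-vhost-cli` is used.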

Start the VM

[host]# virsh start <vm_name>


HA Service

Running the vfe-vhostd-ha service allows the datapath to persist should vfe-vhostd crash:

[host]# systemctl start vfe-vhostd-ha


Simple Live Migration

  1. Prepare two identical hosts and perform the provisioning of the virtio device to DPDK on both.

  2. Boot the VM on one server, then migrate it to the other:

    [host]# virsh migrate --verbose --live --persistent <vm_name> qemu+ssh://<dest_node_ip_addr>/system --unsafe
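
The migration can be wrapped with a state check on each side. The following is a minimal sketch, assuming `virsh domstate` reports "running" for an active guest; `migrate_vm` is a hypothetical wrapper name, and the migrate command itself is the one shown above.

```shell
#!/bin/sh
# Sketch: migrate a VM only if it is running, then query the destination.
# migrate_vm is an illustrative wrapper, not part of libvirt.

# Usage: migrate_vm <vm_name> <dest_node_ip_addr>
migrate_vm() {
    vm=$1
    dest=$2
    uri="qemu+ssh://$dest/system"

    # Refuse to migrate a guest that is not currently running.
    state=$(virsh domstate "$vm") || return 1
    if [ "$state" != "running" ]; then
        echo "VM $vm is '$state', not running; skipping migration" >&2
        return 1
    fi

    # Same command as in step 2 above.
    virsh migrate --verbose --live --persistent "$vm" "$uri" --unsafe || return 1

    # Ask the destination libvirt daemon for the guest state.
    virsh -c "$uri" domstate "$vm"
}
```

For example, `migrate_vm <vm_name> <dest_node_ip_addr>` prints the state reported by the destination host on success.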

Remove Device

When finished with the virtio devices, use the following commands to remove them from DPDK:

[host]# /usr/local/bin/vfe-vhost-cli vf -r <vf_bdf>
[host]# /usr/local/bin/vfe-vhost-cli mgmtpf -r <pf_bdf>
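
With several VFs, the removal can be scripted. The following is a minimal sketch that mirrors the order shown above (VFs first, then the PF); `remove_devices` is a hypothetical helper name, and the `VFE_CLI` variable is an assumption added so the CLI path can be overridden.

```shell
#!/bin/sh
# Sketch: remove all VFs, then the management PF, from DPDK.
# remove_devices and VFE_CLI are illustrative names, not NVIDIA tooling.

VFE_CLI=${VFE_CLI:-/usr/local/bin/vfe-vhost-cli}

# Usage: remove_devices <pf_bdf> <vf_bdf> [<vf_bdf> ...]
remove_devices() {
    pf=$1
    shift
    # Stop at the first VF that fails to detach, leaving the PF in place.
    for vf in "$@"; do
        "$VFE_CLI" vf -r "$vf" || return 1
    done
    "$VFE_CLI" mgmtpf -r "$pf"
}
```

Setting `VFE_CLI=echo` before calling the function prints the arguments that would be passed to the CLI, without running it, which is useful as a dry run.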


