NVIDIA MLNX_OFED Documentation v24.04-0.7.0.0

SR-IOV Live Migration

Note

This feature is supported in Ethernet mode only.

Live migration refers to the process of moving a guest virtual machine (VM) running on one physical host to another host without disrupting normal operations or causing other adverse effects for the end user.

Migration is useful for:

  • load balancing

  • hardware independence

  • energy saving

  • geographic migration

  • fault tolerance

Migration works by sending the state of the guest virtual machine's memory and any virtualized devices to a destination host physical machine. A migration can be live or non-live. A live migration does not disrupt user operations and is transparent to the user, as explained in the sections below.

When using the non-live migration process, the Hypervisor suspends the guest virtual machine, then moves an image of the guest virtual machine's memory to the destination host physical machine. The guest virtual machine is then resumed on the destination host physical machine, and the memory the guest virtual machine used on the source host physical machine is freed. The time it takes to complete such a migration depends on the network bandwidth and latency. If the network is under heavy use or has low bandwidth, the migration will take longer than desired.

When using the Live Migration process, the guest virtual machine continues to run on the source host physical machine while its memory pages are transferred to the destination host physical machine. During migration, the Hypervisor monitors the source for any changes in the pages it has already transferred and begins to transfer these changes when all of the initial pages have been transferred.

The Hypervisor also estimates the transfer speed during migration. Once the remaining data can be transferred within a configurable period of time, it suspends the original guest virtual machine, transfers the remaining data, and resumes the same guest virtual machine on the destination host physical machine.
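The switchover decision described above can be illustrated with a small shell sketch. It is only a model of the idea and not the Hypervisor's actual algorithm: the function name should_switch_over and its three parameters are hypothetical, and the configurable period of time is treated here as a limit in milliseconds.

```shell
# Illustrative model only; not the Hypervisor's real implementation.
# should_switch_over REMAINING_BYTES BANDWIDTH_BYTES_PER_SEC LIMIT_MS
# Succeeds (exit status 0) when the estimated time needed to send the
# remaining data fits within the limit, i.e. the guest could be suspended now.
should_switch_over() {
    # Estimated transfer time in milliseconds (integer arithmetic).
    estimate_ms=$(( $1 * 1000 / $2 ))
    [ "$estimate_ms" -le "$3" ]
}

# Example: 30 MB still to send, 1 GB/s measured, 300 ms allowed.
if should_switch_over 30000000 1000000000 300; then
    echo "suspend guest, send remaining data, resume on destination"
else
    echo "keep copying changed pages"
fi
```

With the example values the estimate is 30 ms, so the first branch is taken. A guest that dirties memory faster than the link can carry it would never satisfy the check, which is why the bandwidth and the limit are both tunable.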

MLX5 VF Live Migration

The purpose of this section is to demonstrate how to perform basic live migration of a QEMU VM with an MLX5 VF assigned to it. This section does not explain how to create VMs, either using libvirt or directly via QEMU.

Requirements

The following are the requirements for working with MLX5 VF Live Migration.

Components          Description

Adapter Cards       • ConnectX-7 ETH
                    • BlueField-3 ETH
                    Note: Both the source and the target hosts must use cards with the same PSID (identical cards with the same capabilities and features) and the same firmware version.

Firmware            • 28.41.1000
                    • 32.41.1000

Kernel              Linux v6.7 or newer

User Space Tools    iproute2 version 6.2 or newer

QEMU                QEMU 8.1 or newer

Libvirt             Libvirt 8.6 or newer

Setup

NVCONFIG

SR-IOV must be enabled and configured to support the required number of VFs, and VF live migration must be enabled. This can be done with the following command:

mlxconfig -d <PF_BDF> s SRIOV_EN=1 NUM_OF_VFS=4 VF_MIGRATION_MODE=2

where:

SRIOV_EN            Enable Single-Root I/O Virtualization (SR-IOV)

NUM_OF_VFS          The total number of Virtual Functions (VFs) that can be supported, for each PF.

VF_MIGRATION_MODE   Defines support for VF migration.
                    • 0x0: DEVICE_DEFAULT
                    • 0x1: MIGRATION_DISABLED
                    • 0x2: MIGRATION_ENABLED


Kernel Configuration

The kernel must be compiled with the MLX5_VFIO_PCI driver enabled (i.e., CONFIG_MLX5_VFIO_PCI).

To load the driver, run:

modprobe mlx5_vfio_pci
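Before loading the driver, it can help to confirm that the running kernel was built with the option. The sketch below assumes the distribution installs the kernel configuration as /boot/config-<kernel release>, which is common but not universal; the helper name kernel_config_has is hypothetical.

```shell
# kernel_config_has OPTION CONFIG_FILE
# Succeeds when OPTION is built in (=y) or built as a module (=m).
kernel_config_has() {
    grep -Eq "^$1=(y|m)$" "$2"
}

# Assumption: the distribution ships the kernel config under /boot.
config_file="/boot/config-$(uname -r)"
if [ -r "$config_file" ] && kernel_config_has CONFIG_MLX5_VFIO_PCI "$config_file"; then
    echo "CONFIG_MLX5_VFIO_PCI is enabled"
else
    echo "CONFIG_MLX5_VFIO_PCI not found in $config_file"
fi
```

When the option is built as a module, running lsmod | grep mlx5_vfio_pci after modprobe should list the module.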


QEMU

QEMU must be compiled with VFIO_PCI enabled (it is enabled by default).

As stated earlier, creating the VMs is beyond the scope of this guide, and we assume they already exist. However, each VM must use a migratable configuration, just as it would without SR-IOV VFs.

Note

Perform the steps below before running the VMs.

Over libvirt

  1. Set the PF to "switchdev" mode.

    devlink dev eswitch set pci/<PF_BDF> mode switchdev

  2. Create the VFs that will be assigned to the VMs.

    echo "1" > /sys/bus/pci/devices/<PF_BDF>/sriov_numvfs

  3. Set the VFs as migration capable.

    1. To see the names of the VFs, run:

      devlink port show

    2. To unbind the VFs from mlx5_core, run:

      echo '<VF_BDF>' > /sys/bus/pci/drivers/mlx5_core/unbind

    3. Use devlink to set each VF as migration capable (replace 1 with the port index of the VF as reported by "devlink port show"):

      devlink port function set pci/<PF_BDF>/1 migratable enable

  4. Assign the VFs to the VMs.

    1. To edit the VM's XML file, run:

      virsh edit <VM_NAME>

    2. Assign the VFs to the VM by adding the following under the "devices" tag:

      <hostdev mode='subsystem' type='pci' managed='no'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x08' slot='0x00' function='0x2'/>
        </source>
        <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </hostdev>

      Note

      The domain, bus, slot and function values above are dummy values. Replace them with the values of your VFs.

  5. Set the destination VM in incoming mode.

    1. To edit the destination VM's XML file, run:

      virsh edit <VM_NAME>

    2. Set the destination VM in migration incoming mode by adding the following under the "domain" tag:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        [...]
        <qemu:commandline>
          <qemu:arg value='--incoming'/>
          <qemu:arg value='tcp:<DEST_IP>:<DEST_PORT>'/>
        </qemu:commandline>
      </domain>

      Note

      To be able to save the file, the above "xmlns:qemu" attribute of the "domain" tag must be added as well.

  6. Bind the VFs to mlx5_vfio_pci driver.

    1. To detach the VFs from libvirt management, run the following command (libvirt node device names use underscores as separators, e.g. pci_0000_08_00_2):

      virsh nodedev-detach pci_<VF_BDF>

    2. To unbind the VFs from the vfio-pci driver (the VFs are automatically bound to it after running "virsh nodedev-detach"), run:

      echo '<VF_BDF>' > /sys/bus/pci/drivers/vfio-pci/unbind

    3. To set the driver override, run:

      echo 'mlx5_vfio_pci' > /sys/bus/pci/devices/<VF_BDF>/driver_override

    4. To bind the VFs to the mlx5_vfio_pci driver, run:

      echo '<VF_BDF>' > /sys/bus/pci/drivers/mlx5_vfio_pci/bind
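The commands above refer to the same VF in two notations: sysfs paths use the PCI address (for example 0000:08:00.2), while libvirt node device names replace the separators with underscores. The hypothetical helper below, bdf_to_nodedev, is a small sketch of that conversion, using the dummy address from the hostdev example.

```shell
# bdf_to_nodedev BDF
# Converts a PCI address such as 0000:08:00.2 into the libvirt node
# device name by replacing ':' and '.' with '_' and prefixing "pci_".
bdf_to_nodedev() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

# Example with a dummy address; prints pci_0000_08_00_2
bdf_to_nodedev 0000:08:00.2
```

The result can be passed to virsh nodedev-detach, for example: virsh nodedev-detach "$(bdf_to_nodedev 0000:08:00.2)".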

Directly over QEMU

  1. Set the PF in "switchdev" mode.

    devlink dev eswitch set pci/<PF_BDF> mode switchdev

  2. Create the VFs that will be assigned to the VMs.

    echo "1" > /sys/bus/pci/devices/<PF_BDF>/sriov_numvfs

  3. Set the VFs as migration capable.

    1. To see the names of the VFs, run:

      devlink port show

    2. To unbind the VFs from mlx5_core, run:

      echo '<VF_BDF>' > /sys/bus/pci/drivers/mlx5_core/unbind

    3. Use devlink to set each VF as migration capable (replace 1 with the port index of the VF as reported by "devlink port show"):

      devlink port function set pci/<PF_BDF>/1 migratable enable

  4. Bind the VFs to mlx5_vfio_pci driver:

    1. To set the driver override, run:

      echo 'mlx5_vfio_pci' > /sys/bus/pci/devices/<VF_BDF>/driver_override

    2. To bind the VFs to the mlx5_vfio_pci driver, run:

      echo '<VF_BDF>' > /sys/bus/pci/drivers/mlx5_vfio_pci/bind
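The host preparation steps for the direct QEMU flow can be collected into one script. The following is a sketch only: the helper names run and prepare_vf and the DRY_RUN variable are hypothetical, the PCI addresses are placeholders, and by default the script only prints the commands so that they can be reviewed before anything is changed on the host.

```shell
# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

# prepare_vf PF_BDF VF_BDF PORT_INDEX (hypothetical helper)
# Mirrors the steps above for a single VF, in the same order.
prepare_vf() {
    pf=$1 vf=$2 port=$3
    run "devlink dev eswitch set pci/$pf mode switchdev"
    run "echo 1 > /sys/bus/pci/devices/$pf/sriov_numvfs"
    run "echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind"
    run "devlink port function set pci/$pf/$port migratable enable"
    run "echo mlx5_vfio_pci > /sys/bus/pci/devices/$vf/driver_override"
    run "echo $vf > /sys/bus/pci/drivers/mlx5_vfio_pci/bind"
}

# Placeholder addresses; replace them with real values before setting DRY_RUN=0.
prepare_vf 0000:08:00.0 0000:08:00.2 1
```

Running the script as root with DRY_RUN=0 would execute the commands. The same sequence has to be repeated on the destination host.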

Running the Migration

Over libvirt

  1. To start the VMs on the source and on the destination, run:

    virsh start <VM_NAME>

  2. Enable the switchover-ack QEMU migration capability. Run the following commands on both the source and the destination:

    virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_capability return-path on"

    virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_capability switchover-ack on"

  3. [Optional] Configure the migration bandwidth and downtime limit on the source side:

    virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_parameter max-bandwidth <VALUE>"
    virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_parameter downtime-limit <VALUE>"

  4. Start the migration by running the migration command on the source side:

    virsh qemu-monitor-command <VM_NAME> --hmp "migrate -d tcp:<DEST_IP>:<DEST_PORT>"

  5. Check the migration status by running the info command on the source side:

    virsh qemu-monitor-command <VM_NAME> --hmp "info migrate"

    Note

    A migration status of "completed" means the migration has finished successfully.
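Checking the status by hand can be replaced with a polling loop. The sketch below makes two assumptions that should be verified against your QEMU version: that the "info migrate" output contains a line ending in "status: <word>", and that "failed" and "cancelled" are the terminal error states of interest. The helper names migration_status and wait_for_migration are hypothetical.

```shell
# migration_status: reads "info migrate" text on stdin and prints the
# first status word it finds (for example "active" or "completed").
# Assumption: the output contains a line of the form "... status: <word>".
migration_status() {
    sed -n 's/^.*[Ss]tatus: *\([a-z-]*\).*$/\1/p' | head -n 1
}

# wait_for_migration VM_NAME (hypothetical helper)
# Polls the source VM once per second. There is no timeout in this
# sketch; add one before relying on it.
wait_for_migration() {
    while :; do
        status=$(virsh qemu-monitor-command "$1" --hmp "info migrate" | migration_status)
        case "$status" in
            completed) echo "migration completed"; return 0 ;;
            failed|cancelled) echo "migration $status"; return 1 ;;
        esac
        sleep 1
    done
}

# Usage on the source host: wait_for_migration <VM_NAME>
```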

Directly over QEMU

  1. Start the VM on the source with the VF assigned to it:

    qemu-system-x86_64 [...] -device vfio-pci,host=<VF_BDF>,id=mlx5_1

  2. Start the VM on the destination with the VF assigned to it and with the "incoming" parameter:

    qemu-system-x86_64 [...] -device vfio-pci,host=<VF_BDF>,id=mlx5_1 -incoming tcp:<DEST_IP>:<DEST_PORT>

  3. Enable the switchover-ack QEMU migration capability. Run the following commands in the QEMU monitor on both the source and the destination:

    migrate_set_capability return-path on

    migrate_set_capability switchover-ack on

  4. [Optional] Configure the migration bandwidth and downtime limit on the source side:

    migrate_set_parameter max-bandwidth <VALUE>
    migrate_set_parameter downtime-limit <VALUE>

  5. Start the migration by running the migration command in the QEMU monitor on the source side:

    migrate -d tcp:<DEST_IP>:<DEST_PORT>

  6. Check the migration status by running the info command in the QEMU monitor on the source side:

    info migrate

    Note

    A migration status of "completed" means the migration has finished successfully.

Migration with MPV

MultiPort vHCA (MPV) enables the use of a dual-port Virtual HCA (vHCA) to share RDMA resources (e.g., MR, CQ, SRQ, PDs) across the two Ethernet (RoCE) NIC network ports and to display the NIC as a dual-port device.

A MultiPort vHCA (MPV) VF is made of two "regular" VFs, one VF from each port. Creating a migratable MPV VF requires the same steps as a regular VF (see the steps in section Over libvirt). The steps should be performed on each of the NIC ports. MPV VF traffic cannot be configured with OVS; TC rules must be defined to configure it.

Note

In ConnectX-7 adapter cards, migration cannot run in parallel on more than 4 VFs. It is the administrator's responsibility to enforce this limit.

Note

Live migration requires the same firmware version on both the source and the target hosts.

© Copyright 2024, NVIDIA. Last updated on Jun 30, 2024.