

Single Root IO Virtualization (SR-IOV)

Single Root IO Virtualization (SR-IOV) is a technology that allows a physical PCIe device to present itself multiple times through the PCIe bus. This technology enables multiple virtual instances of the device with separate resources. Starting with ConnectX-4, Mellanox adapter cards can expose up to 63 or 127 virtual instances, called Virtual Functions (VFs), depending on the firmware capabilities. These Virtual Functions can then be provisioned separately. Each VF can be seen as an additional device connected to the Physical Function (PF), and it shares the same resources with the Physical Function.

SR-IOV is commonly used in conjunction with an SR-IOV-enabled hypervisor to provide virtual machines with direct hardware access to network resources, thereby increasing their performance.

In this chapter, we demonstrate the setup and configuration of SR-IOV in an ESXi environment using the Mellanox ConnectX® adapter card family.

System Requirements

To set up an SR-IOV environment, the following is required:

  • nmlx5_core driver
  • A server/blade with an SR-IOV-capable motherboard BIOS
  • Mellanox ConnectX® adapter card family with SR-IOV capability
  • A hypervisor that supports SR-IOV, such as ESXi 7.0

Setting Up SR-IOV

For further information, see https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html

Configuring SR-IOV for ConnectX-4 Onward

To configure SR-IOV for ConnectX-4 onward, perform the following steps:

  1. Install the MLNX-NATIVE-ESX-<adapter card> driver for ESXi that supports SR-IOV.
  2. Download the MFT package from:
    www.mellanox.com → Products → Software → InfiniBand/VPI Drivers → MFT (http://www.mellanox.com/page/management_tools)
  3. Install MFT.

    # esxcli software vib install -v <MST Vib>
    # esxcli software vib install -v <MFT Vib>
  4. Reboot the system.

  5. Start the mst driver.

      # /opt/mellanox/bin/mst start
  6. Check if SR-IOV is enabled in the firmware.

    /opt/mellanox/bin/mlxconfig -d /dev/mst/mt4115_pciconf0 q
    
      Device #1:
      ----------
    
      Device type:    ConnectX4
      PCI device:     /dev/mst/mt4115_pciconf0
      Configurations:          Current
         SRIOV_EN              1
         NUM_OF_VFS            8
         FPP_EN                1


    If not, use mlxconfig to enable it.
    Note: The example below shows how to enable SR-IOV and allow the creation of 16 VFs.

    # /opt/mellanox/bin/mlxconfig -d /dev/mst/mt4115_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16
  7. Validate the configuration. Values set with mlxconfig appear in the Next Boot column and take effect after the next reboot.

    # /opt/mellanox/bin/mlxconfig -e -d /dev/mst/mt4115_pciconf0 query | grep -e Config -e SRIOV_EN -e NUM_OF_VFS
    Configurations:               Default     Current     Next Boot
         NUM_OF_VFS               0           0           0
         SRIOV_EN                 False(0)    False(0)    False(0)
  8. Set the number of Virtual Functions you need to create for the PF using the max_vfs module parameter. 

    max_vfs is a module parameter that expects an array of numbers, where each number in the array indicates how many VFs should be created on the corresponding port.

    The number of elements in this array is based on the “supported_num_ports” module parameter.

    The example below shows the creation of 8 VFs on the first port and 7 on the second port.

    esxcli system module parameters set -m nmlx5_core -p "max_vfs=8,7"
  9. Reboot the server.

    The max_vfs value is set per port. For more information, see the nmlx5_core Module Parameters table in the introduction.
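The firmware checks in steps 6 and 7 and the max_vfs value in step 8 can be scripted. The sketch below is hypothetical: mlx_next_boot_value and build_max_vfs are illustrative names, not MFT or driver commands, and the sketch assumes a POSIX shell with awk and sed available. The first function reads "mlxconfig -e ... query" output on stdin and prints the Next Boot value of one parameter; the second builds the max_vfs string and, as an assumption about how the two settings relate, rejects any per-port count larger than the firmware NUM_OF_VFS value.

```shell
#!/bin/sh
# Hypothetical helpers for steps 6-8; these are not MFT or nmlx5 commands.

# Read "mlxconfig -e -d <device> query" output on stdin and print the
# Next Boot (last) column of the parameter named in $1. A value such as
# "False(0)" is reduced to the number in parentheses; "16" is printed as-is.
mlx_next_boot_value() {
    awk -v key="$1" '$1 == key { print $NF; exit }' |
        sed 's/.*(\([0-9]*\))$/\1/'
}

# Build the max_vfs module parameter string from per-port VF counts.
# Assumption: no per-port count may exceed the firmware NUM_OF_VFS value.
# Usage: build_max_vfs <num_of_vfs> <port1_vfs> [<port2_vfs> ...]
build_max_vfs() {
    limit=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "build_max_vfs: no per-port VF counts given" >&2
        return 1
    fi
    list=""
    for n in "$@"; do
        # Reject empty or non-numeric counts.
        case $n in
            '' | *[!0-9]*)
                echo "build_max_vfs: '$n' is not a non-negative integer" >&2
                return 1
                ;;
        esac
        if [ "$n" -gt "$limit" ]; then
            echo "build_max_vfs: $n exceeds NUM_OF_VFS=$limit" >&2
            return 1
        fi
        # Append with a comma separator after the first element.
        list="${list:+$list,}$n"
    done
    printf 'max_vfs=%s\n' "$list"
}
```

For example, with the functions defined in the host shell:

    # /opt/mellanox/bin/mlxconfig -e -d /dev/mst/mt4115_pciconf0 query | mlx_next_boot_value SRIOV_EN
    # esxcli system module parameters set -m nmlx5_core -p "$(build_max_vfs 16 8 7)"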

Configuring InfiniBand SR-IOV

  1. Install the latest nmlx5 driver.
  2. Install the latest MFT version.

    # esxcli software vib install -d <mft>
    # reboot
  3. Query the firmware configuration to locate the device.

    # cd /opt/mellanox/bin
    # ./mlxconfig q
    Device type:    ConnectX4       
    PCI device:     mt4115_pciconf0
  4. Use MFT to burn the latest firmware version.

    # flint -d mt4115_pciconf0 -i <firmware.bin> b
    # reboot
  5. Set the link type of one or both ports to InfiniBand (a LINK_TYPE value of 1 selects InfiniBand). Omit LINK_TYPE_P2=1 to change only the first port.

    # cd /opt/mellanox/bin
    # ./mlxconfig -d mt4115_pciconf0 set LINK_TYPE_P1=1 LINK_TYPE_P2=1

    One InfiniBand port per subnet must be dedicated to running the Subnet Manager (SM). Since the SM can only run on PFs, that port must be passed through to a VM.

  6. Enable Ethernet PCI subclass override.

    # ./mlxconfig -d mt4115_pciconf0 set ADVANCED_PCI_SETTINGS=1
    # ./mlxconfig -d mt4115_pciconf0 set FORCE_ETH_PCI_SUBCLASS=1
  7. Set the "max_vfs" module parameter to the preferred number of VFs.

    # esxcfg-module nmlx5_core -s "max_vfs=2"
    # reboot

    The example above refers to a single-port HBA. For a dual-port card, set the parameter to "max_vfs=2,2".

    InfiniBand ports are displayed on the ESXi host as "down" uplinks and have no data path. The data path exists only in the guest OS.

  8. Assign the InfiniBand SR-IOV VFs to the VMs. For further information on how to assign the VFs, see the "Assigning a Virtual Function to a Virtual Machine in the vSphere Web Client" section.

    When ESXi sets the MAC address of an InfiniBand VF, it converts the MAC to a GUID by inserting "FF:FE" between the 3rd and 4th octets. For example:

    12:34:56:78:9A:BC --> 12:34:56:FF:FE:78:9A:BC

    When assigning VFs in InfiniBand SR-IOV, the value set for MTU is ignored.

  9. Configure the Subnet Manager.

    1. Pass an InfiniBand PF through to a VM.

    2. Create an OpenSM configuration file.

      opensm --create-config /etc/opensm.conf
    3. Add "virt_enabled 2" to the opensm.conf file.

    4. Run OpenSM.

      opensm --config /etc/opensm.conf

      If configured correctly, the link state of the VFs should be “Active”.

    Please refer to the MLNX_OFED User Manual for further information.

    Make sure virtualization is enabled in the Subnet Manager configuration (see "Step 7" of the section "Configuring SR-IOV for ConnectX-4/Connect-IB (InfiniBand)" in the MLNX_OFED User Manual).

    InfiniBand VF communication is GID-based only and requires every message to include a GRH header. Therefore, when using the ib_write_*/ib_send_* tools, the "-x 0" option must be specified explicitly.
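The MAC-to-GUID rule from step 8 is mechanical enough to script, for example to check which GUID corresponds to a VF whose MAC address is known. The function below is a hypothetical helper written for illustration (mac_to_guid is not an ESXi or MFT command); it assumes a colon-separated MAC address and upper-cases the hex digits to match the manual's example.

```shell
#!/bin/sh
# Hypothetical helper (not an ESXi or MFT command): apply the step 8 rule
# that turns an InfiniBand VF MAC address into a GUID by inserting
# FF:FE between the 3rd and 4th octets.
mac_to_guid() {
    # Normalise hex digits to upper case so the output matches the manual's example.
    mac=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    h='[0-9A-F][0-9A-F]'
    # Accept only six colon-separated two-digit hex octets.
    case $mac in
        $h:$h:$h:$h:$h:$h) ;;
        *)
            echo "mac_to_guid: '$1' is not of the form XX:XX:XX:XX:XX:XX" >&2
            return 1
            ;;
    esac
    first3=${mac%:*:*:*}   # octets 1-3, e.g. 12:34:56
    last3=${mac#*:*:*:}    # octets 4-6, e.g. 78:9A:BC
    printf '%s:FF:FE:%s\n' "$first3" "$last3"
}
```

For example, mac_to_guid 12:34:56:78:9A:BC prints 12:34:56:FF:FE:78:9A:BC, the same result as the worked example in step 8.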