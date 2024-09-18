Single Root I/O Virtualization (SR-IOV) is a technology that allows a physical PCIe device to present itself multiple times through the PCIe bus. This enables multiple virtual instances of the device, each with separate resources. NVIDIA® adapters can expose up to 127 virtual instances, called Virtual Functions (VFs), per port. These VFs can then be provisioned separately. Each VF can be seen as an additional device connected to the Physical Function, with which it shares resources.

SR-IOV is commonly used in conjunction with an SR-IOV enabled hypervisor to give virtual machines direct hardware access to network resources, thereby increasing their performance.

This guide demonstrates the setup and configuration of SR-IOV using the NVIDIA® adapter card family. An SR-IOV VF is a single-port device.

System Requirements

Server and BIOS: A server and BIOS with SR-IOV support. Note: BIOS settings may require an update to enable virtualization support and SR-IOV support.

Hypervisor OS:
- Ethernet: Windows Server 2012 R2 and above
- IPoIB: Windows Server 2016 and above

Virtual Machine (VM) OS: Windows Server 2012 and above

Adapter cards: NVIDIA® ConnectX®-4 onward adapter cards

SR-IOV supported driver version:
- SR-IOV Ethernet over Hyper-V: WinOF-2 v1.20 or higher
- SR-IOV IPoIB over Hyper-V and the guest: WinOF-2 v1.80 or higher and on Windows Server 2016

SR-IOV in IPoIB node:
- LID based IPoIB is supported with the following limitations:
  - It does not support routers in the fabric
  - It supports up to 2^15-1 LIDs
- No synthetic path: The SR-IOV path goes through the WinOF-2 driver

Although both the NVIDIA® adapter Virtual Function (VF) and the NetVSC will be presented in the VM, it is recommended to use only the NVIDIA® interface.

The sections below describe the required flows for configuring the host machines:

Depending on your system, perform the steps below to set up your BIOS. The figures used in this section are for illustration purposes only.

For further information, please refer to the appropriate BIOS User Manual.

Note: It is recommended to enable the "above 4G decoding" BIOS setting for features that require a large amount of PCIe resources (e.g., SR-IOV with numerous VFs, PCIe Emulated Switch, Large BAR Requests).

To enable SR-IOV in BIOS:

1. Make sure the machine’s BIOS supports SR-IOV. Consult the BIOS vendor's website for the list of BIOS versions that support SR-IOV, and update the BIOS version if necessary.
2. Enable SR-IOV according to the BIOS vendor's guidelines. For example:
   - Enable SR-IOV.
   - Enable “Intel Virtualization Technology” support.

For further details, please refer to the vendor's website.

To install Hypervisor Operating System:

1. Install Windows Server 2012 R2.
2. Install the Hyper-V role. Go to: Server Manager -> Manage -> Add Roles and Features and set the following:
   - Installation Type -> Role-based or Feature-based Installation
   - Server Selection -> Select a server from the server pool
   - Server Roles -> Hyper-V (see figures below)
3. Install Hyper-V Management Tools: Features -> Remote Server Administration Tools -> Role Administration Tools -> Hyper-V Administration Tool.
4. Confirm the installation selection.
5. Click Install.
6. Reboot the system.

To verify that the system is properly configured for SR-IOV:

1. Go to: Start -> Windows PowerShell.
2. Run the following PowerShell commands:
       PS $ (Get-VmHost).IovSupport
       PS $ (Get-VmHost).IovSupportReasons
   If SR-IOV is supported by the OS, the output in PowerShell is as in the figure below.
   Note: If the BIOS was updated according to the BIOS vendor's instructions and you see the message displayed in the figure below, update the registry configuration as described in the (Get-VmHost).IovSupportReasons message.
3. Reboot.
4. Verify that the system is configured correctly for SR-IOV as described in Steps 1/2.

To verify resources sufficiency in the adapter to enable SR-IOV VFs:

1. Go to: Start -> Windows PowerShell.
2. Run the following PowerShell command:
       PS C:\Windows\system32> Get-NetAdapterSriov
   Example:
       Name                 : SLOT 4 Port 1
       InterfaceDescription : Mellanox ConnectX-4 Adapter
       Enabled              : True
       SriovSupport         : NoVfBarSpace
       SwitchName           : “Default Switch”
       NumVFs               : 32

Note: If the “SriovSupport” field value shows “NoVfBarSpace”, SR-IOV cannot be used on this network adapter because there are not enough PCI Express BAR resources available. To use SR-IOV, you need to reduce the number of VFs to the number supported by the OS.
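The check described in the note can be automated on captured output. The following is a minimal Python sketch, assuming the Get-NetAdapterSriov output has been saved as plain "Key : Value" text; the function names and the sample text are illustrative and not part of any NVIDIA or Microsoft tool.

```python
# Sample text modeled on the example output above (assumed list-style format).
SAMPLE = """\
Name                 : SLOT 4 Port 1
InterfaceDescription : Mellanox ConnectX-4 Adapter
Enabled              : True
SriovSupport         : NoVfBarSpace
SwitchName           : "Default Switch"
NumVFs               : 32
"""


def parse_sriov_listing(text):
    """Parse 'Key : Value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def lacks_vf_bar_space(fields):
    """Return True when the adapter reports insufficient PCIe BAR space."""
    return fields.get("SriovSupport") == "NoVfBarSpace"


fields = parse_sriov_listing(SAMPLE)
if lacks_vf_bar_space(fields):
    print(f"{fields['Name']}: reduce NumVFs (currently {fields['NumVFs']})")
```

The sketch only detects the “NoVfBarSpace” value named in the note; other SriovSupport values are not interpreted.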

For further information, see https://technet.microsoft.com/en-us/library/jj130915(v=wps.630).aspx

To create a Virtual Machine:

1. Go to: Server Manager -> Tools -> Hyper-V Manager.
2. Go to: New -> Virtual Machine and set the following:
   - Name: <name>
   - Startup memory: 4096 MB
   - Connection: Not Connected
3. Connect the virtual hard disk in the New Virtual Machine Wizard.
4. Go to: Connect Virtual Hard Disk -> Use an existing virtual hard disk.
5. Select the location of the VHD file.

In SR-IOV mode, the host allocates memory resources per the adapter's needs for each VF. To keep the host stable, it is important to limit the amount of memory that a VF can receive from the host. To prevent excessive allocation, set the MaxFWPagesUsagePerVF registry key to the maximum number of 4KB pages that the host may allocate for VF resources. If a VF attempts to use more pages than configured, an error is written to the system event log. For more information, see SR-IOV Options.
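Because the key is expressed in 4KB pages rather than bytes, a memory budget has to be converted before it is written to the registry. The Python sketch below shows only that arithmetic; the 64 MB budget is a hypothetical figure chosen for illustration, not a recommended value.

```python
PAGE_SIZE_BYTES = 4 * 1024  # the registry key counts 4KB pages


def max_pages_for_budget(budget_bytes):
    """Return how many whole 4KB pages fit into a memory budget in bytes."""
    if budget_bytes < 0:
        raise ValueError("budget must be non-negative")
    return budget_bytes // PAGE_SIZE_BYTES


# Hypothetical budget of 64 MB (illustrative only).
example_pages = max_pages_for_budget(64 * 1024 * 1024)
print(example_pages)
```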

The sections below describe the required flows for configuring the NVIDIA® Network Adapter for SR-IOV:

Note: For non-NVIDIA® (OEM) branded cards you may need to download and install the new firmware.

To enable SR-IOV using mlxconfig:

Note: mlxconfig is part of the MFT tools and is used to simplify firmware configuration. The tool is available in MFT v3.6.0 or higher.

1. Download MFT for Windows.
2. Get the device ID (look for the “_pciconf” string in the output):
       mst status
   Example:
       MST devices:
       ------------
       mt4115_pciconf0
3. Check the current SR-IOV configuration:
       mlxconfig -d mt4115_pciconf0 q
   Example:
       Device #1:
       ----------
       Device type:    ConnectX4
       PCI device:     mt4115_pciconf0
       Configurations:          Current
            SRIOV_EN            N/A
            NUM_OF_VFS          N/A
            WOL_MAGIC_EN_P2     N/A
            LINK_TYPE_P1        N/A
            LINK_TYPE_P2        N/A
4. Enable SR-IOV with 16 VFs:
       mlxconfig -d mt4115_pciconf0 s SRIOV_EN=1 NUM_OF_VFS=16
   Note: All servers are guaranteed to support 16 VFs. Increasing the number of VFs can lead to exceeding the BIOS limit of MMIO available address space.
   Note: The OS limits the maximum number of VFs to 32 per Network Adapter. To increase the number of VFs, use the following PowerShell command:
       Set-NetAdapterSriov -Name <AdapterName> -NumVFs <Required number of VFs>
   Example:
       Device #1:
       ----------
       Device type:    ConnectX4
       PCI device:     mt4115_pciconf0
       Configurations:          Current    New
            SRIOV_EN            N/A        1
            NUM_OF_VFS          N/A        16
            WOL_MAGIC_EN_P2     N/A        N/A
            LINK_TYPE_P1        N/A        N/A
            LINK_TYPE_P2        N/A        N/A
       Apply new Configuration? (y/n) [n] : y
       Applying... Done!
       -I- Please reboot machine to load new configurations.
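When the mlxconfig step is scripted, it helps to build the argument list programmatically so that each setting is passed as a single KEY=VALUE token with no spaces around the equals sign. The Python sketch below only constructs the argument list; the helper name is hypothetical, and actually running the command (for example with subprocess.run) requires MFT to be installed and is left out on purpose.

```python
def mlxconfig_set_args(device, **params):
    """Build the argument list for 'mlxconfig -d <device> set KEY=VALUE ...'."""
    if not params:
        raise ValueError("at least one parameter is required")
    # Each assignment is one token, e.g. 'SRIOV_EN=1' (no spaces around '=').
    assignments = [f"{key}={value}" for key, value in params.items()]
    return ["mlxconfig", "-d", device, "set"] + assignments


# Device name and values taken from the example above.
args = mlxconfig_set_args("mt4115_pciconf0", SRIOV_EN=1, NUM_OF_VFS=16)
print(" ".join(args))
```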

The Subnet Manager (SM) must be up in the fabric for IPoIB to work. It can run on a switch or on a Linux host.

Switch SM Configuration

1. Install the SM image that supports virtualization (version 3.6.4640 and above). For more details, please refer to the switch operating system User Manual.
2. Enter the config mode:
       switch > enable
       switch # config terminal
3. Enable the SM (to disable the SM, type: no ib sm):
       ib sm
4. Enable virtualization:
       ib sm virt enable
5. Save the configuration:
       configuration write
6. Restart the switch:
       reload
7. Validate that the Subnet Manager is enabled:
       show ib sm
8. Validate that virtualization is enabled:
       show ib sm virt

For more details, please refer to the Subnet Manager (SM) section in the MLNX-OS® User Manual for VPI.

Linux Host SM Configuration

1. Enable virtualization by setting the virt_enable field to 2 in the /etc/opensm/opensm.conf file.
2. Start OpenSM and bind it to a specific port:
       opensm -e -B -g <Port GUID>
   OpenSM may be bound to one port at a time. If the given GUID is 0, OpenSM displays a list of possible port GUIDs and awaits user input. Without “-g”, OpenSM attempts to use the default port.

1. Get the device name:
       mst status
2. Show the device configurations:
       mlxconfig -d <device name> q
3. Enable SR-IOV (1 = Enable):
       mlxconfig -d <device name> set SRIOV_EN=1
4. Set the maximum VF count:
       mlxconfig -d <device name> set NUM_OF_VFS=<Count>
5. Configure the device to work in IB mode (1 = IB):
       mlxconfig -d <device name> set LINK_TYPE_P1=1 LINK_TYPE_P2=1
6. Enable LID based IPoIB:
       mlxconfig -d <device name> set SRIOV_IB_ROUTING_MODE_P1=1
       mlxconfig -d <device name> set SRIOV_IB_ROUTING_MODE_P2=1
7. Restart the firmware:
       mlxfwreset -d <device name> r --yes

Note: The mlxconfig and mlxfwreset tools are part of the WinMFT package. For more details, please refer to the MFT User Manual.

Note: To enable LID based IPoIB using mlxconfig, install MFT v4.8.0-25 or above.

For further details on enabling/configuring SR-IOV on KVM, please refer to section “Single Root IO Virtualization (SR-IOV)” in MLNX_OFED for Linux User Manual.

To configure Virtual Machine networking:

1. Create an SR-IOV-enabled Virtual Switch over the NVIDIA® Ethernet Adapter:
   - Go to: Start -> Server Manager -> Tools -> Hyper-V Manager
   - Hyper-V Manager: Actions -> Virtual Switch Manager -> External -> Create Virtual Switch
2. Set the following:
   - Name:
   - External network:
   - Enable single-root I/O virtualization (SR-IOV)
3. Click Apply.
4. Click OK.
5. Add a VMNIC connected to an NVIDIA® vSwitch in the VM hardware settings:
   - Under Actions, go to Settings -> Add New Hardware -> Network Adapter -> OK
   - In the “Virtual Switch” dropdown box, choose Mellanox SR-IOV Virtual Switch
6. Enable SR-IOV for the Mellanox VMNIC:
   - Open the VM settings Wizard.
   - Open the Network Adapter and choose Hardware Acceleration.
   - Tick the “Enable SR-IOV” option.
   - Click OK.
7. Start and connect to the Virtual Machine: select the newly created Virtual Machine and go to: Actions panel -> Connect. In the virtual machine window, go to: Actions -> Start.
8. Copy the WinOF driver package to the VM using the Mellanox VMNIC IP address.
9. Install the WinOF driver package on the VM.
10. Reboot the VM at the end of the installation.
11. Verify that the Mellanox Virtual Function appears in the device manager.

Note: To achieve the best performance on an SR-IOV VF, run the following PowerShell commands on the host:

For 10GbE:
    PS $ Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4

For 40GbE and 56GbE:
    PS $ Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8
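For hosts that carry several VMs, the link-speed-to-queue-pairs recommendation above can be captured in a small lookup. The Python sketch below only generates the PowerShell command text for the speeds listed in the note; the helper name is hypothetical, and speeds not mentioned in the note are rejected rather than guessed.

```python
# Recommended -IovQueuePairsRequested values, keyed by link speed in GbE,
# exactly as listed in the note above.
RECOMMENDED_QUEUE_PAIRS = {10: 4, 40: 8, 56: 8}


def set_queue_pairs_command(vm_name, link_speed_gbe, adapter_name="Network Adapter"):
    """Return the Set-VMNetworkAdapter command line for a VM and link speed."""
    try:
        pairs = RECOMMENDED_QUEUE_PAIRS[link_speed_gbe]
    except KeyError:
        raise ValueError(f"no recommendation listed for {link_speed_gbe} GbE") from None
    return (
        f'Set-VMNetworkAdapter -Name "{adapter_name}" '
        f"-VMName {vm_name} -IovQueuePairsRequested {pairs}"
    )


print(set_queue_pairs_command("vm1", 40))
```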





WinOF-2 supports two levels of spoof protection:

Hypervisor sets VF's MAC address and only packets with that MAC can be transmitted by the VF

Hypervisor can control allowed Ethertypes that the VF can transmit

If a VF attempts to transmit packets with undesired source MAC or Ethertype, the packets will be dropped by an internal e-Switch.

By default, the anti-spoof filter is enabled with the following Ethertypes:

Internet Protocol version 4 (IPv4) (0x0800)

Internet Protocol Version 6 (IPv6) (0x86DD)

Address Resolution Protocol (ARP) (0x0806)

Through the NIC registry, the hypervisor can configure an Ethertype table for VFs that lists the Ethertype values allowed for transmission. The registry keys are as follows:

Key Name | Key Type | Values | Description
VFAllowedTxEtherTypeListEnable | REG_SZ | 0 = Disabled, 1 = Enabled (default) | Enables/disables the feature
VFAllowedTxEtherType0 | REG_DWORD | Ethertype value | The first Ethertype to allow VF to transmit
VFAllowedTxEtherType1 | REG_DWORD | Ethertype value | The second Ethertype to allow VF to transmit
VFAllowedTxEtherType2 | REG_DWORD | Ethertype value | The third Ethertype to allow VF to transmit
VFAllowedTxEtherType3 | REG_DWORD | Ethertype value | The fourth Ethertype to allow VF to transmit
VFAllowedTxEtherType4 | REG_DWORD | Ethertype value | The fifth Ethertype to allow VF to transmit
VFAllowedTxEtherType5 | REG_DWORD | Ethertype value | The sixth Ethertype to allow VF to transmit
VFAllowedTxEtherType6 | REG_DWORD | Ethertype value | The seventh Ethertype to allow VF to transmit
VFAllowedTxEtherType7 | REG_DWORD | Ethertype value | The eighth Ethertype to allow VF to transmit

By default, the feature is enabled and uses the default Ethertype table.

The Source MAC protection cannot be disabled. The Ethertype protection can be disabled by setting the VFAllowedTxEtherTypeListEnable key to 0.

When the feature is disabled, the only Ethertype that the VF is blocked from transmitting is the Ethernet flow control protocol (0x8808).

Configuring at least one Ethertype in the registry will override the default table of the Ethertypes mentioned above.
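The rules above (default table, override by configured entries, and the disabled state) can be summarized as a small decision function. The Python sketch below is a model of the documented behavior for reasoning purposes only, not driver code; the function and parameter names are hypothetical, and it covers the Ethertype check alone, not the source MAC check.

```python
DEFAULT_ETHERTYPES = frozenset({0x0800, 0x86DD, 0x0806})  # IPv4, IPv6, ARP
FLOW_CONTROL_ETHERTYPE = 0x8808


def ethertype_allowed(ethertype, list_enabled=True, configured=()):
    """Model whether a VF may transmit a frame with the given Ethertype.

    list_enabled mirrors VFAllowedTxEtherTypeListEnable; configured holds the
    values of any VFAllowedTxEtherType0..7 keys that were set.
    """
    if not list_enabled:
        # Filter disabled: only Ethernet flow control stays blocked.
        return ethertype != FLOW_CONTROL_ETHERTYPE
    # Any configured entry replaces the default table entirely.
    table = frozenset(configured) if configured else DEFAULT_ETHERTYPES
    return ethertype in table


print(ethertype_allowed(0x0800))                       # default table
print(ethertype_allowed(0x0800, configured=[0x88CC]))  # default table overridden
```

Note the consequence of the override rule: once a single custom Ethertype is configured, IPv4, IPv6, and ARP must be listed explicitly if they should remain allowed.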

This feature forces every received or sent DHCP packet to be redirected to the PF, including DHCP packets sent or received for VFs. A packet is identified as DHCP by checking for UDP ports 67 and 68.
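The port-based detection can be illustrated with a short classifier. The Python sketch below assumes that a packet counts as DHCP when it is UDP and either its source or its destination port is 67 or 68; the text does not specify the exact matching rule used by the hardware, so treat this as an approximation.

```python
UDP_PROTOCOL_NUMBER = 17          # IP protocol number for UDP
DHCP_UDP_PORTS = frozenset({67, 68})


def looks_like_dhcp(ip_protocol, src_port, dst_port):
    """Return True for UDP packets that use port 67 or 68 on either side.

    Assumption: matching either port is sufficient; the real rule may differ.
    """
    if ip_protocol != UDP_PROTOCOL_NUMBER:
        return False
    return src_port in DHCP_UDP_PORTS or dst_port in DHCP_UDP_PORTS


print(looks_like_dhcp(17, 68, 67))  # typical client-to-server DHCP ports
```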

Note: When using devices older than ConnectX-5 (i.e. ConnectX-4 and ConnectX-4 Lx) and when this capability is set to ‘on’, the VF’s version must be higher than WinOF-2 v2.50.

To enable this new capability, the steps below are required:

1. Set the PF to work in promiscuous mode so that the PF can receive DHCP packets from various Ethernet addresses.
2. Add a new registry key named “RedirectVfDHCPToPF” to the NIC and set it to ‘1’.

Key Name | Key Type | Values | Description
RedirectVfDHCPToPF | REG_SZ | 0 = Disabled (default), 1 = Enabled | Enables/disables the feature. Note: After changing the registry key's value, a driver restart is required.

Note: The feature is supported only on the Hyper-V host, not in VMs.

The VF CPU Monitor capability allows the user to check two characteristics of VFs, namely the VF ‘FwCpuUsage’ and ‘Errors2FW’ counters. If the values of these counters are too high, warnings are written to the Event Log. The warnings come from the Host driver, which automatically reads the ‘FwCpuUsage’ and ‘Errors2FW’ counters once every VfCpuMonBatchPeriodSec seconds, compares the results with the previous reading, and issues a warning if the difference in values is greater than the VfCpuMonFwCpuUsageMax or VfCpuMonErrors2FwMax threshold, respectively.

The Event message format is as follows:

VF <vf_id> used too many resources over the last %4 seconds: FwCpuUsage %5 %, Errors2Fw %6.

Note that <vf_id> can theoretically be incorrect if the reported VF was detached and a new VF was assigned its number.

For further information on detach/attach events see Microsoft Event Log file %SystemRoot%\System32\Winevt\Logs\Microsoft-Windows-Hyper-V-Worker-Admin.evtx.
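The monitoring loop described above boils down to comparing two consecutive counter readings against the two thresholds. The Python sketch below models that comparison and formats a message in the style of the Event Log entry. It assumes that a warning is raised when either threshold is exceeded and that the message reports the differences between readings; both points are assumptions, and all function and parameter names are hypothetical.

```python
def vf_warnings(previous, current, period_sec, cpu_usage_max, errors_max):
    """Compare two readings and return warning strings for noisy VFs.

    previous and current map a VF id to a (FwCpuUsage, Errors2FW) tuple.
    """
    warnings = []
    for vf_id, (cpu_now, err_now) in sorted(current.items()):
        if vf_id not in previous:
            continue  # no baseline yet, e.g. a VF attached since the last read
        cpu_prev, err_prev = previous[vf_id]
        cpu_delta = cpu_now - cpu_prev
        err_delta = err_now - err_prev
        # Assumption: exceeding either threshold triggers the warning.
        if cpu_delta > cpu_usage_max or err_delta > errors_max:
            warnings.append(
                f"VF {vf_id} used too many resources over the last "
                f"{period_sec} seconds: FwCpuUsage {cpu_delta} %, "
                f"Errors2Fw {err_delta}."
            )
    return warnings


# Illustrative readings; the thresholds stand in for VfCpuMonFwCpuUsageMax
# and VfCpuMonErrors2FwMax, the period for VfCpuMonBatchPeriodSec.
before = {0: (10, 0), 1: (5, 2)}
after = {0: (12, 1), 1: (60, 2)}
for message in vf_warnings(before, after, period_sec=10, cpu_usage_max=50, errors_max=5):
    print(message)
```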

Batch Request: The driver reads the VF ‘FwCpuUsage’ and ‘Errors2FW’ counters using a new FW “batch request”, which allows reading one counter from all VFs in a single command. The result of this command is a resource dump. The user can perform the batch request using mlx5cmd.exe (see the Resource Dump section). The following examples show how to read the counters:

To read the VF ‘FwCpuUsage’ counter from VF0 to VF31:
    mlx5cmd.exe -dbg -ResourceDump -Dump -Segment 0x5000 -Index1 1 -NumOfObj1 32 -Index2 1 -Depth 1

To read the VF ‘Errors2FW’ counter from VF1 to VF8:
    mlx5cmd.exe -dbg -ResourceDump -Dump -Segment 0x5000 -Index1 2 -NumOfObj1 8 -Index2 2 -Depth 1



The tool writes the result to a file and prints the name of the folder that contains it.

Note: The ‘-NumOfObj1’ special values 0xffff and 0xfffe are not supported for the segment 0x5000.

Feature state: The user can check the state of the feature by running the ‘mlx5cmd -Features’ command. If the feature is not supported by the firmware, or was disabled by default or by the user, the tool prints:
    State: Disabled
Otherwise, the tool prints:
    State: Enabled

To disable/enable the feature, change the value of the VfCpuMonEnable parameter. To print the configuration parameters, run:
    mlx5cmd -RegKeys -DynamicKeys | grep VfCpu



For further information, see “VF Monitoring Registry Keys”.