ESXi 6.7/6.7 U1: NVIDIA ConnectX-4/ConnectX-5 NATIVE ESXi Driver for VMware vSphere User Manual v4.17.15.16

Ethernet Network

ConnectX®-4/ConnectX®-4 Lx/ConnectX®-5 ports can be individually configured to work as InfiniBand or Ethernet ports. The default port type depends on the card type. For a VPI card, the default type is IB. If you wish to change the port type, use the mlxconfig script.

To use a VPI card as an Ethernet only card, run:

/opt/mellanox/bin/mlxconfig -d /dev/mt4115_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

The protocol types are:

  • Port Type 1 = IB

  • Port Type 2 = Ethernet

For further information on how to set the port type in ConnectX®-4/ConnectX®-4 Lx/ ConnectX®-5, please refer to the MFT User Manual (www.mellanox.com → Products → Software → InfiniBand/VPI Software → MFT - Firmware Tools).
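
For example, to check the current port type configuration, you can query the firmware configuration with mlxconfig. The device name below reuses the example device from the command above; substitute the device reported on your system. The LINK_TYPE_P1 and LINK_TYPE_P2 fields in the output show the current setting.

/opt/mellanox/bin/mlxconfig -d /dev/mt4115_pciconf0 query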

Wake-on-LAN (WoL)

Warning

Wake-on-LAN (WoL) is applicable only to adapter cards that support this feature.

Wake-on-LAN (WoL) is a technology that allows a network professional to remotely power on a computer or to wake it up from sleep mode.

  • To enable WoL:

    esxcli network nic set -n <nic name> -w g

    or

    vsish -e set /net/pNics/<nic name>/wol g

  • To disable WoL:

    vsish -e set /net/pNics/<nic name>/wol d

  • To verify configuration:

    esxcli network nic get -n vmnic5
       Advertised Auto Negotiation: true
       Advertised Link Modes: 10000baseT/Full, 40000baseT/Full, 100000baseT/Full, 100baseT/Full, 1000baseT/Full, 25000baseT/Full, 50000baseT/Full
       Auto Negotiation: false
       Cable Type: DA
       Current Message Level: -1
       Driver Info:
             Bus Info: 0000:82:00:1
             Driver: nmlx5_core
             Firmware Version: 12.20.1010
             Version: 4.15.10.3
       Link Detected: true
       Link Status: Up
       Name: vmnic5
       PHYAddress: 0
       Pause Autonegotiate: false
       Pause RX: false
       Pause TX: false
       Supported Ports:
       Supports Auto Negotiation: true
       Supports Pause: false
       Supports Wakeon: false
       Transceiver:
       Wakeon: MagicPacket(tm)

Set Link Speed

The driver is set to auto-negotiate by default. However, the link speed can be forced to a specific link speed supported by ESXi using the following command:

esxcli network nic set -n <vmnic> -S <speed> -D <full, half>

Example:

esxcli network nic set -n vmnic4 -S 10000 -D full

Where:

  • <vmnic> is the vmnic for the Mellanox card, as provided by ESXi

  • <speed> is the link speed to force, in Mb/s (e.g. 10000)

  • <full, half> is the duplex to set this NIC to. Acceptable values are: [full, half]

The driver can be reset to auto-negotiate using the following command:

esxcli network nic set -n <vmnic> -a

Example:

esxcli network nic set -n vmnic4 -a

where <vmnic> is the vmnic for the Mellanox card as provided by ESXi.
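
To review the speed and duplex currently in effect on each uplink, you can, for example, list the NICs (the Speed and Duplex columns show the current values):

esxcli network nic list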

Priority Flow Control (PFC)

Priority Flow Control (PFC), defined in IEEE 802.1Qbb, applies pause functionality to specific classes of traffic on the Ethernet link. PFC can provide different levels of service to specific classes of Ethernet traffic (using IEEE 802.1p traffic classes).

Warning

When PFC is enabled, Global Pause will be operationally disabled, regardless of what is configured for the Global Pause Flow Control.

To configure PFC, do the following:

  1. Enable PFC for specific priorities.

    esxcfg-module nmlx5_core -s "pfctx=0x08 pfcrx=0x08"

    The parameters, “pfctx” (PFC TX) and “pfcrx” (PFC RX), are specified per host. If you have more than a single card on the server, all ports will be enabled with PFC (Global Pause will be disabled even if configured).

    The value is a bitmap of 8 bits = 8 priorities. We recommend that you enable only lossless applications on a specific priority.

    To run more than one flow type on the server, enable only one priority (e.g. priority 3) by setting the parameters to "0x08" = 00001000b (binary). Counting from priority 0, priority 3 corresponds to the 4th bit, which is the only bit set.

    Warning

    The values of “pfctx” and “pfcrx” must be identical.

  2. Restart the driver.

    reboot
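
To verify that the PFC parameters took effect after the reboot, you can, for example, display the driver's load-time options and module parameters (the exact output format may vary by ESXi version):

esxcfg-module -g nmlx5_core
esxcli system module parameters list -m nmlx5_core | grep pfc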

Receive Side Scaling (RSS)

Receive Side Scaling (RSS) technology allows spreading incoming traffic between different receive descriptor queues. Assigning each queue to different CPU cores allows better load balancing of the incoming traffic and improves performance.

Default Queue Receive Side Scaling (DRSS)

Default Queue RSS (DRSS) allows the user to configure multiple hardware queues backing up the default RX queue. DRSS improves performance for large-scale multicast traffic between the hypervisor and Virtual Machine interfaces.

To configure DRSS, use the 'DRSS' module parameter, which replaces the previously advertised 'device_rss' module parameter ('device_rss' is now obsolete). The 'DRSS' and 'device_rss' module parameters are mutually exclusive.

If the 'device_rss' module parameter is enabled, the following functionality will be configured:

  • The new Default Queue RSS mode will be triggered and all hardware RX rings will be utilized, similar to the previous 'device_rss' functionality

  • Module parameters 'DRSS' and 'RSS' will be ignored, thus neither the NetQ RSS nor the standard NetQ will be active

To query the 'DRSS' module parameter default, its minimum and maximum values, and restrictions, run a standard esxcli command.

For example:

#esxcli system module parameters list -m nmlx5_core
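
For example, to enable Default Queue RSS with four hardware queues (an illustrative value; check the allowed range with the list command above, and reboot the host for the change to take effect). Note that -p sets the module's complete parameter string, so include any other parameters you need in the same command:

#esxcli system module parameters set -m nmlx5_core -p "DRSS=4"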


NetQ RSS

NetQ RSS is a new module parameter for ConnectX-4 adapter cards providing functionality identical to the ConnectX-3 module parameter 'num_rings_per_rss_queue'. The new module parameter allows the user to configure multiple hardware queues backing up the single RX queue. NetQ RSS improves vMotion performance and the bandwidth of multiple streams of IPv4/IPv6 TCP/UDP/IPSEC over a single interface between Virtual Machines.

To configure NetQ RSS, use the 'RSS' module parameter. To query the 'RSS' module parameter default, its minimum and maximum values, and restrictions, run a standard esxcli command.

For example:

#esxcli system module parameters list -m nmlx5_core
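
For example, to enable NetQ RSS with four hardware queues backing the RSS queue (an illustrative value; check the allowed range with the list command above, and reboot the host for the change to take effect):

#esxcli system module parameters set -m nmlx5_core -p "RSS=4"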

Warning

Using NetQ RSS is preferred over the Default Queue RSS. Therefore, if both module parameters are set but the system lacks resources to support both, NetQ RSS will be used instead of DRSS.

Important Notes

If the 'DRSS' and 'RSS' module parameters set by the user cannot be enforced by the system due to lack of resources, the following actions are taken in a sequential order:

  1. The system will attempt to provide the module parameters default values instead of the ones set by the user

  2. The system will attempt to provide 'RSS' (NetQ RSS mode) default value. The Default Queue RSS will be disabled

  3. The system will load with only standard NetQ queues

  4. 'DRSS' and 'RSS' parameters are disabled by default, and the system loads with standard NetQ mode

Dynamic RSS

Dynamic RSS allows indirection table changes during traffic for the NetQ RSS queue. To utilize Dynamic RSS, the "RSS" module parameter must be set to activate the NetQ RSS queue, and "DYN_RSS" must be enabled.
Dynamic RSS provides performance benefits for certain RX scenarios with multi-stream heavy traffic (such as vMotion) that in regular RSS mode would be directed to the same HW RX ring.
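
For example, to activate NetQ RSS together with Dynamic RSS (illustrative values; this assumes "DYN_RSS=1" enables the parameter, and a reboot is required for the change to take effect):

#esxcli system module parameters set -m nmlx5_core -p "RSS=4 DYN_RSS=1"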

Multiple RSS Engines

The Multiple RSS Engines feature improves network performance by exposing multiple RSS RX queues to the hypervisor network stack. This capability enables the user to configure up to 3 RSS queues (now referred to as "Engines"), including the default RX queue RSS, with indirection table updates supported for all RSS Engines.
The Multiple RSS Engines feature is activated using the "GEN_RSS" module parameter. The indirection table updates functionality is active by default when the feature is enabled, so there is no need to specify the "DYN_RSS" module parameter.

  • The GEN_RSS module parameter is set to "2" by default, indicating 2 RSS engines

  • The DRSS module parameter is set to “4” by default, indicating the default queue RSS engine with 4 hardware queues

  • The RSS module parameter is set to “4” by default, indicating the NetQ RSS engine with a total of 4 hardware queues

    For the full module parameter description, run the command below on the ESXi host:

    #esxcli system module parameters list -m nmlx5_core

Examples of how to set different RSS engines:

  • To set the default queue RSS engine:

    #esxcli system module parameters set -m nmlx5_core -p "DRSS=4 GEN_RSS=1"

  • To set a single NetQ RSS engine:

    #esxcli system module parameters set -m nmlx5_core -p "RSS=4 GEN_RSS=1"

  • To set two NetQ RSS engines:

    #esxcli system module parameters set -m nmlx5_core -p "RSS=8 GEN_RSS=2"

  • To set a default queue with NetQ RSS engines:

    #esxcli system module parameters set -m nmlx5_core -p "DRSS=4 RSS=8 GEN_RSS=3"

  • To set the device RSS engine:

    #esxcli system module parameters set -m nmlx5_core -p "DRSS=16 GEN_RSS=1"

Important Notes

  • Multiple RSS Engines and Dynamic RSS are mutually exclusive. In ESXi 6.7, Generic RSS mode is recommended

  • Multiple RSS Engines requires the "DRSS" and/or "RSS" parameters to be set in order to define the number of hardware queues for the default queue RSS and NetQ RSS engines.

  • The Device RSS mode ("DRSS=16") is also an RSS Engine, but only one RX queue is available and the traffic distribution is performed across all hardware queues.

  • The total number of hardware queues for the RSS engines (module parameter "RSS", when "GEN_RSS" is specified) must dedicate 4 hardware queues per engine.

    Warning

    It is recommended to use RoCE with PFC enabled in the driver and in the network switches.
    For how to enable PFC in the driver, see the "Priority Flow Control (PFC)" section above.

Explicit Congestion Notification (ECN)

Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that may be used between two ECN-enabled endpoints when the underlying network infrastructure also supports it.

ECN is enabled by default (ecn=1). To disable it, set the "ecn" module parameter to 0. For most use cases, the default ECN settings are sufficient. However, if further changes are required, use the nmlxcli management tool to tune the ECN algorithm behavior. For further information on the tool, see the "Mellanox NIC ESXi Management Tools" section. The nmlxcli management tool can also be used to display various ECN statistics.
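
For example, to disable ECN and then confirm the parameter value (this assumes the "ecn" parameter is exposed by the nmlx5_core module like the other parameters in this manual, and a reboot is required for the change to take effect):

#esxcli system module parameters set -m nmlx5_core -p "ecn=0"
#esxcli system module parameters list -m nmlx5_core | grep ecn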

VXLAN/Geneve Hardware Offload

VXLAN/Geneve hardware offload enables the traditional offloads to be performed on the encapsulated traffic. With ConnectX® family adapter cards, data center operators can decouple the overlay network layer from the physical NIC performance, thus achieving native performance in the new network architecture.

Configuring Overlay Networking Stateless Hardware Offload

VXLAN/Geneve hardware offload includes:

  • TX: Calculates the Inner L3/L4 and the Outer L3 checksum

  • RX:

    • Checks the Inner L3/L4 and the Outer L3 checksum

    • Maps the VXLAN traffic to an RX queue according to:

      • Inner destination MAC address

      • Outer destination MAC address

      • VXLAN ID

VXLAN/Geneve hardware offload is enabled by default and its status cannot be changed.

VXLAN/Geneve configuration is done in the ESXi environment via VMware NSX manager. For additional NSX information, please refer to VMware documentation: http://pubs.vmware.com/NSX-62/index.jsp#com.vmware.nsx.install.doc/GUID-D8578F6E-A40C-493A-9B43-877C2B75ED52.html.

© Copyright 2023, NVIDIA. Last updated on Sep 8, 2023.