Performance Related Troubleshooting

Issue: Low performance.

Cause: The OS profile might not be configured for maximum performance.

Solution (a command-line sketch follows below):

  1. Go to "Power Options" in the "Control Panel" and make sure "Maximum Performance" is set as the power scheme.

  2. Reboot the machine.
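Where scripting the change is preferable to the Control Panel UI, the built-in powercfg utility can switch the active power scheme. A minimal sketch, assuming the stock "High performance" scheme (alias SCHEME_MIN) is the scheme that corresponds to maximum performance on your system:

    # List the available power schemes; the active one is marked with *
    powercfg /list

    # Activate the built-in "High performance" scheme by its alias
    powercfg /setactive SCHEME_MIN

    # Verify the change before rebooting
    powercfg /getactivescheme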

Issue: Flow Control is disabled when a kernel debugger is configured on Windows Server 2012 and above.

Cause: When a kernel debugger is configured (not necessarily physically connected), flow control may be disabled.

Solution: Add the following registry value under HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters (a PowerShell sketch follows below):

  • Key name: AllowFlowControlUnderDebugger

  • Type: REG_DWORD

  • Value: 1
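As an alternative to editing the registry by hand, the same value can be created from an elevated PowerShell prompt. A minimal sketch, assuming the NDIS\Parameters key already exists on the system:

    # Create (or overwrite) the DWORD value that keeps flow control enabled
    # while a kernel debugger is configured
    New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NDIS\Parameters" `
        -Name "AllowFlowControlUnderDebugger" -PropertyType DWord -Value 1 -Force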

Issue: Packet drops or low performance on a specific traffic class.

Cause: QoS and Flow Control settings may be missing or misconfigured.

Solution: Check the configured settings for all of the QoS options. Open a PowerShell prompt and run "Get-NetAdapterQos" (see the sketch after this list). To achieve maximum performance, all of the following must hold:

  • All hosts, switches and routers should use the same, matching flow control settings. If Global Pause is used, all devices must be configured for it. If PFC (Priority Flow Control) is used, all devices must have matching settings for all priorities.

  • ETS settings that limit the speed of some priorities will greatly affect the output results.

  • Make sure Flow Control is enabled on the Mellanox interfaces (it is enabled by default). Go to the Device Manager, right-click the Mellanox interface, go to "Advanced" and make sure Flow Control is enabled for both TX and RX.

  • To eliminate QoS and Flow Control as the performance-degrading factor, set all devices to run with Global Pause and rerun the tests:

    • Set Global Pause on the switches and routers.

    • Run "Disable-NetAdapterQos *" on all of the hosts in a PowerShell window.

Issue 1: Go to "Device Manager", locate the Mellanox adapter that you are debugging, right-click it, choose "Properties" and open the "Information" tab. Verify the PCI generation and link speed (a PowerShell sketch follows below):

  • PCI Gen 1: should appear as "PCI-E 2.5 GT/s"

  • PCI Gen 2: should appear as "PCI-E 5.0 GT/s"

  • PCI Gen 3: should appear as "PCI-E 8.0 GT/s"

  • Link Speed: 56.0 Gbps / 40.0 Gbps / 10.0 Gbps
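The same details can usually be read without the Device Manager UI. A hedged sketch using the inbox NetAdapter PowerShell module (the adapter name is a placeholder, and the exact property names may vary between driver versions):

    # Negotiated Ethernet link speed (e.g. 56/40/10 Gbps)
    Get-NetAdapter -Name "Ethernet 1" | Select-Object Name, LinkSpeed

    # PCIe link speed and width as reported by the OS
    Get-NetAdapterHardwareInfo -Name "Ethernet 1" |
        Select-Object Name, PcieLinkSpeed, PcieLinkWidth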

Issue 2: To determine whether the Mellanox NIC and PCI bus can achieve their maximum speed, it is best to run nd_send_bw in loopback on the same machine (a scripted sketch follows below):

  1. Run "start /b /affinity 0x1 nd_send_bw -S <IP_host>", where <IP_host> is the local IP.

  2. Run "start /b /affinity 0x2 nd_send_bw -C <IP_host>".

  3. Repeat for port 2 with the appropriate IP.

  4. On PCI Gen3 the expected result is around 5700 MB/s; on PCI Gen2 it is around 3300 MB/s. Any number lower than that points to a bad configuration or installation in the wrong PCI slot. Malfunctioning QoS or Flow Control settings can be the cause as well.
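The two loopback commands can also be launched from one PowerShell sketch so that the server side starts first. This assumes nd_send_bw is on the PATH; the IP below is a placeholder for the local IP of the port under test:

    # Loopback bandwidth test on a single machine, pinning server and client
    # to different logical cores (mirrors the /affinity masks above)
    $ip = "1.2.3.4"

    $server = Start-Process nd_send_bw -ArgumentList "-S", $ip -NoNewWindow -PassThru
    $server.ProcessorAffinity = 0x1      # core 0

    Start-Sleep -Seconds 1               # give the server a moment to listen

    $client = Start-Process nd_send_bw -ArgumentList "-C", $ip -NoNewWindow -PassThru
    $client.ProcessorAffinity = 0x2      # core 1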

Issue 3: To determine the maximum speed between the two sides with the most basic test:

  1. Run "nd_send_bw -S <IP_host1>" on machine 1, where <IP_host1> is its local IP.

  2. Run "nd_send_bw -C <IP_host1>" on machine 2.

  3. Results appear in Gb/s (gigabits of 2^30 bits) and reflect the actual data that was transferred, excluding headers (see the conversion sketch after this list).

  4. If these results are not as expected, the problem is most probably one or more of the following:

    • Old firmware version.

    • Misconfigured Flow Control: Global Pause or PFC is configured incorrectly on the hosts, routers or switches. See RDMA over Converged Ethernet (RoCE).

    • CPU/power options are not set to "Maximum Performance".
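Because the tool reports gigabits of 2^30 bits, its output is not directly comparable to the MB/s figures in Issue 2. A small conversion sketch, assuming the Issue 2 figures use 2^20-byte megabytes:

    # Convert an nd_send_bw result from Gb/s (2^30 bits) to MB/s (2^20 bytes)
    $gbps = 0.0                            # paste the value reported by nd_send_bw
    $MBps = $gbps * [math]::Pow(2, 30) / 8 / [math]::Pow(2, 20)   # = $gbps * 128
    "{0} Gb/s is about {1} MB/s" -f $gbps, $MBps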
