Low performance issues
The OS profile might not be configured for maximum performance.
Flow Control is disabled when a kernel debugger is configured in Windows Server 2012 and above.
When a kernel debugger is configured (not necessarily physically connected), flow control might be disabled.
Set the registry key as follows: HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters
Packet drops or low performance on a specific traffic class.
The QoS and Flow Control settings might be missing or misconfigured.
Check the configured settings for all of the QoS options: open a PowerShell prompt and run "Get-NetAdapterQos". To achieve maximum performance, all of the following must exist:
Issue 1. Go to “Device Manager”, locate the Mellanox adapter that you are debugging, right-click it, choose “Properties”, and go to the “Information” tab. Check the reported PCI Express generation and link speed:
- PCI Gen 1: should appear as "PCI-E 2.5 GT/s"
- PCI Gen 2: should appear as "PCI-E 5.0 GT/s"
- PCI Gen 3: should appear as "PCI-E 8.0 GT/s"
- Link Speed: 56.0Gbps / 40.0Gbps / 10.0Gbps
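The GT/s figures above can be turned into a rough upper bound on PCI Express throughput, which helps judge whether the slot itself limits the adapter. The following is a minimal sketch, not a vendor tool: the function name and the x8 lane width are assumptions made for illustration, and the encoding efficiencies are the standard 8b/10b (Gen 1 and Gen 2) and 128b/130b (Gen 3) values. Protocol overhead is not modeled, so real results will be lower.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_max_mb_per_s(gen: int, lanes: int = 8) -> float:
    """Theoretical one-direction ceiling in MB/s (10^6 bytes/s).

    `lanes` defaults to 8, which is an assumption; use the width of
    the slot the adapter is actually installed in.
    """
    gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6


if __name__ == "__main__":
    for gen in sorted(PCIE_GENERATIONS):
        print(f"Gen{gen} x8 ceiling: {pcie_max_mb_per_s(gen):.0f} MB/s")
```

An adapter that reports "PCI-E 5.0 GT/s" in a slot expected to run at 8.0 GT/s is capped at roughly half the Gen 3 ceiling, regardless of the link speed.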
Issue 2. To determine whether the Mellanox NIC and PCI bus can achieve their maximum speed, run nd_send_bw in loopback. On the same machine:
- Run "start /b /affinity 0x1 nd_send_bw -S <IP_host>" where <IP_host> is the local IP.
- Run "start /b /affinity 0x2 nd_send_bw -C <IP_host>"
- Repeat for port 2 with the appropriate IP.
- On PCI Gen3 the expected result is around 5700 MB/s.
- On PCI Gen2 the expected result is around 3300 MB/s.
A lower result points to a bad configuration or to installation in the wrong PCI slot. Misconfigured QoS or Flow Control settings can also be the cause.
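The comparison against the expected loopback figures can be scripted when several machines need checking. This is only a sketch: the helper name is hypothetical, and the 5% tolerance is an assumption chosen because the text gives the expected results as approximate ("around") values.

```python
# Approximate loopback results quoted in the text, in MB/s.
EXPECTED_LOOPBACK_MB_S = {2: 3300, 3: 5700}


def loopback_ok(pci_gen: int, measured_mb_s: float, tolerance: float = 0.05) -> bool:
    """Return True if the measured nd_send_bw loopback result is at or
    above the expected figure, allowing `tolerance` (an assumed 5%)
    below it."""
    expected = EXPECTED_LOOPBACK_MB_S[pci_gen]
    return measured_mb_s >= expected * (1 - tolerance)


if __name__ == "__main__":
    # Example values, not measurements.
    print(loopback_ok(3, 5650))
    print(loopback_ok(3, 4000))
```

A False result suggests checking the PCI slot, the QoS settings, and Flow Control, as described above.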
Issue 3. To determine the maximum speed between the two sides with the most basic test:
- Run "nd_send_bw -S <IP_host1>" on machine 1 where <IP_host1> is the local IP.
- Run "nd_send_bw -C <IP_host1>" on machine 2.
- Results appear in Gb/s (Gigabits 2^30), and reflect the actual data that was transferred, excluding headers.
- If these results are not as expected, the problem is most probably with one or more of the following:
- Old firmware version.
- Misconfigured Flow Control: Global pause or PFC is configured incorrectly on the hosts, routers, and switches. See RDMA over Converged Ethernet (RoCE).
- CPU/power options are not set to "Maximum Performance".
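Because the two-machine results are reported in Gb/s with a gigabit taken as 2^30 bits, they do not compare directly with the nominal link speed (for example 40.0Gbps), which is normally quoted in decimal units. The sketch below converts between the two; the function names are hypothetical, and treating a megabyte as 10^6 bytes is an assumption.

```python
BITS_PER_BINARY_GIGABIT = 2 ** 30  # the "Gigabits 2^30" unit used for the results


def binary_gbps_to_decimal_gbps(value: float) -> float:
    """Convert Gb/s based on 2^30 bits to Gb/s based on 10^9 bits."""
    return value * BITS_PER_BINARY_GIGABIT / 1e9


def binary_gbps_to_mb_per_s(value: float) -> float:
    """Convert Gb/s based on 2^30 bits to MB/s, assuming 1 MB = 10^6 bytes."""
    return value * BITS_PER_BINARY_GIGABIT / 8 / 1e6


if __name__ == "__main__":
    reported = 35.0  # example figure, not a measurement
    print(f"{binary_gbps_to_decimal_gbps(reported):.2f} Gb/s (decimal)")
    print(f"{binary_gbps_to_mb_per_s(reported):.0f} MB/s")
```

A reported figure is therefore about 7.4% larger once expressed in decimal gigabits, and it still excludes headers, so it is expected to stay below the nominal link speed.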