Issue | Cause | Solution |
---|---|---|
Low performance | The OS profile might not be configured for maximum performance. | Go to "Power Options" in the Control Panel and make sure a high/maximum-performance power scheme is active (see "General Diagnostic" below). |
Low SMBDirect performance | NetworkDirect is enabled by default in the NIC registry, but ECN and/or PFC is not enabled on the switch. | Either enable ECN/PFC on the switch or set the NetworkDirect registry key to zero. |
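For the first row, a quick way to verify and fix the power scheme from an elevated prompt is sketched below. This is a minimal PowerShell/powercfg sketch; the GUID shown is the built-in "High performance" scheme, and OEM images may ship additional vendor-specific schemes.

```powershell
# List all power schemes; the active one is marked with an asterisk (*).
powercfg /list

# Switch to the built-in "High performance" scheme
# (8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c is its well-known GUID).
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
```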
General Diagnostic
Go to “Device Manager”, locate the Mellanox adapter you are debugging, right-click it, choose “Properties”, and open the “Information” tab:
- PCI Gen 1: should appear as "PCI-E 2.5 GT/s"
- PCI Gen 2: should appear as "PCI-E 5.0 GT/s"
- PCI Gen 3: should appear as "PCI-E 8.0 GT/s"
- Link Speed: 100 Gbps / 56.0 Gbps / 40.0 Gbps / 10.0 Gbps
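The same link-speed information can also be read without opening Device Manager. A minimal PowerShell sketch using the in-box NetAdapter cmdlets ("<adapter name>" is a placeholder for your adapter's friendly name):

```powershell
# Link speed per adapter - should match one of the values listed above.
Get-NetAdapter | Format-Table Name, InterfaceDescription, LinkSpeed

# PCIe details (bus/slot placement) for the adapter under test.
Get-NetAdapterHardwareInfo -Name "<adapter name>" | Format-List
```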
To determine whether the Mellanox NIC and PCI bus can reach their maximum speed, it is best to run nd_send_bw in loopback on the same machine (a consolidated sketch follows this list):
- Run "start /b /affinity 0x1 nd_send_bw -S <IP_host>", where <IP_host> is the local IP.
- Run "start /b /affinity 0x2 nd_send_bw -C <IP_host>".
- Repeat for port 2 with the appropriate IP.
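A consolidated sketch of the loopback run, assuming 192.168.1.10 stands in for the port's local IP (substitute your own address); it wraps the exact commands above so the server is pinned to core 0 and the client to core 1:

```powershell
# Server side of the loopback, pinned to CPU core 0 (affinity mask 0x1).
cmd /c 'start /b /affinity 0x1 nd_send_bw -S 192.168.1.10'

# Give the server a moment to start listening before connecting.
Start-Sleep -Seconds 2

# Client side, pinned to CPU core 1 (affinity mask 0x2), connecting to the same IP.
cmd /c 'start /b /affinity 0x2 nd_send_bw -C 192.168.1.10'
```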
- On PCI Gen3 the expected result is around 5700 MB/s.
- On PCI Gen2 the expected result is around 3300 MB/s.

Any number lower than that points to a bad configuration or to installation in the wrong PCI slot. Misconfigured QoS settings or Flow Control can also be the cause.
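Host-side QoS and flow-control state can be inspected with the in-box DCB/QoS cmdlets. A minimal sketch follows; Get-NetQosFlowControl requires the Data Center Bridging feature, and switch-side PFC/ECN still has to be checked on the switch itself:

```powershell
# Operational QoS/DCB state of the adapter - shows whether PFC is actually in effect.
Get-NetAdapterQos -Name "<adapter name>"

# Which 802.1p priorities have Priority Flow Control enabled on this host.
Get-NetQosFlowControl

# Driver-exposed flow-control setting (Global Pause vs. PFC), if present.
Get-NetAdapterAdvancedProperty -Name "<adapter name>" | Where-Object DisplayName -like "*Flow Control*"
```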
- To determine the maximum speed between the two machines with the most basic test:
- Run "nd_send_bw -S <IP_host1>" on machine 1, where <IP_host1> is its local IP.
- Run "nd_send_bw -C <IP_host1>" on machine 2.
- Results appear in Gb/s (gigabits, 2^30 bits, per second) and reflect the actual data transferred, excluding headers.
- If these results are not as expected, the problem is most probably with one or more of the following:
- Old firmware version (see the check sketched after this list).
- Misconfigured flow control: Global Pause or PFC is configured incorrectly on the hosts, routers, or switches.
- CPU/power options are not set to "Maximum Performance".
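To rule out the first cause from the host, the firmware and driver versions can be checked as sketched below. This assumes the Mellanox Firmware Tools (MFT) package is installed for the mlxfwmanager query; otherwise the firmware version is also shown in the adapter's "Information" tab in Device Manager.

```powershell
# Driver version and date as reported by the OS.
Get-NetAdapter | Format-Table Name, InterfaceDescription, DriverVersion, DriverDate

# Firmware version query via Mellanox Firmware Tools (MFT), if installed.
mlxfwmanager --query
```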