Ethernet Related Troubleshooting

Issue

Cause

Solution

Low performance caused by insufficient number of MSI-X vectors.

The number of MSI-X vectors required by the driver equals the number of CPU cores plus 3. If a PF has the default of 64 MSI-X vectors but the system has more than 64 CPU cores, the driver generates an event log entry.

Use the mlxconfig tool to increase the MSI-X vector allocation (NUM_PF_MSIX) for the PF. With too few MSI-X vectors, resources have to be shared.

Note: mlxconfig is contained in the MFT package.
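As a rough illustration of the sizing rule in this row, the following sketch computes how many MSI-X vectors the driver would need for a given core count and how far a PF's allocation falls short. The function names are hypothetical, and the default of 64 is simply the example value from the text; none of this is part of mlxconfig or the driver.

```python
# Sketch only: helper names are hypothetical, not part of any NVIDIA tool.
EXTRA_VECTORS = 3  # the "plus 3" from the rule stated in this row


def required_msix_vectors(cpu_cores: int) -> int:
    """Return the number of MSI-X vectors needed for cpu_cores CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores + EXTRA_VECTORS


def msix_shortfall(cpu_cores: int, allocated: int = 64) -> int:
    """Return how many vectors are missing (0 if the allocation is enough)."""
    return max(0, required_msix_vectors(cpu_cores) - allocated)
```

For example, under these assumptions a 96-core host with 64 vectors per PF would be short by 35 vectors, which suggests a NUM_PF_MSIX value of at least 99.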

Low performance

The system configuration might be non-optimal.

See the “Performance Tuning” section to take full advantage of NVIDIA® 10/40/56 Gb/s NIC performance.

The driver fails to start.

There might have been an RSS configuration mismatch between the TCP stack and the NVIDIA® adapter.

  1. Open the event log and look under "System" for the "mlx5" source.

  2. If found, enable RSS by running: "netsh int tcp set global rss=enabled". Alternatively (less recommended, as it will cause low performance):

    Disable RSS on the adapter by running: "netsh int tcp set global rss = no dynamic balancing".
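Before changing the setting, it can help to check the current RSS state. The sketch below parses text in the format typically printed by "netsh int tcp show global". The label "Receive-Side Scaling State" and the sample output are assumptions and may differ between Windows versions or on localized systems; the function name is hypothetical.

```python
from typing import Optional


def parse_rss_state(netsh_output: str) -> Optional[str]:
    """Return the RSS state (e.g. "enabled") found in netsh output, or None."""
    for line in netsh_output.splitlines():
        label, sep, value = line.partition(":")
        # Only consider "label : value" lines with the (assumed) RSS label.
        if sep and label.strip().lower() == "receive-side scaling state":
            return value.strip().lower()
    return None


# Assumed sample of "netsh int tcp show global" output, shortened.
SAMPLE = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : disabled
Receive Segment Coalescing State    : enabled
"""
```

If the function returns anything other than "enabled" for the real command output, the enable command in step 2 is the recommended fix.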

The driver fails to start and a yellow warning sign appears near the "Mellanox ConnectX-4/ConnectX-5 Adapter <X>" in the Device Manager display. (Code 10)

Look into the Event Viewer to view the error.

  • If the failure occurred due to an unsupported mode type, refer to the Port Management section for the solution.

  • If the solution isn't mentioned in the Event Viewer, disable and re-enable "Mellanox ConnectX-4/ConnectX-5 Adapter <X>" from the Device Manager display. If the failure recurs, contact Support.

No connectivity to a Fault Tolerance team while using network capture tools (e.g., Wireshark).

The network capture tool might have captured the network traffic of the non-active adapter in the team. This is not allowed since the tool sets the packet filter to "promiscuous", thus causing traffic to be transferred on multiple interfaces.

Close the network capture tool on the physical adapter card, and set it on the team interface instead.

No Ethernet connectivity on 10Gb adapters after activating Performance Tuning (part of the installation).

A TcpWindowSize registry value might have been added.

  • Remove the value key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize

    or

  • Set its value to 0xFFFF.
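The two options above could be scripted with the standard Windows reg.exe utility. The sketch below only builds the argument lists (suitable for passing to subprocess.run on Windows, from an elevated prompt) and does not modify the registry itself. The helper names are hypothetical, and storing the value as REG_DWORD is an assumption about how TcpWindowSize was created.

```python
from typing import List

# "HKLM" is the reg.exe abbreviation for HKEY_LOCAL_MACHINE.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "TcpWindowSize"


def remove_value_cmd() -> List[str]:
    """Build the reg.exe command that deletes the TcpWindowSize value."""
    return ["reg", "delete", KEY, "/v", VALUE_NAME, "/f"]


def set_value_cmd(size: int = 0xFFFF) -> List[str]:
    """Build the reg.exe command that sets TcpWindowSize (assumed REG_DWORD)."""
    if not 0 <= size <= 0xFFFFFFFF:
        raise ValueError("size must fit in a 32-bit DWORD")
    return ["reg", "add", KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", str(size), "/f"]
```

Keeping the commands as data makes it easy to review them before running anything against a production host.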

Packets are being lost.

The port MTU might have been set to a value higher than the maximum MTU supported by the switch.

Change the MTU according to the maximum MTU supported by the switch.
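The rule in this row amounts to capping the port MTU at the switch's limit. A minimal sketch, with a hypothetical function name and illustrative values only:

```python
def effective_mtu(port_mtu: int, switch_max_mtu: int) -> int:
    """Return the largest MTU that both the port setting and the switch allow."""
    if port_mtu <= 0 or switch_max_mtu <= 0:
        raise ValueError("MTU values must be positive")
    return min(port_mtu, switch_max_mtu)
```

For instance, a port configured for 9000 bytes behind a switch limited to 1500 bytes would need to be lowered to 1500.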

NVGRE changes made on a running VM are not propagated to the VM.

The configuration changes might not take effect until the OS is restarted.

Stop the VM and afterwards perform any NVGRE configuration changes on the VM connected to the virtual switch.

© Copyright 2023, NVIDIA. Last updated on Nov 1, 2023.