General Issues

NVIDIA MLNX_EN Documentation v23.10

Issue

Cause

Solution

The system panics when it is booted with a failed adapter installed.

Malfunctioning hardware component

  1. Remove the failed adapter.

  2. Reboot the system.

NVIDIA adapter is not identified as a PCI device.

Malfunctioning PCI slot or adapter PCI connector

  1. Run lspci to check whether the adapter is listed.

  2. Reseat the adapter in its PCI slot, or insert the adapter into a different PCI slot.

    If the PCI slot is confirmed to be functional, the adapter should be replaced.

NVIDIA adapters are not installed in the system.

The installed NVIDIA adapter is misidentified

Run one of the commands below, then check the MAC address to identify the installed NVIDIA adapter.

'lspci | grep Mellanox' or 'lspci -d 15b3:'

Note: NVIDIA MAC addresses start with 00:02:C9:xx:xx:xx, 00:25:8B:xx:xx:xx, or F4:52:14:xx:xx:xx.
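The MAC prefix check from the note can also be scripted. The following is a minimal sketch, not part of the MLNX_EN tooling: the function names is_nvidia_mac and list_nvidia_interfaces are hypothetical, the prefix list contains only the three prefixes given in the note (it may not be exhaustive), and a Linux sysfs layout under /sys/class/net is assumed.

```shell
#!/bin/sh
# Return 0 if the given MAC address starts with one of the three prefixes
# listed in the note (comparison is case-insensitive), 1 otherwise.
is_nvidia_mac() {
    mac=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$mac" in
        00:02:C9:*|00:25:8B:*|F4:52:14:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "<interface> <mac>" for every network interface whose MAC matches.
# Assumption: interface addresses are exposed at /sys/class/net/<iface>/address.
list_nvidia_interfaces() {
    for addr_file in /sys/class/net/*/address; do
        [ -r "$addr_file" ] || continue
        addr=$(cat "$addr_file")
        if is_nvidia_mac "$addr"; then
            printf '%s %s\n' "$(basename "$(dirname "$addr_file")")" "$addr"
        fi
    done
}
```

After sourcing the block, running 'lspci -d 15b3:' followed by list_nvidia_interfaces shows both the PCI view and the matching network interfaces. An empty result from the second command only means that no interface uses one of the three listed prefixes.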

Insufficient memory to be used by udev upon OS boot.

udev is designed to fork() a new process for each event it receives so that it can handle many events in parallel, and each udev instance consumes some RAM.

Limit the number of udev instances running simultaneously during boot by adding udev.children-max=<number> to the kernel command line in GRUB.
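One way to add the parameter is to edit the GRUB defaults file and then regenerate the GRUB configuration. The sketch below is illustrative only: the helper name add_udev_children_max is hypothetical, the file is assumed to contain a double-quoted GRUB_CMDLINE_LINUX="..." line (commonly in /etc/default/grub), and the value passed for <number> has to be chosen for the system in question.

```shell
#!/bin/sh
# Append udev.children-max=<max> to GRUB_CMDLINE_LINUX in the given file,
# unless a udev.children-max setting is already present.
# Assumption: the value is enclosed in double quotes on a single line.
add_udev_children_max() {
    grub_file=$1    # e.g. /etc/default/grub (location varies by distribution)
    max=$2          # maximum number of parallel udev worker processes
    if grep -q 'udev\.children-max=' "$grub_file"; then
        echo "udev.children-max already set in $grub_file" >&2
        return 0
    fi
    # Insert the parameter just before the closing quote; keep a .bak copy.
    sed -i.bak "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 udev.children-max=${max}\"/" "$grub_file"
}
```

For example, 'add_udev_children_max /etc/default/grub 8' run as root, followed by the distribution's command for regenerating the GRUB configuration (such as update-grub, or grub2-mkconfig with the appropriate output path). After the next boot, 'cat /proc/cmdline' should show the parameter.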

An operating system whose root file system is located on remote storage (over NVIDIA devices) hangs during reboot/shutdown (errors such as “No such file or directory” appear).

The operating system calls the mlnx-en.d service script with the ‘stop’ option, which unloads the driver stack. As a result, the OS root file system disappears before the reboot/shutdown procedure completes, leaving the OS in a hung state.

Disable the openibd ‘stop’ option by setting 'ALLOW_STOP=no' in the /etc/mlnx-en.conf configuration file.
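The setting can be applied by hand in an editor or with a small script. The sketch below assumes the configuration file uses plain KEY=value lines; the helper name set_allow_stop_no is hypothetical and is not shipped with the driver.

```shell
#!/bin/sh
# Set ALLOW_STOP=no in the given configuration file.
# An existing ALLOW_STOP= line is rewritten in place (a .bak copy is kept);
# if there is none, the setting is appended to the end of the file.
set_allow_stop_no() {
    conf=$1    # /etc/mlnx-en.conf on an installed system
    if grep -q '^[[:space:]]*ALLOW_STOP=' "$conf"; then
        sed -i.bak 's/^[[:space:]]*ALLOW_STOP=.*/ALLOW_STOP=no/' "$conf"
    else
        printf 'ALLOW_STOP=no\n' >> "$conf"
    fi
}
```

Running 'set_allow_stop_no /etc/mlnx-en.conf' as root applies the change, and 'grep ALLOW_STOP /etc/mlnx-en.conf' confirms it.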

Last updated on Dec 27, 2023.