InfiniBand Related Issues

Issue

Cause

Solution

The following message is logged after loading the driver:

multicast join failed with status -22

The node is trying to join a multicast group that does not exist, or the number of multicast groups supported by the SM has been exceeded.

If this message is logged often, check the multicast group's join requirements, as the node might not meet them. Note: If this message is logged after driver load, it may safely be ignored.
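To judge whether the message is logged "often", it can help to count its occurrences in the kernel log. The following is a minimal sketch, not part of the original procedure; the function name `count_mcast_join_failures` is hypothetical, and the match string assumes the kernel log line contains the text "multicast join failed".

```shell
#!/bin/sh
# Hypothetical helper: count "multicast join failed" lines in kernel log
# text supplied on stdin. Prints 0 when there are no matches.
count_mcast_join_failures() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function's exit status at 0 in that case.
    grep -c 'multicast join failed' || true
}

# Example use (not run here):
#   dmesg | count_mcast_join_failures
```

If the count keeps growing after driver load, and the infiniband-diags tools are installed, `saquery -g` should list the multicast groups currently known to the SM for comparison.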

Unable to stop the driver. The following message appears on screen: ERROR: Module <module> is in use

An external application is using the reported module.

Manually unload the module using the 'modprobe -r' command.
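Before unloading, it may be useful to see what is holding the module. The sketch below parses `lsmod` output (columns: module, size, use count, dependent modules); the function name `module_users` is hypothetical and the sample module names are only illustrations.

```shell
#!/bin/sh
# Hypothetical helper: given lsmod-format text on stdin and a module name
# as $1, print the module's use count and the modules that depend on it
# ("-" when none are listed).
module_users() {
    awk -v m="$1" '$1 == m { printf "%s %s\n", $3, ($4 == "" ? "-" : $4) }'
}

# Example use (not run here):
#   lsmod | module_users <module>
#   modprobe -r <module>
```

A non-zero use count with no dependent modules listed usually suggests that a user-space application holds the module, which matches the cause described above; stopping that application first should let 'modprobe -r' succeed.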

Logical link fails to come up while the port logical state is Initializing.

The logical port remains in the Initializing state while it waits for the SM to assign a LID.

  1. Verify an SM is running in the fabric. Run 'sminfo' from any host connected to the fabric.

  2. If an SM is not running, activate the SM on a node or on a managed switch.
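The two steps above can be sketched as a small script. It assumes the `ibstat` and `sminfo` utilities from infiniband-diags are installed; the function names `initializing_ports` and `sm_reachable` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read ibstat output on stdin and print every port
# whose logical state is Initializing, as "<CA name> port <number>".
initializing_ports() {
    awk -v q="'" '
        /^CA /                   { ca = $2; gsub(q, "", ca) }    # CA name without quotes
        /^[ \t]*Port [0-9]+:/    { port = $2; sub(/:$/, "", port) }
        /^[ \t]*State:/ && $2 == "Initializing" { print ca " port " port }
    '
}

# Hypothetical helper: succeeds only if sminfo can query an SM in the fabric.
sm_reachable() {
    sminfo >/dev/null 2>&1
}

# Example use (not run here):
#   ibstat | initializing_ports
#   sm_reachable || echo "No SM found in the fabric"
```

If ports are listed and no SM responds, start an SM. On hosts where OpenSM is installed with the driver package this is typically done with `/etc/init.d/opensmd start`, but that path is an assumption and may differ per installation.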

InfiniBand utility commands fail to find devices on the system. For example, the 'ibv_devinfo' command fails with the following output:

Failed to get IB devices list: Function not implemented

The InfiniBand utility commands were invoked while the driver was not loaded.

Load the driver:

/etc/init.d/openibd start
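A quick check before running the utilities can confirm whether the driver is loaded. This sketch assumes that the presence of the `ib_uverbs` kernel module in `lsmod` output is a reasonable indicator that the user-space verbs interface is available; the function name `ib_driver_loaded` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod output on stdin and succeed (exit 0)
# only if the ib_uverbs module appears in it.
ib_driver_loaded() {
    awk '$1 == "ib_uverbs" { found = 1 } END { exit !found }'
}

# Example use (not run here): load the driver only when it is missing,
# then retry the utility.
#   lsmod | ib_driver_loaded || /etc/init.d/openibd start
#   ibv_devinfo
```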

© Copyright 2023, NVIDIA. Last updated on Feb 18, 2024.