Driver installation fails.
The install script may fail for the following reasons:
After driver installation, the openibd service fails to start. The driver logs this message: Unknown symbol
The driver was installed on top of an existing In-box driver.
Fixing Application Binary Interface (ABI) Incompatibility with MLNX_OFED Kernel Modules
This section is relevant for RedHat and SLES distributions only.
The MLNX_OFED package for RedHat comes with RPMs that support KMP (weak-modules): when a new errata kernel is installed, compatibility links are created under the weak-updates directory for the new kernel. These links allow the existing MLNX_OFED kernel modules to be used without recompilation. However, the ABI of the new kernel is sometimes incompatible with the MLNX_OFED modules, which prevents them from loading. In this case, the MLNX_OFED modules must be rebuilt against the new kernel.
Detecting ABI Incompatibility with MLNX_OFED Modules
When MLNX_OFED modules are not compatible with a new kernel (from a new OS release or an errata kernel), no links are created under the weak-updates directory for the new kernel, causing the driver load to fail. To check whether the needed module links exist under the weak-updates directory, reload the MLNX_OFED modules. If one or more modules are missing, the driver reload fails with an error message.
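The presence of a link can also be checked directly on disk, without reloading the driver. The following is a minimal sketch, assuming the compatibility links live under /lib/modules/<kernel release>/weak-updates and are named after the module (for example mlx5_core.ko, possibly with a compression suffix). The function name has_weak_update_link is hypothetical, and mlx5_core is used only as an example module.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a weak-updates link for a module exists.
# Usage: has_weak_update_link <kernel-release> <module> [modules-root]
has_weak_update_link() {
    wul_kver=$1
    wul_mod=$2
    wul_root=${3:-/lib/modules}   # assumed default location of module trees
    wul_dir="$wul_root/$wul_kver/weak-updates"

    # No weak-updates directory at all means no links were created.
    [ -d "$wul_dir" ] || return 1

    # Match <module>.ko as well as compressed forms such as <module>.ko.xz.
    wul_hit=$(find "$wul_dir" -name "${wul_mod}.ko*" 2>/dev/null | head -n 1)
    [ -n "$wul_hit" ]
}

# Example (not run automatically):
#   has_weak_update_link "$(uname -r)" mlx5_core || echo "mlx5_core link missing"
```

A missing link only shows that weak-modules did not consider the module compatible; the driver reload remains the authoritative check.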
Resolving ABI Incompatibility with MLNX_OFED Modules
To fix ABI incompatibility with MLNX_OFED modules, recompile the modules against the new kernel using the mlnx_add_kernel_support.sh script, which is available in the MLNX_OFED installation image.
There are two ways to recompile the MLNX_OFED modules:
Local recompilation and installation on one server.
Run the mlnxofedinstall command to recompile the kernel modules and reinstall the whole MLNX_OFED on the server. Mount the MLNX_OFED ISO image or extract the TGZ file:
The --kmp flag enables rebuilding the RPMs with KMP (weak-updates) support for the new kernel. Therefore, at the next OS/kernel update, the same modules can be used with the new kernel (assuming that ABI compatibility is not broken again).
- The command rebuilds only the kernel RPMs (using mlnx_add_kernel_support.sh), saves the resulting MLNX_OFED package under /tmp, and starts installing it automatically. This package can be used for installation on other servers using the regular mlnxofedinstall command or yum.
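The local recompilation step can be sketched as a small wrapper. This is an illustration only, assuming the ISO is already mounted or the TGZ already extracted, and that the installer is invoked with the --add-kernel-support and --kmp options. The function name rebuild_and_install_local is hypothetical, and a given release may need additional options.

```shell
#!/bin/sh
# Hypothetical wrapper around the local rebuild-and-install step.
# Usage: rebuild_and_install_local <MLNX_OFED dir>
rebuild_and_install_local() {
    ril_dir=$1
    if [ ! -x "$ril_dir/mlnxofedinstall" ]; then
        echo "mlnxofedinstall not found in: $ril_dir" >&2
        return 1
    fi
    # --add-kernel-support rebuilds the kernel RPMs for the running kernel;
    # --kmp builds them with KMP (weak-updates) support.
    ( cd "$ril_dir" && ./mlnxofedinstall --add-kernel-support --kmp )
}

# Example (not run automatically; the path is a placeholder):
#   rebuild_and_install_local /mnt/mlnx_ofed
```

Running the installer from inside its own directory mirrors how the image is normally used after mounting or extraction.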
- Preparing a new image on one server and deploying it on the cluster.
Use the mlnx_add_kernel_support.sh script directly to rebuild only the kernel RPMs (without running any installation) on one server. Mount the MLNX_OFED ISO image or extract the TGZ file:
Note: This command will save the resulting MLNX_OFED package under /tmp.
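The rebuild-only step can be sketched in the same way. This is a minimal illustration, assuming the script accepts -m for the MLNX_OFED directory, --kmp for KMP support, and -y to skip the confirmation prompt. The function name rebuild_kernel_rpms is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: rebuild the kernel RPMs only, without installing.
# Usage: rebuild_kernel_rpms <MLNX_OFED dir>
rebuild_kernel_rpms() {
    rkr_dir=$1
    if [ ! -x "$rkr_dir/mlnx_add_kernel_support.sh" ]; then
        echo "mlnx_add_kernel_support.sh not found in: $rkr_dir" >&2
        return 1
    fi
    # After cd, $PWD is the MLNX_OFED directory, so a relative argument
    # to this function still yields a valid path for -m.
    ( cd "$rkr_dir" && ./mlnx_add_kernel_support.sh -m "$PWD" --kmp -y )
}

# Example (not run automatically; the path is a placeholder):
#   rebuild_kernel_rpms /mnt/mlnx_ofed
```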
Install the newly created MLNX_OFED package on the cluster:
Option 1: Copy the package to the servers and install it using the mlnxofedinstall command.
Option 2: Deploy the MLNX_OFED package using YUM (for YUM installation instructions, refer to the Installing MLNX_OFED Using YUM section):
i. Extract the resulting MLNX_OFED image and copy it to a shared NFS location.
ii. Create a YUM repository configuration.
iii. Install the new MLNX_OFED kernel RPMs on the servers:
# yum update
Example:
Note: The MLNX_OFED user-space packages will not change; only the kernel RPMs will be updated. However, “YUM update” can also update other inbox packages (not related to OFED). In order to install the MLNX_OFED kernel RPMs only, make sure to run:
Note: mlnx-ofed-kernel-only is a metadata RPM that requires the MLNX_OFED kernel RPMs only.
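The YUM deployment steps above can be sketched as follows. This is an illustration under stated assumptions: the repository file layout follows ordinary YUM conventions, the repository ID mlnx_ofed and all paths are placeholders, and the function name write_mlnx_repo is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a YUM repository file for the rebuilt package.
# Usage: write_mlnx_repo <rpms-dir> <gpg-key-file> <repo-file>
write_mlnx_repo() {
    wmr_rpms=$1   # directory holding the RPMs on the shared NFS location
    wmr_key=$2    # public key file used to verify the RPM signatures
    wmr_out=$3    # e.g. a file under /etc/yum.repos.d/
    cat > "$wmr_out" <<EOF
[mlnx_ofed]
name=MLNX_OFED Repository
baseurl=file://$wmr_rpms
enabled=1
gpgkey=file://$wmr_key
gpgcheck=1
EOF
}

# Example (not run automatically; all paths are placeholders):
#   write_mlnx_repo /mnt/nfs/MLNX_OFED/RPMS /mnt/nfs/RPM-GPG-KEY-Mellanox \
#       /etc/yum.repos.d/mlnx_ofed.repo
#   yum install mlnx-ofed-kernel-only   # kernel RPMs only, per the note above
```

Installing the metadata RPM by name, rather than running a general update, leaves unrelated inbox packages untouched.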
Verify that the driver can be reloaded:
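A reload check might look like the sketch below. It assumes the openibd service is controlled through the /etc/init.d/openibd script; the function name verify_driver_reload is hypothetical, and the script path can be overridden if the service is managed differently on the system.

```shell
#!/bin/sh
# Hypothetical helper: restart the driver and report the outcome.
# Usage: verify_driver_reload [openibd-script]
verify_driver_reload() {
    vdr_script=${1:-/etc/init.d/openibd}   # assumed service script location
    if "$vdr_script" restart; then
        echo "driver reloaded"
    else
        echo "driver reload failed; check dmesg for 'Unknown symbol' messages" >&2
        return 1
    fi
}

# Example (not run automatically):
#   verify_driver_reload
```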