Application Binary Interface (ABI) Incompatibility with MLNX_EN Kernel Modules

NVIDIA MLNX_EN Documentation v23.04-

This section is relevant for RedHat and SLES distributions only.

The MLNX_EN package for RedHat comes with RPMs that support KMP (weak-modules): when a new errata kernel is installed, compatibility links are created under the weak-updates directory for the new kernel. These links allow the existing MLNX_EN kernel modules to be used without recompilation. However, the ABI of the new kernel is sometimes incompatible with the MLNX_EN modules, which prevents them from loading. In this case, the MLNX_EN modules must be rebuilt against the new kernel.

When the MLNX_EN modules are not compatible with a new kernel (from a new OS release or an errata update), no links are created under the weak-updates directory for that kernel, and the driver fails to load. To check whether the needed module links exist under the weak-updates directory, reload the MLNX_EN modules. If one or more modules are missing, the driver reload fails with an error message.



********************************************************************************
# /etc/init.d/mlnx-en.d restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
Module rdma_cm belong to kernel which is not a part of MLNX[FAILED]kipping...
Loading rdma_ucm [FAILED]
********************************************************************************
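
The links can also be inspected directly instead of relying on the restart output. The following is a minimal sketch, assuming the usual weak-modules layout in which links live under /lib/modules/<kernel version>/weak-updates. The function name list_weak_links is hypothetical and is not part of MLNX_EN.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_EN): print every module symlink
# found under a weak-updates directory and flag links whose target is missing.
# Assumption: weak-modules places its links under
# /lib/modules/<kernel version>/weak-updates.
list_weak_links() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no weak-updates directory: $dir"
        return 1
    fi
    # -type l selects symbolic links only
    find "$dir" -type l | sort | while read -r link; do
        # [ -e ] follows the link, so it is false for a dangling link
        if [ -e "$link" ]; then
            echo "ok      $link"
        else
            echo "broken  $link"
        fi
    done
}
```

For the running kernel, this could be called as list_weak_links "/lib/modules/$(uname -r)/weak-updates". An empty listing or a "broken" entry suggests that the modules need to be rebuilt as described in the next section.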

Resolving ABI Incompatibility with MLNX_EN Modules

To fix ABI incompatibility with the MLNX_EN modules, recompile the modules against the new kernel using the script available in the MLNX_EN installation image.
There are two ways to recompile the MLNX_EN modules:

  1. Local recompilation and installation on one server.
    Mount the MLNX_EN ISO image or extract the TGZ file, then run the install command to recompile the kernel modules and reinstall the whole MLNX_EN package on the server:


    # cd <MLNX_EN dir>
    # ./install --skip-distro-check --add-kernel-support --kmp --force

    - The --kmp flag will enable rebuilding RPMs with KMP (weak-updates) support for the new kernel. Therefore, in the next OS/kernel update, the same modules can be used with the new kernel (assuming that the ABI compatibility was not broken again).
    - The command above rebuilds only the kernel RPMs, saves the resulting MLNX_EN package under /tmp, and starts installing it automatically. The saved package can be used for installation on other servers using the regular install command or yum.

  2. Preparing a new image on one server and deploying it on the cluster.

    1. Mount the MLNX_EN ISO image or extract the TGZ file on one server, then run the script directly to rebuild only the kernel RPMs (without running any installation):


      # cd <MLNX_EN dir>
      # ./ -m $PWD --kmp -y

      Note: This command will save the resulting MLNX_EN package under /tmp.



      ********************************************************************************
      # cd /tmp/MLNX_EN_LINUX-5.2-
      # ./ -m $PWD --kmp -y
      Note: This program will create mlnx-en TGZ for rhel7.8 under /tmp directory.
      See log file /tmp/mlnx_iso.28286_logs/mlnx_ofed_iso.28286.log

      Checking if all needed packages are installed...
      Building mlnx-en RPMS . Please wait...

      Creating metadata-rpms for 3.10.0-1127.el7.x86_64 ...
      WARNING: If you are going to configure this package as a repository, then please note
      WARNING: that it contains unsigned rpms, therefore, you need to disable the gpgcheck
      WARNING: by setting 'gpgcheck=0' in the repository conf file.
      Created /tmp/mlnx-en-5.3-
      ********************************************************************************

    2. Install the newly created MLNX_EN package on the cluster:

      Option 1: Copy the package to the servers and install it using the install script.

      Option 2: Deploy the MLNX_EN package using YUM (for YUM installation instructions, refer to the Installing MLNX_EN Using YUM section):
      i. Extract the resulting MLNX_EN image and copy it to a shared NFS location.
      ii. Create a YUM repository configuration.
      iii. Install the new MLNX_EN kernel RPMs on the servers:

      # yum update

      Example:


      ********************************************************************************
      ...
      ...
      ================================================================================
      Package                Arch     Version                        Repository   Size
      ================================================================================
      Updating:
      epel-release           noarch   7-7                            epel         14 k
      kmod-iser              x86_64   1.8.0-OFED.                    mlnx_ofed    35 k
      kmod-isert             x86_64   1.0-OFED.                      mlnx_ofed    32 k
      kmod-kernel-mft-mlnx   x86_64   4.4.0-1.201606210906.rhel7u1   mlnx_ofed    10 k
      kmod-knem-mlnx         x86_64                                  mlnx_ofed    22 k
      kmod-mlnx-ofa_kernel   x86_64   3.3-OFED.                      mlnx_ofed    1.4 M
      kmod-srp               x86_64   1.6.0-OFED.                    mlnx_ofed    39 k

      Transaction Summary
      ================================================================================
      Upgrade 7 Packages
      ...
      ...
      ********************************************************************************

      Note: The MLNX_EN user-space packages will not change; only the kernel RPMs will be updated. However, "yum update" can also update other inbox packages (not related to OFED). To install only the MLNX_EN kernel RPMs, run:


      # yum install mlnx-en-kernel-only

      Note: mlnx-en-kernel-only is a metadata RPM that requires the MLNX_EN kernel RPMs only.

    3. Verify that the driver can be reloaded:


      # /etc/init.d/mlnx-en.d restart
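
For the "Create a YUM repository configuration" step of Option 2, a repository definition could be generated along the following lines. This is a minimal sketch and not the official procedure: the function name make_repo_conf and the directory path are hypothetical, the repository ID mlnx_ofed is taken from the Repository column of the example output above, and gpgcheck=0 follows the warning printed by the rebuild about unsigned RPMs.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_EN): print a yum repository
# definition for a package directory reachable on the local filesystem,
# for example an NFS mount.
make_repo_conf() {
    repo_id="$1"   # repository ID, e.g. mlnx_ofed as in the example output
    rpm_dir="$2"   # absolute path of the directory holding the RPMs (assumption)
    cat <<EOF
[$repo_id]
name=$repo_id repository
baseurl=file://$rpm_dir
enabled=1
gpgcheck=0
EOF
}
```

The output would typically be redirected to a file under /etc/yum.repos.d/ on each server, with the second argument pointing at the RPM directory of the extracted package on the shared NFS location. gpgcheck is disabled here only because the rebuilt RPMs are unsigned.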

© Copyright 2023, NVIDIA. Last updated on Sep 9, 2023.