NVIDIA INNOVA-2 FLEX OPEN ADAPTER CARD

Mellanox Innova-2 Flex Open Card Driver

This chapter describes how to install and test the Mellanox OFED for Linux package on a single host machine with the Mellanox Innova-2 Flex Open adapter hardware installed.

Warning

MLNX_OFED should be installed before installing the following content from the bundle.

Hardware and Software Requirements

Requirements

Description

Platforms

A server platform with an adapter card based on one of the following Mellanox Technologies’ adapter devices: MT4119 ConnectX®-5 (VPI, IB, EN) (firmware: fw-ConnectX5)

For the list of supported architecture platforms, please refer to the Mellanox OFED Release Notes file in https://docs.mellanox.com/category/mlnxofedib.

Required Disk Space for Installation

1GB

Device ID

The list of Mellanox Technologies PCI Device IDs can be found in the PCI ID repository at http://pci-ids.ucw.cz/read/PC/15b3.

Operating System

Linux operating system.

For the list of supported operating system distributions and kernels, please refer to the Mellanox OFED Release Notes file in https://docs.mellanox.com/category/mlnxofedib.

Installer Privileges

The installation requires administrator privileges on the target machine.

Warning

MLNX_OFED can be downloaded from the Mellanox public web site; see "Software Download Locations" in Mellanox Bundle Content and Considerations.

  1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed.
    The following example shows a system with an installed Mellanox HCA:

    # lspci -v | grep Mellanox
    06:00.0 Network controller: Mellanox Technologies MT28800 Family [ConnectX-5]
            Subsystem: Mellanox Technologies Device 0024

  2. Download the ISO image to your host.
    The image’s name has the format MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso. An ISO image for the Mellanox Innova-2 Flex Open adapter can be obtained through Mellanox support.

  3. Use the md5sum utility to confirm the file integrity of your ISO image. Run the following command and compare the result to the value provided on the download page.

    host1$ md5sum MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso
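
The comparison in step 3 can also be scripted. The following is a minimal sketch, assuming the expected digest has been copied by hand from the download page; the function name verify_md5 is hypothetical and is not part of MLNX_OFED.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_MD5  (hypothetical helper, not part of MLNX_OFED)
# Succeeds only when the MD5 digest of FILE equals EXPECTED_MD5.
verify_md5() {
    file=$1
    # Normalize the expected digest to lowercase, since md5sum prints lowercase hex.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # md5sum prints "<digest>  <file name>"; keep only the digest field.
    actual=$(md5sum "$file" 2>/dev/null | awk '{print $1}')
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
        return 0
    fi
    echo "MISMATCH or unreadable file: $file" >&2
    return 1
}
```

For example, calling verify_md5 with the ISO file name and the published digest returns a non-zero status if the download is corrupted, so a wrapper script can stop before mounting the image.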

Installation Script

The installation script, mlnxofedinstall, performs the following:

  • Discovers the currently installed kernel

  • Uninstalls any software stacks that are part of the standard operating system distribution or another vendor's commercial stack

  • Installs the MLNX_OFED_LINUX binary RPMs (if they are available for the current kernel)

  • Identifies the currently installed InfiniBand and Ethernet network adapters and automatically upgrades the firmware.
    Note: The firmware will not be updated if you run the install script with the ‘--without-fw-update’ option.

Usage

/mnt/mlnxofedinstall [OPTIONS]

The installation script removes all previously installed Mellanox OFED packages and re-installs from scratch. You will be prompted to acknowledge the deletion of the old packages.

Warning

Pre-existing configuration files will be saved with the extension “.conf.rpmsave”.

  • If you need to install Mellanox OFED on an entire (homogeneous) cluster, a common strategy is to mount the ISO image on one of the cluster nodes and then copy it to a shared file system such as NFS. To install on all the cluster nodes, use cluster-aware tools (such as pdsh).

    • If your kernel version does not match any of the offered pre-built RPMs, you can add your kernel version by using the “mlnx_add_kernel_support.sh” script located under the docs/ directory.

      Warning

      On Red Hat distributions with an errata kernel installed, there is no need to use the mlnx_add_kernel_support.sh script. The regular installation can be performed, and the weak-updates mechanism will create symbolic links to the MLNX_OFED kernel modules.

      The “mlnx_add_kernel_support.sh” script can be executed directly from the mlnxofedinstall script. For further information, please see '--add-kernel-support' option below.

      Warning

      On Ubuntu distributions, driver installation uses the Dynamic Kernel Module Support (DKMS) framework. Thus, the drivers are compiled on the host during MLNX_OFED installation, and using "mlnx_add_kernel_support.sh" is irrelevant on Ubuntu distributions.

      Usage

      mlnx_add_kernel_support.sh -m|--mlnx_ofed <path to MLNX_OFED directory> [--make-iso|--make-tgz]

      where:

      [--make-iso]

      Create MLNX_OFED ISO image

      [--make-tgz] 

      Create MLNX_OFED tarball (default)

      [-t|--tmpdir <local work dir>] 

      Local work directory

      [--kmp] 

      Enable KMP format if supported

      [-k | --kernel] <kernel version> 

      Kernel version to use

      [-s | --kernel-sources] <path to the kernel sources> 

      Path to kernel headers

      [-v|--verbose]

      Enable verbose mode

      [-n|--name] 

      Name of the package to be created

      [-y|--yes] 

      Answer "yes" to all questions

      [--force] 

      Force removing packages that depend on MLNX_OFED


      Example
      The following command creates a MLNX_OFED_LINUX tarball (TGZ) for RedHat 7.2 under the /tmp directory.

      # ./MLNX_OFED_LINUX-x.x-x-rhel7.2-x86_64/mlnx_add_kernel_support.sh -m /tmp/MLNX_OFED_LINUX-x.x-x-rhel7.1-x86_64/ --make-tgz
      Note: This program will create MLNX_OFED_LINUX TGZ for rhel7.2 under /tmp directory.
      All Mellanox, OEM, OFED, or Distribution packages will be removed.
      Do you want to continue?[y/N]:y
      See log file /tmp/mlnx_ofed_iso.21642.log

      Building OFED RPMs. Please wait...
      Removing OFED RPMs...
      Created /tmp/MLNX_OFED_LINUX-x.x-x-rhel7.1-x86_64-ext.tgz

  • The script adds the following lines to /etc/security/limits.conf for the userspace components such as MPI:

    • * soft memlock unlimited

    • * hard memlock unlimited

These settings set the amount of memory that can be pinned by a user space application to unlimited. If desired, tune the value unlimited to a specific amount of RAM.
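
To confirm that both memlock entries are present after installation, a check along the following lines could be used. This is a sketch only: the function name has_memlock_unlimited is hypothetical, and the default path is the /etc/security/limits.conf file named above.

```shell
#!/bin/sh
# has_memlock_unlimited [FILE]  (hypothetical helper)
# Succeeds when FILE contains both "* soft memlock unlimited" and
# "* hard memlock unlimited" as active (uncommented) lines.
has_memlock_unlimited() {
    conf=${1:-/etc/security/limits.conf}
    for kind in soft hard; do
        # Allow arbitrary whitespace between fields; commented lines do not match.
        grep -Eq "^[[:space:]]*\*[[:space:]]+${kind}[[:space:]]+memlock[[:space:]]+unlimited[[:space:]]*$" "$conf" || return 1
    done
    return 0
}
```

The function reports only whether the entries exist in the file; limits.conf settings take effect for new login sessions, so an already running shell may still show the old limit.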

For further information, see the help file.

The mlnxofedinstall script accepts the following options:

-c|--config <packages config_file> 

Example of the configuration file can be found under docs

-n|--net <network config_file> 

Example of the network configuration file can be found under docs

-k|--kernel-version <kernel version>

Use provided kernel version instead of 'uname -r'

-p|--print-available 

Print available packages for current platform and create corresponding ofed.conf file

--with-32bit 

Install 32-bit libraries

--without-32bit 

Skip 32-bit libraries installation (default)

--without-depcheck 

Skip Distro's libraries check

--without-fw-update 

Skip firmware update

--fw-update-only 

Update firmware. Skip driver installation

--force-fw-update

Force firmware update

--force

Force installation

--all|--hpc|--basic|--msm 

Install all, hpc, basic or Mellanox Subnet manager packages correspondingly

--vma|--vma-vpi 

Install packages required by VMA to support VPI

--vma-eth 

Install packages required by VMA to work over Ethernet

--with-vma 

Set configuration for VMA use (to be used with any installation parameter)

--guest

Install packages required by guest OS

--hypervisor 

Install packages required by hypervisor OS

-v|-vv|-vvv 

Set verbosity level

--umad-dev-rw 

Grant non-root users read/write permission for umad devices instead of the default

--umad-dev-na 

Prevent non-root users from read/write access to umad devices. Overrides '--umad-dev-rw'

--enable-affinity 

Run mlnx_affinity script upon boot

--disable-affinity 

Disable mlnx_affinity script (Default)

--enable-sriov 

Burn SR-IOV enabled firmware. Note: Enabling or disabling SR-IOV in a non-volatile configuration through UEFI and/or a tool will override this flag.

--add-kernel-support 

Add kernel support (Run mlnx_add_kernel_support.sh)

--skip-distro-check 

Do not check MLNX_OFED vs Distro matching

--hugepages-overcommit 

Setting 80% of MAX_MEMORY as overcommit for huge page allocation

-q

Set quiet mode. No messages will be printed

--without-<package> 

Do not install package

--with-fabric-collector 

Install fabric-collector package
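
As an illustration of combining the options above, the sketch below assembles an installer command line and prints it without running it. The function name print_install_cmd and its yes/no arguments are hypothetical; only the two flags are taken from the option list.

```shell
#!/bin/sh
# print_install_cmd SCRIPT SKIP_FW ADD_KERNEL  (hypothetical helper)
# Prints the installer command line; it does not execute anything.
print_install_cmd() {
    script=$1        # path to mlnxofedinstall on the mounted ISO
    skip_fw=$2       # "yes" adds --without-fw-update
    add_kernel=$3    # "yes" adds --add-kernel-support
    cmd=$script
    if [ "$skip_fw" = yes ]; then
        cmd="$cmd --without-fw-update"
    fi
    if [ "$add_kernel" = yes ]; then
        cmd="$cmd --add-kernel-support"
    fi
    echo "$cmd"
}
```

Printing the command first makes it easy to review the chosen options before running the installer as root.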

  1. Login to the installation machine as root.

  2. Mount the ISO image on your machine.

    host1# mount -o ro,loop MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso /mnt

  3. Run the installation script.

    /mnt/mlnxofedinstall
    Logs dir: /tmp/MLNX_OFED_LINUX-x.x-x.logs
    This program will install the MLNX_OFED_LINUX package on your machine.
    Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
    Uninstalling the previous version of MLNX_OFED_LINUX

    Starting MLNX_OFED_LINUX-x.x.x installation ...
    ........
    ........
    Installation finished successfully.

    Attempting to perform Firmware update...
    Querying Mellanox devices firmware ...

    Warning

    In case your machine has an unsupported network adapter device, no firmware update will occur and the error message below will be printed. Please contact your hardware vendor for help on firmware updates.
    Error message:
    Device #1:
    ----------
    Device: 0000:05:00.0
    Part Number:
    Description:
    PSID: MT_2410110034MT_2490110032
    Versions: Current Available
    FW 14.12.1000 N/A
    Status: No matching image found

  4. Reboot the machine if the installation script performed firmware updates to your network adapter hardware. Otherwise, restart the driver by running: "/etc/init.d/openibd restart".

After the installer completes, information about the Mellanox OFED installation such as prefix, kernel version, and installation parameters can be retrieved by running the command /etc/infiniband/info. Most of the Mellanox OFED components can be configured or reconfigured after the installation by modifying the relevant configuration files. See the relevant chapters in this manual for details.

The list of the modules that will be loaded automatically upon boot can be found in the /etc/infiniband/openib.conf file.

Installation Results

Software

  • Most MLNX_OFED packages are installed under the “/usr” directory, except for the following packages, which are installed under the “/opt” directory:

    • openshmem, bupc, fca and ibutils

  • The kernel modules are installed under

    • /lib/modules/`uname -r`/extra/mlnx-ofa_kernel on RHEL and other Red Hat-like distributions

    • /lib/modules/`uname -r`/updates/dkms/ on Ubuntu

Firmware

  • The firmware of existing network adapter devices will be updated if the following two conditions are fulfilled:

    • The installation script is run in default mode; that is, without the option ‘--without-fw-update’

    • The firmware version of the adapter device is older than the firmware version included with the Mellanox OFED ISO image

      Note: If an adapter’s flash was originally programmed with an Expansion ROM image, the automatic firmware update will also burn an Expansion ROM image.

  • In case your machine has an unsupported network adapter device, no firmware update will occur and the error message below will be printed.
    -I- Querying device ...
    -E- Can't auto detect fw configuration file: ...
    Please contact your hardware vendor for help on firmware updates.
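
The second update condition is a version comparison. The sketch below shows one way to express "older than" for dotted version strings such as the 14.12.1000 value shown in this chapter. It assumes GNU sort with the -V (version sort) option; the function name fw_is_older is hypothetical, and this is not necessarily how the installer itself compares versions.

```shell
#!/bin/sh
# fw_is_older CURRENT AVAILABLE  (hypothetical helper)
# Succeeds when CURRENT is strictly older than AVAILABLE.
fw_is_older() {
    # Equal versions are not "older".
    if [ "$1" = "$2" ]; then
        return 1
    fi
    # sort -V orders version strings numerically per component,
    # so the first line of the sorted pair is the older version.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}
```

A plain string comparison would rank 14.9.1000 above 14.12.1000, which is why a version-aware sort is used here.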

Installation Logging

While installing MLNX_OFED, the install log for each selected package will be saved in a separate log file.
The path to the directory containing the log files will be displayed after running the installation script in the following format: "Logs dir: /tmp/MLNX_OFED_LINUX-<version>.<PID>.logs".
Example:

Logs dir: /tmp/MLNX_OFED_LINUX-x.x-x.logs
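
When the installer output is captured to a file or a pipe, the log directory can be pulled out of it with a short filter. This is a sketch that assumes the "Logs dir:" label shown above appears in the captured output; the function name extract_logs_dir is hypothetical.

```shell
#!/bin/sh
# extract_logs_dir  (hypothetical helper)
# Reads installer output on stdin and prints the path that follows
# the first "Logs dir:" label.
extract_logs_dir() {
    sed -n 's/^.*Logs dir:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}
```

The printed path can then be passed to ls or grep to inspect the per-package log files.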

Driver Load upon System Boot

Upon system boot, the Mellanox drivers will be loaded automatically.

To prevent the Mellanox drivers from being loaded automatically upon system boot:

  1. Add the following lines to the "/etc/modprobe.d/mlnx.conf" file.

    blacklist mlx4_core
    blacklist mlx4_en
    blacklist mlx5_core
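
The blacklist entries can be added with a small script that avoids writing the same line twice. This is a sketch: the function name blacklist_modules is hypothetical, and writing to /etc/modprobe.d/mlnx.conf requires root privileges.

```shell
#!/bin/sh
# blacklist_modules FILE MODULE...  (hypothetical helper)
# Appends "blacklist <module>" to FILE for each module that is not
# already blacklisted there.
blacklist_modules() {
    conf=$1
    shift
    # Create the file if it does not exist yet.
    touch "$conf" || return 1
    for mod in "$@"; do
        line="blacklist $mod"
        # -x matches the whole line, -F treats the pattern as a fixed string.
        grep -qxF "$line" "$conf" || echo "$line" >> "$conf"
    done
}
```

A call such as blacklist_modules /etc/modprobe.d/mlnx.conf mlx4_core mlx4_en mlx5_core produces the three lines shown above and can be repeated safely.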

mlnxofedinstall Return Codes

The table below lists the mlnxofedinstall script return codes and their meanings.

Return Code    Meaning
0              The installation ended successfully
1              The installation failed
2              No firmware was found for the adapter device
22             Invalid parameter
28             Not enough free space
171            Not applicable to this system configuration. This can occur when the required hardware is not present on the system.
172            Prerequisites are not met. For example, the required software is not installed or the hardware is not configured correctly.
173            Failed to start the mst driver
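
In an automated installation, the return codes can be translated back into these meanings for logging. The sketch below maps each documented code to its description; the function name describe_ofed_rc and the fallback message for undocumented codes are assumptions.

```shell
#!/bin/sh
# describe_ofed_rc CODE  (hypothetical helper)
# Prints the documented meaning of an mlnxofedinstall return code.
describe_ofed_rc() {
    case $1 in
        0)   echo "The installation ended successfully" ;;
        1)   echo "The installation failed" ;;
        2)   echo "No firmware was found for the adapter device" ;;
        22)  echo "Invalid parameter" ;;
        28)  echo "Not enough free space" ;;
        171) echo "Not applicable to this system configuration" ;;
        172) echo "Prerequisites are not met" ;;
        173) echo "Failed to start the mst driver" ;;
        # Codes outside the documented table (assumed fallback wording).
        *)   echo "Unknown return code: $1" ;;
    esac
}
```

A wrapper script would typically run the installer, save $? immediately afterwards, and pass that value to the function.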

Uninstalling MLNX_OFED

Use the script /usr/sbin/ofed_uninstall.sh to uninstall the Mellanox OFED package. The script is a part of the ofed-scripts RPM.

Updating the Device Manually

If you ran the mlnxofedinstall script with the ‘--without-fw-update’ option, or you are using an OEM card, and you now wish to update the firmware on your adapter card(s) manually, perform the steps below. The same steps apply if you wish to burn newer firmware that you have obtained from Mellanox Support.

  1. Get the device’s PSID.

    mst start
    mst status
    flint -d <mst device> q | grep PSID

  2. Get the firmware BIN file provided by Mellanox for the adapter card.

  3. Burn the firmware using mlxup, the Mellanox Update and Query Utility (http://www.mellanox.com/page/mlxup_firmware_tool).

    mlxup -i <fw_file.bin>

  4. Reboot your machine after the firmware burning is completed.
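
For scripting step 1, the PSID value alone can be extracted from the query output. This sketch assumes the output contains a line of the form "PSID: <value>"; the function name extract_psid is hypothetical, and the exact layout of the flint output may differ between tool versions.

```shell
#!/bin/sh
# extract_psid  (hypothetical helper)
# Reads "flint -d <mst device> q" output on stdin and prints the
# value from the first line whose first field is "PSID:".
extract_psid() {
    awk '$1 == "PSID:" { print $2; exit }'
}
```

It would be used as the last stage of the pipeline in step 1, in place of grep PSID.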

UEFI Secure Boot

All kernel modules included in MLNX_OFED for RHEL7 are signed with an x.509 key to support loading the modules when Secure Boot is enabled.

Enrolling Mellanox's x.509 Public Key on Your Systems

In order to support loading MLNX_OFED drivers when an OS supporting Secure Boot boots on a UEFI-based system with Secure Boot enabled, the Mellanox x.509 public key should be added to the UEFI Secure Boot key database and loaded onto the system key ring by the kernel.

Follow the steps below to add the Mellanox x.509 public key to your system:

Warning

Prior to adding the Mellanox x.509 public key to your system, please make sure that:

  • The 'mokutil' package is installed on your system, and

  • The system is booted in UEFI mode.

  1. Download the x.509 public key.

    # wget http://www.mellanox.com/downloads/ofed/mlnx_signing_key_pub.der

  2. Add the public key to the MOK list using the mokutil utility.

    You will be asked to enter and confirm a password for this MOK enrollment request.

    # mokutil --import mlnx_signing_key_pub.der

  3. Reboot the system.

The pending MOK key enrollment request will be noticed by shim.efi, which will launch MokManager.efi to allow you to complete the enrollment from the UEFI console. You will need to enter the password you previously associated with this request and confirm the enrollment. Once done, the public key is added to the MOK list, which is persistent. Once a key is in the MOK list, it is automatically propagated to the system key ring on this and subsequent boots when UEFI Secure Boot is enabled.

Warning

To see what keys have been added to the system key ring on the current boot, install the 'keyutils' package and run:
# keyctl list %:.system_keyring
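
The two prerequisites for key enrollment (the mokutil tool is installed and the system is booted in UEFI mode) can be checked before starting. This is a sketch: the function name check_mok_prereqs and its optional arguments are hypothetical. It relies on the fact that Linux exposes the /sys/firmware/efi directory only when booted in UEFI mode.

```shell
#!/bin/sh
# check_mok_prereqs [EFI_DIR] [TOOL]  (hypothetical helper)
# Succeeds when TOOL (default: mokutil) is on the PATH and EFI_DIR
# (default: /sys/firmware/efi) exists, which indicates a UEFI boot.
check_mok_prereqs() {
    efi_dir=${1:-/sys/firmware/efi}
    tool=${2:-mokutil}
    status=0
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool is not installed" >&2
        status=1
    fi
    if [ ! -d "$efi_dir" ]; then
        echo "system does not appear to be booted in UEFI mode" >&2
        status=1
    fi
    return $status
}
```

Calling the function with no arguments checks the real system; the optional arguments exist only to make the check easy to exercise against other paths.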

Removing the Signature from Kernel Modules

The signature can be removed from a signed kernel module using the 'strip' utility which is provided by the 'binutils' package.

# strip -g my_module.ko

The strip utility changes the given file without saving a backup. The operation can be undone only by re-signing the kernel module. Hence, we recommend backing up a copy prior to removing the signature.
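
The backup recommendation can be folded into the strip step itself. The sketch below copies the module before stripping it; the function name strip_with_backup, the .signed.bak suffix, and the optional second argument (an alternative strip command) are all hypothetical.

```shell
#!/bin/sh
# strip_with_backup MODULE [STRIP_CMD]  (hypothetical helper)
# Saves MODULE as MODULE.signed.bak, then removes the signature with
# "strip -g". The strip step is skipped if the backup cannot be made.
strip_with_backup() {
    module=$1
    strip_cmd=${2:-strip}
    # -p preserves mode and timestamps of the signed original.
    cp -p "$module" "$module.signed.bak" || return 1
    "$strip_cmd" -g "$module"
}
```

To restore the signed module, copy the .signed.bak file back over the stripped one.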

To remove the signature from the MLNX_OFED kernel modules:

  1. Remove the signature.

    # rpm -qa | grep -E "kernel-ib|mlnx-ofa_kernel|iser|srp|knem" | xargs rpm -ql | grep "\.ko$" | xargs strip -g

    After the signature has been removed, a message such as the one below will no longer be presented upon module loading:

    "Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11"

    However, please note that a message similar to the following will still be presented:

    "my_module: module verification failed: signature and/or required key missing - tainting kernel"

    This message is presented only once per boot, for the first module that either has no signature or whose key is not in the kernel key ring, so it is easy to miss. You will not see it on repeated tests where you unload and reload a kernel module until you reboot. There is no way to eliminate this message.

  2. Update the initramfs on RHEL systems with the stripped modules.

    mkinitrd /boot/initramfs-$(uname -r).img $(uname -r) --force

© Copyright 2023, NVIDIA. Last updated on May 22, 2023.