RShim Troubleshooting and How-Tos

NVIDIA BlueField BSP v4.5.2

Several generations of NVIDIA® BlueField® networking platforms (DPUs or SuperNICs) are equipped with a USB interface through which RShim can be routed, via a USB cable, to an external host running Linux and the RShim driver.

In this case, typically after a system reboot, RShim over USB takes precedence and the BlueField host reports the RShim status as "another backend already attached". This is expected behavior, because only one RShim backend can be active at any given time. It does mean, however, that the BlueField host does not have RShim access.

To reclaim RShim ownership safely:

  1. Stop the RShim driver on the remote Linux host. Run:

    systemctl stop rshim
    systemctl disable rshim

  2. Restart RShim on the BlueField host. Run:

    systemctl enable rshim
    systemctl start rshim
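
Before reclaiming ownership, it can help to confirm that the host is actually seeing this condition. The following is a minimal sketch, assuming the message quoted above appears verbatim in the service status text. The function name `rshim_backend_busy` is hypothetical and not part of the RShim package.

```shell
# Hypothetical helper: succeeds if the given status text contains the
# "another backend already attached" message quoted in this section.
rshim_backend_busy() {
    printf '%s\n' "$1" | grep -qF "another backend already attached"
}

# Example use on a live system (assumes the message shows up in the
# service status output; not run here):
#   if rshim_backend_busy "$(systemctl status rshim 2>&1)"; then
#       echo "RShim is owned by another backend"
#   fi
```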

The "another backend already attached" status can also occur when the RShim backend is owned by the BMC on BlueField devices with an integrated BMC. This case is covered later on this page.

Verify whether your BlueField has an integrated BMC. Run:

sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"

Example output for BlueField with an integrated BMC:

Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL
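
The check on the product name can be scripted. This is a small sketch that assumes the phrase "integrated BMC" appears in the "Product Name" line exactly as in the example output above. The function name `has_integrated_bmc` is hypothetical.

```shell
# Hypothetical helper: succeeds if a "Product Name" line mentions an
# integrated BMC, as in the example output above.
has_integrated_bmc() {
    case "$1" in
        *"integrated BMC"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a live system (not run here):
#   line=$(sudo lspci -s "$(sudo lspci -d 15b3: | head -1 | awk '{print $1}')" -vvv | grep "Product Name")
#   if has_integrated_bmc "$line"; then echo "integrated BMC"; else echo "no integrated BMC"; fi
```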

If your BlueField has an integrated BMC, refer to "RShim driver not loading on BlueField with integrated BMC".

If your BlueField does not have an integrated BMC, refer to "RShim driver not loading on host on BlueField without integrated BMC".

RShim driver not loading on BlueField with integrated BMC

RShim driver not loading on host

  1. Access the BMC via the RJ45 management port of BlueField.

  2. Stop and disable RShim on the BMC. Run:

    systemctl stop rshim
    systemctl disable rshim

  3. Enable RShim on the host. Run:

    systemctl enable rshim
    systemctl start rshim

  4. Restart the RShim service. Run:

    sudo systemctl restart rshim

    If the RShim service does not start automatically, check its status. Run:

    sudo systemctl status rshim

    The output is expected to show "active (running)".

  5. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
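
The status check in the steps above can be automated. This is a minimal sketch, assuming the status text contains the literal string "active (running)" when the service is up. The function name `rshim_is_running` is hypothetical.

```shell
# Hypothetical helper: succeeds if the given status text reports
# "active (running)".
rshim_is_running() {
    printf '%s\n' "$1" | grep -qF "active (running)"
}

# Example use on a live system (not run here):
#   if rshim_is_running "$(sudo systemctl status rshim 2>&1)"; then
#       echo "RShim service is running"
#   else
#       echo "RShim service is not running"
#   fi
```

On systemd hosts, `systemctl is-active rshim` offers a similar check without text matching.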

RShim driver not loading on BMC

  1. Verify that the RShim service is not running on the host. Run:

    systemctl status rshim

    If the output shows the service as active, the host presumably owns RShim.

  2. Stop and disable RShim on the host. Run:

    systemctl stop rshim
    systemctl disable rshim

  3. Enable RShim on the BMC. Run:

    systemctl enable rshim
    systemctl start rshim

  4. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME usb-1.0

    This output indicates that the RShim service is ready to use.
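
The DEV_NAME value shows which backend is attached: the host example above reports a `pcie-` name, while the BMC example reports a `usb-` name. The sketch below classifies such a line. It assumes only those two prefixes, as seen in the examples on this page, and the function name `rshim_backend_type` is hypothetical.

```shell
# Hypothetical helper: print "pcie", "usb", or "unknown" for a DEV_NAME line
# such as "DEV_NAME pcie-04:00.2 (ro)" or "DEV_NAME usb-1.0".
rshim_backend_type() {
    # Extract the second whitespace-separated field of the DEV_NAME line.
    name=$(printf '%s\n' "$1" | awk '$1 == "DEV_NAME" { print $2; exit }')
    case "$name" in
        pcie-*) echo pcie ;;
        usb-*)  echo usb ;;
        *)      echo unknown ;;
    esac
}

# Example use on a live system (replace <N> with the device index; not run here):
#   rshim_backend_type "$(grep DEV_NAME /dev/rshim<N>/misc)"
```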

RShim driver not loading on host on BlueField without integrated BMC

  1. Download the appropriate DEB or RPM package for the RShim driver (the management interface for BlueField from the host).

  2. Reinstall the RShim package on the host.

    • For Ubuntu/Debian, run:

      sudo dpkg --force-all -i rshim-<version>.deb

    • For RHEL/CentOS, run:

      sudo rpm -Uhv rshim-<version>.rpm

  3. Restart the RShim service. Run:

    sudo systemctl restart rshim

    If the RShim service does not start automatically, check its status. Run:

    sudo systemctl status rshim

    The output is expected to show "active (running)".

  4. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
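
The choice between the two reinstall commands in step 2 depends only on the package type. This sketch selects the command from the file extension and prints it rather than running it. The function name `rshim_install_cmd` and the file names in the example are hypothetical.

```shell
# Hypothetical helper: print the reinstall command from step 2 that matches
# the package file extension. It only prints the command; it does not run it.
rshim_install_cmd() {
    case "$1" in
        *.deb) echo "sudo dpkg --force-all -i $1" ;;
        *.rpm) echo "sudo rpm -Uhv $1" ;;
        *)     echo "unsupported package: $1" >&2; return 1 ;;
    esac
}

# Example use (file name is a placeholder):
#   rshim_install_cmd rshim-<version>.deb
```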

To change RShim ownership from the BMC to the host:

  1. Verify that your card has an integrated BMC. Run the following on the host:

    # sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"
    Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL

    The product name is expected to include "integrated BMC".

  2. Access the BMC via the RJ45 management port of BlueField.

  3. Stop and disable RShim on the BMC. Run:

    systemctl stop rshim
    systemctl disable rshim

  4. Enable RShim on the host. Run:

    systemctl enable rshim
    systemctl start rshim

  5. Restart the RShim service. Run:

    sudo systemctl restart rshim

    If the RShim service does not start automatically, check its status. Run:

    sudo systemctl status rshim

    The output is expected to show "active (running)".

  6. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.

The BFB installation flow can be traced using various interfaces:

  • From the host:

    • RShim console (/dev/rshim0/console)

    • RShim log buffer (/dev/rshim0/misc); also included in bfb-install's output

    • UART console (/dev/ttyUSB0)

  • From the BMC console:

    • SSH to the BMC and run obmc-console-client

      Info: Additional information about BMC interfaces is available in the BMC software documentation.

  • From the BlueField:

    • /root/<OS>.installation.log available on the BlueField Arm OS after installation
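
Not every host exposes all of the host-side interfaces listed above. The sketch below reports which of those paths exist so you know where to look during an installation. The function name `list_bfb_trace_sources` and its optional root-prefix argument are hypothetical, and the device index 0 is taken from the paths in the list.

```shell
# Hypothetical helper: print each host-side trace interface from the list
# above that exists. The optional first argument is a root prefix, which is
# useful for trying the function against a scratch directory.
list_bfb_trace_sources() {
    root=${1:-}
    for path in /dev/rshim0/console /dev/rshim0/misc /dev/ttyUSB0; do
        if [ -e "$root$path" ]; then
            echo "$path"
        fi
    done
}

# Example use on a live host (not run here):
#   list_bfb_trace_sources
```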

© Copyright 2024, NVIDIA. Last updated on Jul 10, 2024.