NVIDIA BlueField DPU BSP v3.9.3

RShim Troubleshooting and How-Tos

Several generations of BlueField DPUs are equipped with a USB interface through which RShim can be routed, via USB cable, to an external host running Linux and the RShim driver.

In this case, typically following a system reboot, the RShim over USB prevails and the DPU host reports RShim status as "another backend already attached". This is correct behavior, since there can only be one RShim backend active at any given time. However, this means that the DPU host does not own RShim access.
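
As a rough illustration, the check below scans text for that status string. It is a minimal sketch: the function name has_backend_conflict is hypothetical, and where the message shows up (for example, in the RShim service log) is an assumption that may differ between RShim versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the text on stdin
# contains the status string quoted above.
has_backend_conflict() {
    grep -q "another backend already attached"
}

# Example use on the DPU host (assumption: the message appears in the
# output of the rshim service status):
#   systemctl status rshim 2>&1 | has_backend_conflict && echo "RShim is owned elsewhere"
```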

To reclaim RShim ownership safely:

  1. Stop and disable the RShim driver on the remote Linux host. Run:

    systemctl stop rshim
    systemctl disable rshim

  2. Restart RShim on the DPU host. Run:

    systemctl enable rshim
    systemctl start rshim
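
The two steps above amount to a hand-over: the current owner releases RShim first, and only then does the DPU host claim it. The sketch below only prints the commands for each side so they can be reviewed before being run. The function name rshim_handover_cmds and the role names release and claim are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the systemctl commands for one side of the
# hand-over. "release" is for the machine giving up RShim (step 1),
# "claim" is for the machine taking it over (step 2).
rshim_handover_cmds() {
    case "$1" in
        release)
            echo "systemctl stop rshim"
            echo "systemctl disable rshim"
            ;;
        claim)
            echo "systemctl enable rshim"
            echo "systemctl start rshim"
            ;;
        *)
            echo "usage: rshim_handover_cmds release|claim" >&2
            return 1
            ;;
    esac
}

# Order matters: run the "release" commands on the remote Linux host
# first, then the "claim" commands on the DPU host.
#   rshim_handover_cmds release
#   rshim_handover_cmds claim
```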

The "another backend already attached" scenario can also be attributed to the RShim backend being owned by the BMC in DPUs with integrated BMC. This is elaborated on further down on this page.

Verify whether your DPU features an integrated BMC or not. Run:

sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"

Example output for DPU with integrated BMC:

Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL

If your DPU has an integrated BMC, refer to RShim driver not loading on DPU with integrated BMC.

If your DPU does not have an integrated BMC, refer to RShim driver not loading on host on DPU without integrated BMC.
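
The decision above can be sketched as a small filter over the lspci output. This is an illustration only: the function name rshim_section_for is hypothetical, and it assumes that the presence of "integrated BMC" in the "Product Name" line is a sufficient test, as the example output suggests.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vvv` output on stdin and print the
# name of the troubleshooting section that applies, based on whether the
# "Product Name" line mentions an integrated BMC.
rshim_section_for() {
    if grep "Product Name" | grep -q "integrated BMC"; then
        echo "RShim driver not loading on DPU with integrated BMC"
    else
        echo "RShim driver not loading on host on DPU without integrated BMC"
    fi
}

# Example use on the host:
#   sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | rshim_section_for
```

Note that the sketch also falls through to the "without integrated BMC" branch when no "Product Name" line is present at all, so an empty lspci result should be checked separately.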

RShim driver not loading on DPU with integrated BMC

RShim driver not loading on host

  1. Access the BMC via the RJ45 management port of the DPU.

  2. Stop and disable RShim on the BMC. Run:

    systemctl stop rshim
    systemctl disable rshim

  3. Enable RShim on the host:

    systemctl enable rshim
    systemctl start rshim

  4. Restart RShim service. Run:

    sudo systemctl restart rshim

    To confirm that the RShim service launched, run:

    sudo systemctl status rshim

    This command is expected to display "active (running)".

  5. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
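
For scripting the final check, the DEV_NAME value can be pulled out of the misc file contents. The sketch below is a minimal example: the function name rshim_dev_name is hypothetical, and it assumes the field layout shown in step 5 (the key in the first column, the value in the second).

```shell
#!/bin/sh
# Hypothetical helper: print the DEV_NAME value from the contents of
# /dev/rshim<N>/misc supplied on stdin. Only the first match is printed.
rshim_dev_name() {
    awk '$1 == "DEV_NAME" { print $2; exit }'
}

# Example use (assumption: the device index <N> is 0 on this system):
#   rshim_dev_name < /dev/rshim0/misc
```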

RShim driver not loading on BMC

  1. Verify that the RShim service is not running on host. Run:

    systemctl status rshim

    If the output shows "active (running)", the host can be presumed to own the RShim.

  2. Stop and disable RShim on the host. Run:

    systemctl stop rshim
    systemctl disable rshim

  3. Enable RShim on the BMC. Run:

    systemctl enable rshim
    systemctl start rshim

  4. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME usb-1.0

    This output indicates that the RShim service is ready to use.
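
The two DEV_NAME examples on this page differ in their prefix: pcie-04:00.2 when the host owns RShim, and usb-1.0 when the BMC owns it. A minimal sketch for telling them apart follows. The function name rshim_backend_type is hypothetical, and treating the prefix as the transport type is an assumption based only on these two examples.

```shell
#!/bin/sh
# Hypothetical helper: classify a DEV_NAME value by its prefix.
# Only the "pcie-" and "usb-" prefixes appear on this page; anything
# else is reported as "unknown" rather than guessed.
rshim_backend_type() {
    case "$1" in
        pcie-*) echo "pcie" ;;
        usb-*)  echo "usb" ;;
        *)      echo "unknown" ;;
    esac
}

# Example use:
#   rshim_backend_type "usb-1.0"
```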

RShim driver not loading on host on DPU without integrated BMC

  1. Download the DEB or RPM package of the RShim driver (the host-side management interface for the DPU) that matches your host OS.

  2. Reinstall RShim package on the host.

    • For Ubuntu/Debian, run:

      sudo dpkg --force-all -i rshim-<version>.deb

    • For RHEL/CentOS, run:

      sudo rpm -Uhv rshim-<version>.rpm

  3. Restart RShim service. Run:

    sudo systemctl restart rshim

    To confirm that the RShim service launched, run:

    sudo systemctl status rshim

    This command is expected to display "active (running)".

  4. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
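
The choice between the two reinstall commands in step 2 can be sketched as a lookup on the package file extension. The function below only prints the command so it can be reviewed first. The name rshim_install_cmd is hypothetical, and the package file name in the usage line is a placeholder, not a real release.

```shell
#!/bin/sh
# Hypothetical helper: print the reinstall command from step 2 that
# matches the given package file, chosen by its extension.
rshim_install_cmd() {
    case "$1" in
        *.deb) echo "sudo dpkg --force-all -i $1" ;;
        *.rpm) echo "sudo rpm -Uhv $1" ;;
        *)
            echo "unsupported package type: $1" >&2
            return 1
            ;;
    esac
}

# Example use (placeholder file name):
#   rshim_install_cmd "rshim-<version>.deb"
```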

For more information, please refer to section "RShim Multiple Board Support".

© Copyright 2023, NVIDIA. Last updated on Aug 23, 2023.