RShim Troubleshooting and How-Tos

Check whether your DPU has an integrated BMC. Run:

sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"

Example output for DPU with integrated BMC:

Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL

If your DPU has an integrated BMC, refer to "RShim driver not loading on DPU with integrated BMC".

If your DPU does not have an integrated BMC, refer to "RShim driver not loading on host on DPU without integrated BMC".
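
The check above can also be scripted. The following is a minimal sketch rather than an official tool: `has_integrated_bmc` is a hypothetical helper name, and the sketch assumes that the "Product Name" line contains the text "integrated BMC" only on DPUs that have one, as in the example output.

```shell
# Hypothetical helper: reads `lspci -vvv` output on stdin and succeeds
# only if the "Product Name" line mentions an integrated BMC.
has_integrated_bmc() {
    grep -q 'Product Name:.*integrated BMC'
}

# Example (on the host):
#   bdf=$(sudo lspci -d 15b3: | head -1 | awk '{print $1}')
#   if sudo lspci -s "$bdf" -vvv | has_integrated_bmc; then
#       echo "DPU has an integrated BMC"
#   else
#       echo "DPU has no integrated BMC"
#   fi
```

The helper only inspects text, so it can be tried against saved `lspci` output before running it on a live system.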

RShim driver not loading on DPU with integrated BMC

RShim driver not loading on host

  1. Access the BMC via the RJ45 management port of the DPU.

  2. Stop and disable the RShim service on the BMC:

    systemctl stop rshim
    systemctl disable rshim

  3. Enable RShim on the host:

    systemctl enable rshim
    systemctl start rshim

  4. Restart the RShim service. Run:

    sudo systemctl restart rshim

    To check whether the RShim service launched, run:

    sudo systemctl status rshim

    The output is expected to show "active (running)".

  5. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
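
After a restart, the service may take a moment to become active, so a scripted version of the status check in step 4 may want to poll rather than check once. The following is a minimal sketch under that assumption. `wait_until` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary. `systemctl is-active --quiet` is a standard systemctl subcommand that reports the state through its exit code.

```shell
# Hypothetical helper: runs a command repeatedly until it succeeds or
# the given number of attempts is used up.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
    local attempts="$1"
    local delay="$2"
    shift 2
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (on the host, after `sudo systemctl restart rshim`):
#   if wait_until 10 1 systemctl is-active --quiet rshim; then
#       echo "rshim is active"
#   else
#       echo "rshim did not become active" >&2
#   fi
```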

RShim driver not loading on BMC

  1. Verify that the RShim service is not running on the host. Run:

    systemctl status rshim

    If the output shows "active", the host most likely has ownership of the RShim.

  2. Stop and disable the RShim service on the host. Run:

    systemctl stop rshim
    systemctl disable rshim

  3. Enable RShim on the BMC. Run:

    systemctl enable rshim
    systemctl start rshim

  4. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME usb-1.0

    This output indicates that the RShim service is ready to use.
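
The DEV_NAME examples in this document start with "pcie-" when read on the host and with "usb-" when read on the BMC. A script can use that prefix to report how the RShim is attached. The following is a rough sketch based only on those two examples. Both function names are hypothetical, and any other prefix is reported as unknown rather than guessed.

```shell
# Hypothetical helper: extracts the DEV_NAME value from the text of an
# RShim misc file supplied on stdin.
rshim_dev_name() {
    awk '$1 == "DEV_NAME" { print $2; exit }'
}

# Hypothetical helper: maps a DEV_NAME value to an attachment type.
# Only the two prefixes that appear in this document are recognized.
rshim_backend() {
    case "$1" in
        pcie-*) echo "pcie" ;;
        usb-*)  echo "usb" ;;
        *)      echo "unknown"; return 1 ;;
    esac
}

# Example (assuming the RShim device index is 0):
#   rshim_backend "$(rshim_dev_name < /dev/rshim0/misc)"
```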

RShim driver not loading on host on DPU without integrated BMC

  1. Download the DEB or RPM package of the RShim driver (the host's management interface to the DPU) that matches your distribution.

  2. Reinstall the RShim package on the host.

    • For Ubuntu/Debian, run:

      sudo dpkg --force-all -i rshim-<version>.deb

    • For RHEL/CentOS, run:

      sudo rpm -Uhv rshim-<version>.rpm

  3. Restart the RShim service. Run:

    sudo systemctl restart rshim

    To check whether the RShim service launched, run:

    sudo systemctl status rshim

    The output is expected to show "active (running)".

  4. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
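
The package reinstall in step 2 depends on the package format. The following is a small sketch that picks the matching command from the file extension and prints it instead of running it, so that it can be reviewed first. `rshim_install_cmd` is a hypothetical helper name. The `dpkg` and `rpm` invocations are the ones given in step 2.

```shell
# Hypothetical helper: prints the reinstall command from step 2 that
# matches the extension of the given package file.
rshim_install_cmd() {
    case "$1" in
        *.deb) echo "sudo dpkg --force-all -i $1" ;;
        *.rpm) echo "sudo rpm -Uhv $1" ;;
        *)     echo "unsupported package file: $1" >&2; return 1 ;;
    esac
}

# Example (the file name is whatever package you downloaded in step 1):
#   rshim_install_cmd rshim-<version>.deb
```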

Change ownership of RShim from BMC to host

  1. Verify that your card has an integrated BMC. Run the following on the host:

    # sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"
    Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL

    The product name is expected to include "integrated BMC".

  2. Access the BMC via the RJ45 management port of the DPU.

  3. Stop and disable the RShim service on the BMC:

    systemctl stop rshim
    systemctl disable rshim

  4. Enable RShim on the host:

    systemctl enable rshim
    systemctl start rshim

  5. Restart the RShim service. Run:

    sudo systemctl restart rshim

    To check whether the RShim service launched, run:

    sudo systemctl status rshim

    The output is expected to show "active (running)".

  6. Display the current setting. Run:

    # cat /dev/rshim<N>/misc | grep DEV_NAME
    DEV_NAME pcie-04:00.2 (ro)

    This output indicates that the RShim service is ready to use.
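
When the device index <N> is not known in advance, the DEV_NAME check can be run against every RShim device directory at once. The following is a minimal sketch, assuming the devices appear as `rshim<N>` directories that each contain a `misc` file, as in the command above. `list_rshim_dev_names` is a hypothetical helper name, and the optional directory argument exists only so that the function can be tried against a test directory.

```shell
# Hypothetical helper: prints "<device> <DEV_NAME>" for every
# rshim<N>/misc file found under the given directory (default: /dev).
list_rshim_dev_names() {
    local base="${1:-/dev}"
    local misc
    for misc in "$base"/rshim*/misc; do
        # Skip the unexpanded pattern (no devices) and unreadable files.
        [ -r "$misc" ] || continue
        printf '%s %s\n' \
            "$(basename "$(dirname "$misc")")" \
            "$(awk '$1 == "DEV_NAME" { print $2; exit }' "$misc")"
    done
}

# Example (run with sufficient privileges to read the misc files):
#   list_rshim_dev_names
```

If the function prints nothing, no readable RShim device was found under the directory.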

For more information, please refer to section "RShim Multiple Board Support".

© Copyright 2023, NVIDIA. Last updated on Sep 9, 2023.