RShim Troubleshooting and How-Tos
Several generations of NVIDIA® BlueField® networking platforms (DPUs or SuperNICs) are equipped with a USB interface through which RShim can be routed, via a USB cable, to an external host running Linux and the RShim driver.
In this case, typically following a system reboot, the RShim over USB prevails and the BlueField host reports RShim status as "another backend already attached". This is correct behavior, since there can only be one RShim backend active at any given time. However, this means that the BlueField host does not own RShim access.
To reclaim RShim ownership safely:
Stop the RShim driver on the remote Linux host. Run:
systemctl stop rshim
systemctl disable rshim
Restart RShim on the BlueField host. Run:
systemctl enable rshim
systemctl start rshim
The "another backend already attached" scenario can also be attributed to the RShim backend being owned by the BMC in BlueField devices with integrated BMC. This is elaborated on further down on this page.
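As a minimal sketch, the check for this condition can be scripted. The function below only searches text on standard input for the message quoted above. The name backend_busy is hypothetical, and where the message appears on a given system (for example, in the output of systemctl status rshim) is an assumption rather than something this page states.

```shell
# Hypothetical helper: succeeds if the text on stdin contains the
# "another backend already attached" message described above.
backend_busy() {
    grep -q "another backend already attached"
}

# Example (assumes the message shows up in the service status output):
#   systemctl status rshim 2>&1 | backend_busy && echo "RShim is owned elsewhere"
```

The function reads from a pipe rather than running systemctl itself, so it can be pointed at whichever log or status output carries the message on a particular system.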
Verify whether your BlueField features an integrated BMC or not. Run:
# sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"
Example output for BlueField with an integrated BMC:
Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL
If your BlueField has an integrated BMC, refer to "RShim driver not loading on BlueField with integrated BMC".
If your BlueField does not have an integrated BMC, refer to "RShim driver not loading on host on BlueField without integrated BMC".
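The decision above can be sketched as a small filter over the "Product Name" line. This is only an illustration: has_integrated_bmc is a hypothetical name, and the sketch assumes that a product name without the phrase "integrated BMC" means the device has no integrated BMC.

```shell
# Hypothetical helper: reads "Product Name" text on stdin and reports
# whether it mentions an integrated BMC (case-insensitive match).
has_integrated_bmc() {
    if grep -qi "integrated BMC"; then
        echo "integrated BMC"
    else
        echo "no integrated BMC"
    fi
}

# Example, reusing the lspci pipeline shown above:
#   sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv \
#       | grep "Product Name" | has_integrated_bmc
```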
RShim driver not loading on BlueField with integrated BMC
RShim driver not loading on host
Access the BMC via the RJ45 management port of BlueField.
Stop and disable RShim on the BMC. Run:
systemctl stop rshim
systemctl disable rshim
Enable RShim on the host:
systemctl enable rshim
systemctl start rshim
Restart RShim service. Run:
sudo systemctl restart rshim
If the RShim service does not launch automatically, check its status. Run:
sudo systemctl status rshim
This command is expected to display "active (running)".
Display the current setting. Run:
# cat /dev/rshim<N>/misc | grep DEV_NAME
DEV_NAME pcie-04:00.2 (ro)
This output indicates that the RShim service is ready to use.
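The DEV_NAME values shown on this page (pcie-04:00.2 on the host, usb-1.0 on the BMC) suggest that the text before the first hyphen names the transport of the active backend. Under that assumption, which is inferred from the two samples only, a hypothetical helper such as rshim_backend could extract it:

```shell
# Hypothetical helper: reads the contents of /dev/rshim<N>/misc on stdin
# and prints the part of DEV_NAME before the first hyphen ("pcie", "usb").
# Prints "unknown" if no DEV_NAME line is present.
rshim_backend() {
    awk '$1 == "DEV_NAME" { split($2, parts, "-"); print parts[1]; found = 1 }
         END { if (!found) print "unknown" }'
}

# Example (replace <N> with the RShim device index):
#   cat /dev/rshim<N>/misc | rshim_backend
```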
RShim driver not loading on BMC
Verify that the RShim service is not running on host. Run:
systemctl status rshim
If the output shows that the service is active, the host most likely has ownership of the RShim.
Stop and disable RShim on the host. Run:
systemctl stop rshim
systemctl disable rshim
Enable RShim on the BMC. Run:
systemctl enable rshim
systemctl start rshim
Display the current setting. Run:
# cat /dev/rshim<N>/misc | grep DEV_NAME
DEV_NAME usb-1.0
This output indicates that the RShim service is ready to use.
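The first step of this procedure, checking whether the host still runs the RShim service, can be sketched with systemctl is-active, which prints a one-word state such as "active" or "inactive". The helper name rshim_owner_hint and its messages are hypothetical, and an active service is only a hint of ownership, not proof.

```shell
# Hypothetical helper: maps the state string printed by
# `systemctl is-active rshim` to an ownership hint.
rshim_owner_hint() {
    case "$1" in
        active) echo "host likely owns RShim" ;;
        *)      echo "host does not appear to own RShim" ;;
    esac
}

# Example, run on the host:
#   rshim_owner_hint "$(systemctl is-active rshim)"
```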
RShim driver not loading on host on BlueField without integrated BMC
Download the DEB or RPM package of the RShim driver (the management interface to BlueField from the host) that suits your system.
Reinstall the RShim package on the host.
For Ubuntu/Debian, run:
sudo dpkg --force-all -i rshim-<version>.deb
For RHEL/CentOS, run:
sudo rpm -Uhv rshim-<version>.rpm
Restart RShim service. Run:
sudo systemctl restart rshim
If the RShim service does not launch automatically, check its status. Run:
sudo systemctl status rshim
This command is expected to display "active (running)".
Display the current setting. Run:
# cat /dev/rshim<N>/misc | grep DEV_NAME
DEV_NAME pcie-04:00.2 (ro)
This output indicates that the RShim service is ready to use.
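The choice between the Ubuntu/Debian and RHEL/CentOS reinstall commands in the steps above can be sketched by keying on the package file extension. The helper rshim_install_cmd is hypothetical. It only prints the command from this page and does not run it, so the result can be reviewed first.

```shell
# Hypothetical helper: prints (does not run) the reinstall command
# from the steps above that matches the given package file.
rshim_install_cmd() {
    case "$1" in
        *.deb) echo "sudo dpkg --force-all -i $1" ;;
        *.rpm) echo "sudo rpm -Uhv $1" ;;
        *)     echo "unsupported package type: $1" >&2; return 1 ;;
    esac
}

# Example:
#   rshim_install_cmd rshim-<version>.deb
```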
Verify that your card has a BMC. Run the following on the host:
# sudo lspci -s $(sudo lspci -d 15b3: | head -1 | awk '{print $1}') -vvv | grep "Product Name"
Product Name: BlueField-2 DPU 25GbE Dual-Port SFP56, integrated BMC, Crypto and Secure Boot Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL
The product name is expected to include "integrated BMC".
Access the BMC via the RJ45 management port of BlueField.
Stop and disable RShim on the BMC. Run:
systemctl stop rshim
systemctl disable rshim
Enable RShim on the host:
systemctl enable rshim
systemctl start rshim
Restart RShim service. Run:
sudo systemctl restart rshim
If the RShim service does not launch automatically, check its status. Run:
sudo systemctl status rshim
This command is expected to display "active (running)".
Display the current setting. Run:
# cat /dev/rshim<N>/misc | grep DEV_NAME
DEV_NAME pcie-04:00.2 (ro)
This output indicates that the RShim service is ready to use.
For more information, refer to section "RShim Multiple Board Support".
The BFB installation flow can be traced using various interfaces:
From the host:
RShim console (/dev/rshim0/console)
RShim log buffer (/dev/rshim0/misc); also included in bfb-install's output
UART console (/dev/ttyUSB0)
From the BMC console:
SSH to the BMC and run obmc-console-client
Info: Additional information about BMC interfaces is available in the BMC software documentation.
From the BlueField:
/root/<OS>.installation.log available on the BlueField Arm OS after installation
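For the host-side interfaces listed above, a quick way to see which ones are present before starting an installation can be sketched as follows. The helper available_trace_sources is hypothetical. The device paths are the ones named in the list, and the optional root-prefix argument exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: lists which host-side trace interfaces exist.
# The optional argument is a root prefix, used only to try the function
# against a scratch directory instead of the real /dev.
available_trace_sources() {
    root="${1:-}"
    for path in /dev/rshim0/console /dev/rshim0/misc /dev/ttyUSB0; do
        if [ -e "${root}${path}" ]; then
            echo "$path"
        fi
    done
}

# Example, run on the host:
#   available_trace_sources
```

Any path it prints can then be read during the installation, for example with cat /dev/rshim0/misc for the RShim log buffer.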