NVIDIA BlueField Platform Software Troubleshooting Guide

PCIe

This page offers troubleshooting information for PCIe.

Missing PCI Express Device

There are several stages to the discovery and operation of PCIe devices, and an error at any of these stages can make a device unavailable to the operating system. PCIe devices form a tree hierarchy, with each node connected to its parent by a PCIe link. All of the links between the root port and the endpoint device must be trained and active for the device to be accessible. Link training is handled by hardware, but if it fails, all devices downstream of the failed link become unavailable. The lspci tool can be used to check the downstream link status of root ports and switches.

In the following example, the LnkSta line shows the link operating correctly. TrErr- signifies there were no training errors and DLActive+ signifies the link is up.

# lspci -vv -s 5:0.0
05:00.0 PCI bridge: Mellanox Technologies MT43244 Family [BlueField-3 SoC PCIe Bridge] (rev 01) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 57
        IOMMU group: 5
        Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: 00000000-00000fff [size=4K]
        Memory behind bridge: 00200000-003fffff [size=2M]
        Prefetchable memory behind bridge: 0000800005000000-00008000051fffff [size=2M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [60] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 32GT/s, Width x2, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x2 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
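
To locate the bridges above a suspect device and check each downstream link, the tree view of lspci can be combined with a filtered status query. This is a minimal sketch; the bus/device/function address is the one from the example above and will vary by system.

# lspci -tv
# lspci -vv -s 5:0.0 | grep -E "LnkCap|LnkSta"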


Enumeration

The next stage is PCIe enumeration. This is the process software uses to discover all the devices present in the fabric. It does this by reading the first register of every possible device to see which ones respond. The first register of every device contains its vendor ID and device ID, which uniquely identify the device. PCIe enumeration is done twice during boot: once by UEFI and then again by Linux. Every device detected by Linux PCIe enumeration is listed by lspci. If a device shows up here, it is present in the system and responded correctly to a configuration read; this says nothing about the functionality of the device or its driver.
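
The numeric vendor and device IDs read back during enumeration can be displayed directly, which is useful when matching a device against a PCI ID database. A minimal example (NVIDIA/Mellanox devices use vendor ID 15b3):

# lspci -nn | grep 15b3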

Resource Allocation

After enumeration, the operating system performs PCIe resource allocation. If resource allocation fails, some devices will be unavailable to the OS. There are three kinds of PCIe resources: bus numbers, I/O space, and memory space. BlueField does not support PCIe I/O space, so only the other two are relevant. Every platform, including BlueField, supports 256 bus numbers, and running out of this resource is unlikely. If PCIe memory space runs out, messages like the following are reported by dmesg.

[    0.781698] pci 0000:21:00.0: BAR 6: no space for [mem size 0x00100000 pref]
[    0.781700] pci 0000:21:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]
[    0.781703] pci 0000:21:00.1: BAR 6: no space for [mem size 0x00100000 pref]
[    0.781705] pci 0000:21:00.1: BAR 6: failed to assign [mem size 0x00100000 pref]

There are two types of PCIe memory space: 32-bit and 64-bit. The width refers to the size of the addresses used. BlueField-3 supports 2GB of 32-bit PCIe memory space and 128TB of 64-bit PCIe memory space. The 32-bit space is in the range 0x7fff_0000_0000 to 0x7fff_7fff_ffff. The 64-bit space is in the range 0x8000_0000_0000 to 0xffff_ffff_ffff. Even though the 64-bit space is huge, it is still possible to run out because some devices support a limited number of address bits. Also, because memory space allocations must be a power of 2 in size and naturally aligned, large chunks of the address space sometimes cannot be used. If memory space allocation fails, it can be helpful to review the contents of /proc/iomem, which lists all of the available ranges and the ranges allocated to each device.
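
For example, the ranges allocated to a specific device can be found by searching /proc/iomem for its bus address (run as root, since the addresses are hidden from unprivileged users; the address below is the NVMe endpoint from the example later in this section):

# grep 06:00.0 /proc/iomem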

Depending on the Linux configuration, the kernel may either keep the resource allocation done by UEFI or discard those settings and do its own allocation. This behavior can be controlled by adding "pci=realloc=on" or "pci=realloc=off" to the kernel command line.
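
The currently active setting can be checked from the running system. On grub-based installations the option is typically added to GRUB_CMDLINE_LINUX in /etc/default/grub and the grub configuration regenerated, but the exact method depends on the bootloader in use.

# cat /proc/cmdline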

Device Drivers

If enumeration and resource allocation succeed but the device services are still not available, then the issue is probably with the driver. If the output of lspci -v includes a line labeled "Kernel driver in use:", then that driver is successfully attached to the device; the "Kernel modules:" line lists the kernel modules capable of driving it. In the example below, it is the nvme driver.

# lspci -v -s 6:0.0
06:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less) (prog-if 02 [NVM Express])
        Subsystem: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less)
        Physical Slot: 0
        Flags: bus master, fast devsel, latency 0, IRQ 61, IOMMU group 6
        Memory at 7fff00200000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
        Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [260] Latency Tolerance Reporting
        Capabilities: [300] Secondary PCI Express
        Capabilities: [400] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme

If that line is missing, then the driver is either not installed or failed to attach. In either case, searching for the name of the driver in the dmesg output should provide more information.
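
For example, for the NVMe device shown above:

# dmesg | grep -i nvme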

UEFI Enumeration

If debugging from Linux is difficult or not available, then the UEFI Internal Shell can be used to see the results of PCIe enumeration as done by UEFI. To enter the shell, press Esc on the console when UEFI starts to boot. From the menu, select "Boot Manager" and then scroll down to "EFI Internal Shell". The relevant commands are "pci", "devices" and "drivers". The "help" command will give usage information for each command.

Shell> pci
   Seg  Bus  Dev  Func
   ---  ---  ---  ----
    00   00   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device A2DA Prog Interface 0
    00   01   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   02   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   02   03    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   03   00    00 ==> Network Controller - Ethernet controller
             Vendor 15B3 Device A2DC Prog Interface 0
    00   03   00    01 ==> Network Controller - Ethernet controller
             Vendor 15B3 Device A2DC Prog Interface 0
    00   04   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   05   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   06   00    00 ==> Mass Storage Controller - Non-volatile memory subsystem
             Vendor 1E0F Device 0001 Prog Interface 2
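
The pci shell command can also dump the configuration space of a single function, which helps confirm that a device responds before the OS boots. The option syntax can be confirmed with "help pci"; the bus/device/function values below are taken from the listing above:

Shell> pci 06 00 00 -i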

Missing PCIe Devices

If running lspci on the BlueField produces no output and all PCIe devices are missing, the device is in Livefish mode. In that case, the NIC firmware must be reinstalled.

Insufficient Power on the PCIe Slot

If you see the error "Insufficient power on the PCIe slot" in dmesg, please consult the Specifications section of your BlueField Hardware User Guide to ensure that your DPU is receiving the appropriate power supply.
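
The exact message and the affected device can be located in the kernel log, for example:

# dmesg | grep -i "insufficient power"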

To check the power capacity of your host's PCIe slots, execute the command lspci -vvv | grep PowerLimit. For instance:

# lspci -vvv | grep PowerLimit
        Slot #6, PowerLimit 75.000W; Interlock- NoCompl-
        Slot #1, PowerLimit 75.000W; Interlock- NoCompl-
        Slot #4, PowerLimit 75.000W; Interlock- NoCompl-

Note

Be aware that this command is not supported by all host vendors/types.


Obtaining the Complete PCIe Device Description

The lspci command may not display the complete descriptions for the NVIDIA PCIe devices connected to your host. For example:

# lspci | grep -i Mellanox
a3:00.0 Infiniband controller: Mellanox Technologies Device a2d6 (rev 01)
a3:00.1 Infiniband controller: Mellanox Technologies Device a2d6 (rev 01)
a3:00.2 DMA controller: Mellanox Technologies Device c2d3 (rev 01)

To see the full descriptions for these devices, please run the following command:

# update-pciids

After doing this, you should be able to view the complete details for those devices. For example:

# lspci | grep -i Mellanox
a3:00.0 Infiniband controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
a3:00.1 Infiniband controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
a3:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)


Managing Two BlueField Platforms in the Same Server

This example demonstrates how to manage two BlueField platforms installed in the same server (the process is similar for additional platforms).

This example assumes that the RShim package has already been installed on the host server.

Configuring Management Interface on Host

Note

This example is relevant for CentOS/RHEL operating systems only.

  1. Create a br_tmfifo bridge interface configuration file under /etc/sysconfig/network-scripts. Run:

    vim /etc/sysconfig/network-scripts/ifcfg-br_tmfifo

  2. Inside ifcfg-br_tmfifo, insert the following content:

    DEVICE="br_tmfifo" BOOTPROTO="static" IPADDR="192.168.100.1" NETMASK="255.255.255.0" ONBOOT="yes" TYPE="Bridge"

  3. Create a configuration file for the first BlueField platform, tmfifo_net0. Run:

    vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net0

  4. Inside ifcfg-tmfifo_net0, insert the following content:

    DEVICE=tmfifo_net0
    BOOTPROTO=none
    ONBOOT=yes
    NM_CONTROLLED=no
    BRIDGE=br_tmfifo

  5. Create a configuration file for the second BlueField platform, tmfifo_net1, under /etc/sysconfig/network-scripts/ifcfg-tmfifo_net1, and insert the following content:

    DEVICE=tmfifo_net1
    BOOTPROTO=none
    ONBOOT=yes
    NM_CONTROLLED=no
    BRIDGE=br_tmfifo

  6. Create the rules for the tmfifo_net interfaces. Run:

    vim /etc/udev/rules.d/91-tmfifo_net.rules

  7. Restart the network for the changes to take effect. Run:

    # /etc/init.d/network restart
    Restarting network (via systemctl):                        [  OK  ]
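
    After the restart, the bridge and its member interfaces can be verified on the host. This is a minimal check, assuming the interface names used in this example:

    # ip addr show br_tmfifo
    # ip link show master br_tmfifo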

Configuring BlueField Platform Side

The BlueField platforms are shipped with the following factory default configurations for tmfifo_net0.

Address    Value
MAC        00:1a:ca:ff:ff:01
IP         192.168.100.2

Therefore, if you are working with more than one platform, you must change the default MAC and IP addresses.

Updating the RShim Network MAC Address

Note

This procedure is relevant for Ubuntu/Debian (sudo needed) and CentOS BFBs. It only affects the tmfifo_net0 interface on the Arm side.

  1. Use a Linux console application (e.g. screen or minicom) to log into each BlueField. For example:

    # sudo screen /dev/rshim<0|1>/console 115200

  2. Create a configuration file for the tmfifo_net0 MAC address. Run:

    # sudo vi /etc/bf.cfg

  3. Inside bf.cfg, insert the new MAC:

    NET_RSHIM_MAC=00:1a:ca:ff:ff:03

  4. Apply the new MAC address. Run:

    sudo bfcfg

  5. Repeat this procedure for the second BlueField platform (using a different MAC address).

    Info

    Arm must be rebooted for this configuration to take effect. It is recommended to update the IP address before you do that to avoid unnecessary reboots.

Note

For a comprehensive list of the supported parameters for customizing bf.cfg during BFB installation, refer to section "bf.cfg Parameters".
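
After the Arm is rebooted, the new MAC address can be verified from the BlueField console. This is a minimal check, assuming the default interface name tmfifo_net0:

# ip link show tmfifo_net0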


Updating an IP Address

For Ubuntu:

  1. Access the file 50-cloud-init.yaml and modify the tmfifo_net0 IP address:

    sudo vim /etc/netplan/50-cloud-init.yaml

    tmfifo_net0:
      addresses:
        - 192.168.100.2/30 ===>>> 192.168.100.3/30

  2. Reboot the Arm. Run:

    sudo reboot

  3. Repeat this procedure for the second BlueField platform (using a different IP address).

    Info

    Arm must be rebooted for this configuration to take effect. It is recommended to update the MAC address before you do that to avoid unnecessary reboots.

For CentOS:

  1. Access the file ifcfg-tmfifo_net0. Run:

    # vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net0

  2. Modify the value for IPADDR:

    IPADDR=192.168.100.3

  3. Reboot the Arm. Run:

    reboot

    Alternatively, restart the network service to apply the change.

  4. Repeat this procedure for the second BlueField platform (using a different IP address).

    Info

    Arm must be rebooted for this configuration to take effect. It is recommended to update the MAC address before you do that to avoid unnecessary reboots.
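
In either case, once the Arm has been rebooted, the updated address can be confirmed on the BlueField side and reachability tested from the host over the bridge. This is a minimal check, assuming the addresses used in this example (the first command runs on the Arm, the second on the host):

# ip addr show tmfifo_net0
# ping -c 3 192.168.100.3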

© Copyright 2024, NVIDIA. Last updated on Nov 12, 2024.