PCIe
This page offers troubleshooting information for PCIe.
Missing PCIe Device
PCIe Links
The discovery and operation of PCIe devices involve multiple stages, and errors at any stage can render a device inaccessible to the operating system (OS). PCIe devices are organized in a tree-like hierarchy, with each node connected via PCIe links. For a device to be accessible, all links between the root port and the endpoint device must be successfully trained and active. While link training is managed by hardware, failure at this stage results in the unavailability of all downstream devices. Tools such as lspci can be used to verify the status of downstream links for root ports and switches.
In the following example, the LnkSta line shows that the link has trained and is active: TrErr- signifies that there were no training errors, and DLActive+ signifies that the link is up. (The link speed is reported as downgraded, but the link itself is operational.)
            
# lspci -vv -s 5:0.0
05:00.0 PCI bridge: Mellanox Technologies MT43244 Family [BlueField-3 SoC PCIe Bridge] (rev 01) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 57
        IOMMU group: 5
        Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: 00000000-00000fff [size=4K]
        Memory behind bridge: 00200000-003fffff [size=2M]
        Prefetchable memory behind bridge: 0000800005000000-00008000051fffff [size=2M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [60] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 32GT/s, Width x2, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x2 (ok)
                TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
Enumeration
The next stage is PCIe enumeration, a process by which software discovers all devices present in the PCIe fabric. This is accomplished by reading the first register of every possible device address to determine which devices respond. The first register contains the vendor ID and device ID, which uniquely identify the device. PCIe enumeration occurs twice during boot: once by UEFI and then again by Linux.
Devices detected during Linux PCIe enumeration are listed by lspci. If a device appears here, it indicates that the device is present in the system and has responded correctly to a configuration read. However, this does not guarantee the functionality of the device or its associated driver.
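For example, lspci can display the vendor and device IDs read during enumeration alongside the device description (an illustrative invocation and output, using the BlueField-3 network controller that appears in the UEFI example below):

    # lspci -nn -s 03:00.0
    03:00.0 Ethernet controller [0200]: Mellanox Technologies BlueField-3 integrated ConnectX-7 network controller [15b3:a2dc] (rev 01)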
Resource Allocation
After enumeration, the OS performs PCIe resource allocation. If resource allocation fails, some devices may become unavailable to the OS. There are three types of PCIe resources:
- I/O space 
- Bus numbers 
- Memory space 
The BlueField platform does not support PCIe I/O space, leaving bus numbers and memory space as the primary considerations. The system supports 255 buses, making exhaustion of this resource unlikely. However, insufficient PCIe memory space can lead to errors, which are often logged in dmesg.
            
[    0.781698] pci 0000:21:00.0: BAR 6: no space for [mem size 0x00100000 pref]
[    0.781700] pci 0000:21:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]
[    0.781703] pci 0000:21:00.1: BAR 6: no space for [mem size 0x00100000 pref]
[    0.781705] pci 0000:21:00.1: BAR 6: failed to assign [mem size 0x00100000 pref]
    
There are two types of PCIe memory space:
- 32-bit memory space – BlueField-3 supports 2 GB, ranging from 0x7FFF_0000_0000 to 0x7FFF_7FFF_FFFF.
- 64-bit memory space – BlueField-3 supports 128 TB, ranging from 0x8000_0000_0000 to 0xFFFF_FFFF_FFFF.
Despite the large capacity of 64-bit memory space, exhaustion is still possible. This can occur due to devices that support a limited number of address bits or alignment requirements, which may leave large chunks of address space unusable. If memory space allocation fails, reviewing /proc/iomem can be helpful. This file lists all available memory ranges and their allocation status for each device.
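For example, the address range assigned to a specific device can be found by searching for its bus address (illustrative output, matching the NVMe device used in the driver example below; run as root, since the addresses are hidden from unprivileged users):

    # grep 0000:06:00.0 /proc/iomem
      7fff00200000-7fff00203fff : 0000:06:00.0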
Depending on the Linux configuration, it may either retain the resource allocation performed by UEFI or reallocate resources independently. This behavior can be controlled using the kernel command-line options pci=realloc=on or pci=realloc=off.
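For example, on a distribution that boots through GRUB, the option can be made persistent as follows (a typical sequence; the exact file and update command vary by distribution):

    # cat /proc/cmdline     # check whether a pci=realloc option is already in effect
    # vi /etc/default/grub  # append pci=realloc=on to GRUB_CMDLINE_LINUX
    # update-grub           # on RHEL/CentOS: grub2-mkconfig -o /boot/grub2/grub.cfg
    # reboot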
Device Drivers
If enumeration and resource allocation succeed but the device's services are still not available, the issue is likely with the driver. If lspci -v shows a Kernel driver in use line, then a device driver is successfully attached to that device (the Kernel modules line only lists the modules capable of driving it). In the following example, the attached driver is the NVMe driver:
            
# lspci -v -s 6:0.0
06:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less) (prog-if 02 [NVM Express])
        Subsystem: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less)
        Physical Slot: 0
        Flags: bus master, fast devsel, latency 0, IRQ 61, IOMMU group 6
        Memory at 7fff00200000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
        Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [260] Latency Tolerance Reporting
        Capabilities: [300] Secondary PCI Express
        Capabilities: [400] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme
    
If that line is missing, then the driver is either missing or the attachment failed. In either case, searching for the name of the driver in the dmesg output should provide more information.
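For example, for the NVMe device shown above (illustrative output; the exact messages depend on the driver and kernel version):

    # dmesg | grep -i nvme
    [    1.893331] nvme nvme0: pci function 0000:06:00.0
    [    1.901176] nvme nvme0: 8/0/0 default/read/poll queues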
UEFI Enumeration
If debugging from Linux is difficult or not available, the UEFI Internal Shell can be used to see the results of PCIe enumeration as done by UEFI. To enter the shell, press Esc on the console when UEFI starts to boot. From the menu, select Boot Manager and then scroll down to EFI Internal Shell. The relevant commands are pci, devices, and drivers. The help command will provide usage information for each command.
Example pci command output:
            
Shell> pci
   Seg  Bus  Dev  Func
   ---  ---  ---  ----
    00   00   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device A2DA Prog Interface 0
    00   01   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   02   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   02   03    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   03   00    00 ==> Network Controller - Ethernet controller
             Vendor 15B3 Device A2DC Prog Interface 0
    00   03   00    01 ==> Network Controller - Ethernet controller
             Vendor 15B3 Device A2DC Prog Interface 0
    00   04   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   05   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   06   00    00 ==> Mass Storage Controller - Non-volatile memory subsystem
             Vendor 1E0F Device 0001 Prog Interface 2
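
To examine a single device in more detail, pass its bus/device/function address to the same command; the -i flag displays the device's configuration space in an interpreted form (see help pci for the full syntax). For example, to inspect the NVMe controller above:

    Shell> pci 06 00 00 -i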
    
Missing PCIe Devices
If running lspci on the BlueField produces no output and all PCIe devices are missing, this indicates that the device is in Livefish mode (i.e., the NIC firmware is not operational). In this case, the NIC firmware must be reinstalled.
Insufficient Power on the PCIe Slot
If you see the error Insufficient power on the PCIe slot in dmesg, consult the "Specifications" page of your BlueField device's hardware user guide to ensure that it is receiving the appropriate power supply.
To check the power capacity of your host's PCIe slots, execute the command lspci -vvv | grep PowerLimit. For instance:
            
# lspci -vvv | grep PowerLimit
                        Slot #6, PowerLimit 75.000W; Interlock- NoCompl-
                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl-
                        Slot #4, PowerLimit 75.000W; Interlock- NoCompl-
    
This command is not supported by all host vendors or types.
Obtaining the Complete PCIe Device Description
The lspci command may not display the complete descriptions of NVIDIA PCIe devices connected to the host system. For example:
            
# lspci | grep -i Mellanox
a3:00.0 Infiniband controller: Mellanox Technologies Device a2d6 (rev 01)
a3:00.1 Infiniband controller: Mellanox Technologies Device a2d6 (rev 01)
a3:00.2 DMA controller: Mellanox Technologies Device c2d3 (rev 01)
    
To obtain the full descriptions of these devices, run:
            
# update-pciids
    
Once the PCIe device ID database has been updated, the lspci command should display detailed information for each device. For example:
            
# lspci | grep -i Mellanox
a3:00.0 Infiniband controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
a3:00.1 Infiniband controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
a3:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)
Managing Two BlueField Platforms in the Same Server
This example demonstrates the procedure for managing two BlueField platforms installed in the same server. The process is similar when managing additional platforms.
This example assumes that the RShim package is already installed on the host server.
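With the RShim driver running, one RShim device appears on the host per BlueField platform, which is a quick way to confirm that both platforms are visible (illustrative output):

    # ls -d /dev/rshim*
    /dev/rshim0  /dev/rshim1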
Configuring Management Interface on Host
This example applies only to CentOS and RHEL operating systems.
- Create a br_tmfifo interface configuration file. Run:

      vim /etc/sysconfig/network-scripts/ifcfg-br_tmfifo

  Add the following content to the file:

      DEVICE="br_tmfifo"
      BOOTPROTO="static"
      IPADDR="192.168.100.1"
      NETMASK="255.255.255.0"
      ONBOOT="yes"
      TYPE="Bridge"
- Create a configuration file for the first BlueField platform (tmfifo_net0). Run:

      vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net0

  Add the following content to the file:

      DEVICE=tmfifo_net0
      BOOTPROTO=none
      ONBOOT=yes
      NM_CONTROLLED=no
      BRIDGE=br_tmfifo
- Create a configuration file for the second BlueField platform (tmfifo_net1). Run:

      vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net1

  Add the following content to the file:

      DEVICE=tmfifo_net1
      BOOTPROTO=none
      ONBOOT=yes
      NM_CONTROLLED=no
      BRIDGE=br_tmfifo
- Define rules for the tmfifo_net interfaces. Run:

      vim /etc/udev/rules.d/91-tmfifo_net.rules
- Restart the network to apply the changes. Run:

      # /etc/init.d/network restart

  Expected output:

      Restarting network (via systemctl): [ OK ]
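After the restart, the bridge and its two attached tmfifo interfaces can be verified (an illustrative check; interface indices and flags will vary):

    # ip link show master br_tmfifo
    4: tmfifo_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> ... master br_tmfifo ...
    5: tmfifo_net1: <BROADCAST,MULTICAST,UP,LOWER_UP> ... master br_tmfifo ...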
Configuring BlueField Platform Side
BlueField platforms are shipped with the following factory default configuration for tmfifo_net0:

| Address | Value |
| --- | --- |
| MAC | 00:1a:ca:ff:ff:01 |
| IP | 192.168.100.2 |
If more than one BlueField platform is in use, the default MAC and IP addresses must be modified.
Updating the RShim Network MAC Address
This procedure applies to Ubuntu/Debian (with sudo) and CentOS BFB installations. It only affects tmfifo_net0 on the Arm side.
- Use a Linux console application (e.g., screen or minicom) to log into each BlueField platform. For example:

      # sudo screen /dev/rshim<0|1>/console 115200

- Create a configuration file for the tmfifo_net0 MAC address:

      # sudo vi /etc/bf.cfg

- Insert the new MAC address into the bf.cfg file:

      NET_RSHIM_MAC=00:1a:ca:ff:ff:03

- Apply the new MAC address:

      sudo bfcfg

- Repeat this process for the second BlueField platform, ensuring each one uses a unique MAC address.

Note: The Arm processor must be rebooted for the changes to take effect. To avoid unnecessary reboots, it is recommended to update the IP address before restarting the Arm.
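After the Arm reboots, the new MAC address can be confirmed from the BlueField console (a minimal check, assuming the default tmfifo_net0 interface name):

    # cat /sys/class/net/tmfifo_net0/address
    00:1a:ca:ff:ff:03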
For a comprehensive list of the supported parameters for customizing bf.cfg during BFB installation, refer to the "bf.cfg Parameters" section in the "Customizing BlueField Software Deployment Using bf.cfg" page.
Updating an IP Address
- For Ubuntu:
  - Edit the 50-cloud-init.yaml file to update the tmfifo_net0 IP address:

        sudo vim /etc/netplan/50-cloud-init.yaml

    Modify the entry as follows:

        tmfifo_net0:
            addresses:
            - 192.168.100.2/30  # Change to: 192.168.100.3/30
  - Reboot the Arm. Run:

        sudo reboot

    Alternatively, apply the change without a reboot:

        sudo netplan apply
  - Repeat this process for the second BlueField platform, ensuring each one has a unique IP address.

Note: The Arm processor must be rebooted for the changes to take effect. It is recommended to update the MAC address before restarting the Arm to minimize reboots.
 
- For CentOS:
  - Edit the ifcfg-tmfifo_net0 file:

        # vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net0

  - Update the IPADDR field:

        IPADDR=192.168.100.3

  - Reboot the Arm processor to apply the changes:

        reboot
  - Repeat this process for the second BlueField platform, ensuring a unique IP address is assigned.

Note: The Arm processor must be rebooted for the changes to take effect. It is recommended to update the MAC address before restarting the Arm to minimize reboots.
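As a final check, each platform should now be reachable from the host over the br_tmfifo bridge (illustrative; the addresses follow the examples above):

    # ping -c 1 192.168.100.2
    # ping -c 1 192.168.100.3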