PCIe
This page offers troubleshooting information for PCIe.
Missing PCIe Device
PCIe Links
The discovery and operation of PCIe devices involve multiple stages, and errors at any stage can render a device inaccessible to the operating system (OS). PCIe devices are organized in a tree-like hierarchy, with each node connected via PCIe links. For a device to be accessible, all links between the root port and the endpoint device must be successfully trained and active. While link training is managed by hardware, failure at this stage results in the unavailability of all downstream devices. Tools such as lspci can be used to verify the status of downstream links for root ports and switches.
In the following example, the LnkSta line shows the link operating correctly: TrErr- signifies there were no training errors, and DLActive+ signifies the link is up.
# lspci -vv -s 5:0.0
05:00.0 PCI bridge: Mellanox Technologies MT43244 Family [BlueField-3 SoC PCIe Bridge] (rev 01) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 57
        IOMMU group: 5
        Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: 00000000-00000fff [size=4K]
        Memory behind bridge: 00200000-003fffff [size=2M]
        Prefetchable memory behind bridge: 0000800005000000-00008000051fffff [size=2M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [60] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 32GT/s, Width x2, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x2 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
Enumeration
The next stage is PCIe enumeration, a process by which software discovers all devices present in the PCIe fabric. This is accomplished by reading the first register of every possible device address to determine which devices respond. The first register contains the vendor ID and device ID, which uniquely identify the device. PCIe enumeration occurs twice during boot: once by UEFI and then again by Linux.
Devices detected during Linux PCIe enumeration are listed by lspci. If a device appears here, it indicates that the device is present in the system and has responded correctly to a configuration read. However, this does not guarantee the functionality of the device or its associated driver.
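The sysfs view of the enumerated fabric can also be inspected directly. The following is a minimal sketch (the helper name pci_ids is illustrative; the sysfs layout itself is standard) that prints the vendor and device ID of every function Linux discovered:

```shell
# Print "address vendor:device" for every function Linux enumerated.
# An optional argument overrides the sysfs root (useful for testing).
pci_ids() {
    for dev in "${1:-/sys/bus/pci/devices}"/*; do
        [ -e "$dev/vendor" ] || continue
        printf '%s %s:%s\n' "${dev##*/}" "$(cat "$dev/vendor")" "$(cat "$dev/device")"
    done
}

pci_ids
```

These are the same IDs that lspci -n reports, so the output can be cross-checked against the pci.ids database.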
Resource Allocation
After enumeration, the OS performs PCIe resource allocation. If resource allocation fails, some devices may become unavailable to the OS. There are three types of PCIe resources:
I/O space
Bus numbers
Memory space
The BlueField platform does not support PCIe I/O space, leaving bus numbers and memory space as the primary considerations. The system supports 255 buses, making exhaustion of this resource unlikely. However, insufficient PCIe memory space can lead to errors, which are often logged in dmesg.
[    0.781698] pci 0000:21:00.0: BAR 6: no space for [mem size 0x00100000 pref]
[    0.781700] pci 0000:21:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]
[    0.781703] pci 0000:21:00.1: BAR 6: no space for [mem size 0x00100000 pref]
[    0.781705] pci 0000:21:00.1: BAR 6: failed to assign [mem size 0x00100000 pref]
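Messages of this form can be pulled out of a large boot log with a single filter; the pattern below matches the two failure variants shown above:

```shell
# Scan the kernel log for BAR allocation failures; prints a note when
# none are found so the command always succeeds.
dmesg 2>/dev/null | grep -E 'BAR [0-9]+: (no space|failed to assign)' \
    || echo "no BAR allocation failures logged"
```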
There are two types of PCIe memory space:
32-bit memory space – BlueField-3 supports 2 GB, ranging from 0x7FFF_0000_0000 to 0x7FFF_7FFF_FFFF.
64-bit memory space – BlueField-3 supports 128 TB, ranging from 0x8000_0000_0000 to 0xFFFF_FFFF_FFFF.
Despite the large capacity of 64-bit memory space, exhaustion is still possible. This can occur due to devices that support a limited number of address bits or alignment requirements, which may leave large chunks of address space unusable. If memory space allocation fails, reviewing /proc/iomem can be helpful. This file lists all available memory ranges and their allocation status for each device.
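For example, a quick filter shows which bridge or device owns each memory window (run as root, since on recent kernels unprivileged reads show the addresses zeroed):

```shell
# List the PCI-related entries of the physical address map
grep -i 'pci' /proc/iomem | head -n 20
```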
Depending on its configuration, Linux may either retain the resource allocation performed by UEFI or reallocate resources itself. This behavior can be controlled using the kernel command-line options pci=realloc=on or pci=realloc=off.
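Whether the option is active on the running kernel can be checked from /proc/cmdline:

```shell
# Report the pci=realloc setting of the running kernel, if any
grep -o 'pci=realloc[^ ]*' /proc/cmdline || echo "pci=realloc not set"
```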
Device Drivers
If enumeration and resource allocation succeed but the device's services are still unavailable, the issue is likely with the driver. If lspci -v shows a line labeled Kernel driver in use or Kernel modules, then a device driver is successfully attached to that device. In the following example, it is the NVMe driver:
# lspci -v -s 6:0.0
06:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less) (prog-if 02 [NVM Express])
        Subsystem: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less)
        Physical Slot: 0
        Flags: bus master, fast devsel, latency 0, IRQ 61, IOMMU group 6
        Memory at 7fff00200000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
        Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [260] Latency Tolerance Reporting
        Capabilities: [300] Secondary PCI Express
        Capabilities: [400] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme
If that line is missing, the driver is either absent or failed to attach. In either case, searching the dmesg output for the name of the driver should provide more information.
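The same binding information is available in sysfs. The sketch below (the helper name pci_driver and the example device address are illustrative) reports which driver, if any, is bound to a given function:

```shell
# Print the driver bound to a PCI function, or a note if none is bound.
# A second argument overrides the sysfs root (useful for testing).
pci_driver() {
    dev_path="${2:-/sys/bus/pci/devices}/$1"
    if [ -L "$dev_path/driver" ]; then
        basename "$(readlink "$dev_path/driver")"
    else
        echo "no driver bound (or device absent)"
    fi
}

pci_driver 0000:06:00.0
```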
UEFI Enumeration
If debugging from Linux is difficult or unavailable, the UEFI Internal Shell can be used to see the results of PCIe enumeration as performed by UEFI. To enter the shell, press Esc on the console when UEFI starts to boot. From the menu, select Boot Manager, then scroll down to EFI Internal Shell. The relevant commands are pci, devices, and drivers. The help command provides usage information for each of them.
Example pci command output:
Shell> pci
   Seg  Bus  Dev  Func
   ---  ---  ---  ----
    00   00   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device A2DA Prog Interface 0
    00   01   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   02   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   02   03    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   03   00    00 ==> Network Controller - Ethernet controller
             Vendor 15B3 Device A2DC Prog Interface 0
    00   03   00    01 ==> Network Controller - Ethernet controller
             Vendor 15B3 Device A2DC Prog Interface 0
    00   04   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   05   00    00 ==> Bridge Device - PCI/PCI bridge
             Vendor 15B3 Device 197B Prog Interface 0
    00   06   00    00 ==> Mass Storage Controller - Non-volatile memory subsystem
             Vendor 1E0F Device 0001 Prog Interface 2
Missing PCIe Devices
If running lspci on the BlueField produces no output and all PCIe devices are missing, this indicates that the device is in Livefish mode. In this case, the NIC firmware must be reinstalled.
Insufficient Power on the PCIe Slot
If you see the error Insufficient power on the PCIe slot in dmesg, consult the "Specifications" page of your BlueField device's hardware user guide to ensure that it is receiving the appropriate power supply.
To check the power capacity of your host's PCIe slots, run lspci -vvv | grep PowerLimit. For instance:
# lspci -vvv | grep PowerLimit
Slot #6, PowerLimit 75.000W; Interlock- NoCompl-
Slot #1, PowerLimit 75.000W; Interlock- NoCompl-
Slot #4, PowerLimit 75.000W; Interlock- NoCompl-
Note that the PowerLimit field is not reported by all host vendors or platform types.
Obtaining the Complete PCIe Device Description
The lspci command may not display the complete descriptions of NVIDIA PCIe devices connected to the host system. For example:
# lspci | grep -i Mellanox
a3:00.0 Infiniband controller: Mellanox Technologies Device a2d6 (rev 01)
a3:00.1 Infiniband controller: Mellanox Technologies Device a2d6 (rev 01)
a3:00.2 DMA controller: Mellanox Technologies Device c2d3 (rev 01)
To obtain the full descriptions of these devices, run:
# update-pciids
Once the PCIe device ID database has been updated, the lspci command should display detailed information for each device. For example:
# lspci | grep -i Mellanox
a3:00.0 Infiniband controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
a3:00.1 Infiniband controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
a3:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)
Managing Two BlueField Platforms in the Same Server
This example demonstrates the procedure for managing two BlueField platforms installed in the same server. The process is similar when managing additional platforms.
This example assumes that the RShim package is already installed on the host server.
Configuring Management Interface on Host
This example applies only to CentOS and RHEL operating systems.
Create a br_tmfifo interface configuration file. Run:
vim /etc/sysconfig/network-scripts/ifcfg-br_tmfifo
Add the following content to the file:
DEVICE="br_tmfifo"
BOOTPROTO="static"
IPADDR="192.168.100.1"
NETMASK="255.255.255.0"
ONBOOT="yes"
TYPE="Bridge"
Create a configuration file for the first BlueField platform (tmfifo_net0). Run:
vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net0
Add the following content to the file:
DEVICE=tmfifo_net0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
BRIDGE=br_tmfifo
Create a configuration file for the second BlueField platform (tmfifo_net1). Run:
vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net1
Add the following content to the file:
DEVICE=tmfifo_net1
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
BRIDGE=br_tmfifo
Define rules for the tmfifo_net interfaces. Run:
vim /etc/udev/rules.d/91-tmfifo_net.rules
Restart the network to apply the changes. Run:
# /etc/init.d/network restart
Expected output:
Restarting network (via systemctl): [ OK ]
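Once the network has restarted, the bridge can be verified. The interface name below assumes the configuration above:

```shell
# Confirm the bridge exists and carries the expected 192.168.100.1 address
ip addr show br_tmfifo 2>/dev/null || echo "br_tmfifo not present"
```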
Configuring BlueField Platform Side
BlueField platforms are shipped with the following factory default configuration for tmfifo_net0:

| Address | Value |
|---------|-------|
| MAC     |       |
| IP      | 192.168.100.2 |
If more than one BlueField platform is in use, the default MAC and IP addresses must be modified.
Updating the RShim Network MAC Address
This procedure applies to Ubuntu/Debian (with sudo) and CentOS BFB installations. It only affects tmfifo_net0 on the Arm side.
Use a Linux console application (e.g., screen or minicom) to log into each BlueField platform. For example:
# sudo screen /dev/rshim<0|1>/console 115200
Create a configuration file for the tmfifo_net0 MAC address:
# sudo vi /etc/bf.cfg
Insert the new MAC address into the bf.cfg file:
NET_RSHIM_MAC=00:1a:ca:ff:ff:03
Apply the new MAC address:
sudo bfcfg
Repeat this process for the second BlueField platform, ensuring each one uses a unique MAC address.
Info: The Arm processor must be rebooted for the changes to take effect. To avoid unnecessary reboots, it is recommended to also update the IP address before restarting the Arm.
For a comprehensive list of the parameters supported to customize bf.cfg during BFB installation, refer to the "bf.cfg Parameters" section in the "Customizing BlueField Software Deployment Using bf.cfg" page.
Updating an IP Address
For Ubuntu:
Edit the 50-cloud-init.yaml file to update the tmfifo_net0 IP address:
sudo vim /etc/netplan/50-cloud-init.yaml
Modify the entry as follows:
tmfifo_net0:
    addresses:
        - 192.168.100.2/30   # Change to: - 192.168.100.3/30
Reboot the Arm. Run:
sudo reboot
Repeat this process for the second BlueField platform, ensuring each one has a unique IP address.
Info: The Arm processor must be rebooted for the changes to take effect. It is recommended to update the MAC address before restarting the Arm to minimize reboots.
For CentOS:
Edit the ifcfg-tmfifo_net0 file:
# vim /etc/sysconfig/network-scripts/ifcfg-tmfifo_net0
Update the IPADDR field:
IPADDR=192.168.100.3
Reboot the Arm processor to apply the change:
reboot
Alternatively, restart the network service instead of rebooting. Repeat this process for the second BlueField platform, ensuring a unique IP address is assigned.
Info: The Arm processor must be rebooted for the changes to take effect. It is recommended to update the MAC address before restarting the Arm to minimize reboots.