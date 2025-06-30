For PCIe link speed and width, use the --port_type PCIE flag.

    PCIe Operational (Enabled) Info
    -------------------------------
    Depth, pcie index, node            : [Depth, pcie index, node]
    Link Speed Active (Enabled)        : [Freq – Gen]
    Link Width Active (Enabled)        : [Width]





When using NVIDIA ConnectX-5 and newer devices, the PCIe interface can be configured for a PCIe switch. When the PCIe switch is enabled, the depth, pcie_index and node parameters are needed in order to specify the PCIe port from which the requested information (such as counters or eye info) is gathered.

| Parameters | Description |
|---|---|
| Depth | This parameter defines the number of layers from the Root Complex to the specific port. For NVIDIA ConnectX adapter cards in multi-host mode, the depth should be set to 0. For NVIDIA BlueField/BlueField-2 JBoF, the depth should be set to 3. |
| Pcie_index | This parameter defines the root complex ID or host index. For NVIDIA ConnectX adapter cards in multi-host mode, the pcie_index is the host index (0–3). For NVIDIA BlueField/BlueField-2 JBoF, the pcie_index is always 0. |
| Node | This parameter defines the specific PCIe port. For NVIDIA ConnectX adapter cards in multi-host mode, the node is always 0 for each host_index. For NVIDIA BlueField JBoF mode, this parameter's range is 0x0–0xF, which allows up to 16 possible ports. For NVIDIA BlueField-2, this parameter's range is 0x0–0x7. |

Note: For NVIDIA BlueField/BlueField-2 SmartNIC mode, the PCIe link information can only be gathered from the external host. The PCIe interface status cannot be retrieved from the Arm side. When retrieving the PCIe link information from the external host, there is no need to specify the depth, pcie_index and node.
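The depth, pcie_index and node rules above can be captured in a small helper that assembles the mlxlink argument list. This is a minimal sketch, not part of mlxlink: `build_pcie_query` is a hypothetical name, and the choice to require all three values together (or none) is an assumption of this sketch. It only uses flags that appear in this section and does not execute anything.

```python
from typing import List, Optional


def build_pcie_query(device: str,
                     depth: Optional[int] = None,
                     pcie_index: Optional[int] = None,
                     node: Optional[int] = None,
                     extra_flags: Optional[List[str]] = None) -> List[str]:
    """Return an mlxlink argument list for a PCIe query (hypothetical helper)."""
    args = ["mlxlink", "-d", device, "--port_type", "PCIE"]
    dpn = (depth, pcie_index, node)
    if any(v is not None for v in dpn):
        # Assumption of this sketch: the three values identify one port
        # together, so either all of them are given or none of them.
        if any(v is None for v in dpn):
            raise ValueError("depth, pcie_index and node must be given together")
        args += ["--depth", str(depth),
                 "--pcie_index", str(pcie_index),
                 "--node", str(node)]
    return args + list(extra_flags or [])


# BlueField JBoF values from the table: depth 3, pcie_index 0, node 4,
# plus -c to request counters.
jbof_cmd = build_pcie_query("/dev/mst/mt41682_pciconf0", 3, 0, 4, ["-c"])
print(" ".join(jbof_cmd))
```

The resulting list can be passed to `subprocess.run` on a host where mlxlink is installed; in SmartNIC mode, the three DPN arguments would simply be omitted.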

Example: NVIDIA BlueField JBoF Mode

    # mlxlink -d /dev/mst/mt41682_pciconf0 --port_type pcie --depth 3 --pcie_index 0 --node 4 -c

    PCIe Operational (Enabled) Info
    -------------------------------
    Depth, pcie index, node            : 3, 0, 4
    Link Speed Active (Enabled)        : 8G-Gen 3 (16G-Gen 4)
    Link Width Active (Enabled)        : 2X (2X)

    Management PCIe Timers Counters Info
    ------------------------------------
    dl down                            : 0

    Management PCIe Performance Counters Info
    -----------------------------------------
    RX Errors                          : 0
    TX Errors                          : 0
    CRC Error dllp                     : 0
    CRC Error tlp                      : 0
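When this output is collected by a script, the "Active (Enabled)" pairs can be compared to spot a link running at a different speed or width than the enabled one, as in the 8G-Gen 3 versus 16G-Gen 4 case above. The following is a rough sketch that parses a pasted copy of the output; the helper name `parse_link_info` and the exact column spacing of the sample are assumptions.

```python
import re

LINK_SAMPLE = """\
PCIe Operational (Enabled) Info
-------------------------------
Depth, pcie index, node            : 3, 0, 4
Link Speed Active (Enabled)        : 8G-Gen 3 (16G-Gen 4)
Link Width Active (Enabled)        : 2X (2X)
"""

# Matches "<active> (<enabled>)" as printed on the speed and width lines.
PAIR = re.compile(r"^\s*(.+?)\s*\(\s*(.+?)\s*\)\s*$")


def parse_link_info(text: str) -> dict:
    """Extract active/enabled speed and width (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if not sep or not label.startswith(("Link Speed", "Link Width")):
            continue
        match = PAIR.match(value)
        if match:
            key = "speed" if "Speed" in label else "width"
            info[key] = {"active": match.group(1), "enabled": match.group(2)}
    return info


link_info = parse_link_info(LINK_SAMPLE)
# Fields whose active value differs from the enabled value.
mismatched = [k for k, v in link_info.items() if v["active"] != v["enabled"]]
print(mismatched)  # ['speed']
```

A mismatch is only a hint to investigate; whether it indicates a problem depends on the platform and on what the link partner supports.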





For PCIe counters information, use the --port_type PCIE -c flags.

    Management PCIe Timers Counters Info
    ------------------------------------
    dl down                            : [link down counter]

    Management PCIe Performance Counters Info
    -----------------------------------------
    RX Errors                          : [Rx Errors]
    TX Errors                          : [Tx Errors]
    CRC Error dllp                     : [CRC Errors dllp]
    CRC Error tlp                      : [CRC Errors tlp]

RX Errors: indicates the number of transitions to recovery required due to framing errors and CRC (DLLP and TLP) errors.

TX Errors: indicates the number of transitions to recovery required due to EIEOS and TS errors.

CRC Error dllp: indicates CRC errors in Data Link Layer Packets (DLLPs).

CRC Error tlp: indicates CRC errors in Transaction Layer Packets (TLPs).

Example:

    # mlxlink -d /dev/mst/mt4123_pciconf0 --port_type PCIE -c

    PCIe Operational (Enabled) Info
    -------------------------------
    Depth, pcie index, node            : 0, 0, 0
    Link Speed Active (Enabled)        : 16G-Gen 4 (16G-Gen 4)
    Link Width Active (Enabled)        : 16X (16X)

    Management PCIe Timers Counters Info
    ------------------------------------
    dl down                            : 3

    Management PCIe Performance Counters Info
    -----------------------------------------
    RX Errors                          : 0
    TX Errors                          : 16
    CRC Error dllp                     : 0
    CRC Error tlp                      : 0
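For monitoring, the counter lines can be turned into integers and filtered for non-zero values. This is a small sketch that works on a pasted copy of the counters from the example above; `parse_counters` is a hypothetical helper, and the counter labels are taken from the output shown in this section.

```python
COUNTER_SAMPLE = """\
Management PCIe Timers Counters Info
------------------------------------
dl down                            : 3

Management PCIe Performance Counters Info
-----------------------------------------
RX Errors                          : 0
TX Errors                          : 16
CRC Error dllp                     : 0
CRC Error tlp                      : 0
"""

# Counter labels as they appear in the mlxlink output above.
COUNTER_LABELS = ("dl down", "RX Errors", "TX Errors",
                  "CRC Error dllp", "CRC Error tlp")


def parse_counters(text: str) -> dict:
    """Map each known counter label to its integer value (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if sep and label in COUNTER_LABELS and value.isdigit():
            counters[label] = int(value)
    return counters


pcie_counters = parse_counters(COUNTER_SAMPLE)
nonzero_counters = {name: count for name, count in pcie_counters.items() if count}
print(nonzero_counters)  # {'dl down': 3, 'TX Errors': 16}
```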





For PCIe link physical grade and eye opening information, use the --port_type PCIE -e flags.

    EYE Opening Info (PCIE)
    ------------------------
    Physical Grade                     : [Grade0, Grade1, Grade2, Grade3, Grade4, Grade5, Grade6, Grade7, Grade8, Grade9, Grade10, Grade11, Grade12, Grade13, Grade14, Grade15]
    Height Eye Opening [mV]            : [Height0, Height1, Height2, Height3, Height4, Height5, Height6, Height7, Height8, Height9, Height10, Height11, Height12, Height13, Height14, Height15]
    Phase Eye Opening [psec]           : [Phase0, Phase1, Phase2, Phase3, Phase4, Phase5, Phase6, Phase7, Phase8, Phase9, Phase10, Phase11, Phase12, Phase13, Phase14, Phase15]

Example:

    # mlxlink -d /dev/mst/mt4123_pciconf0 --port_type PCIE -e

    PCIe Operational (Enabled) Info
    -------------------------------
    Depth, pcie index, node            : 0, 0, 0
    Link Speed Active (Enabled)        : 16G-Gen 4 (16G-Gen 4)
    Link Width Active (Enabled)        : 16X (16X)

    EYE Opening Info (PCIe)
    -----------------------
    Physical Grade                     : 57279, 56340, 59340, 61824, 55140, 60501, 61530, 57392, 61573, 58930, 62752, 60421, 57188, 59796, 60066, 60847
    Height Eye Opening [mV]            : 292, 288, 314, 325, 278, 310, 319, 299, 316, 318, 343, 323, 310, 311, 335, 318
    Phase Eye Opening [psec]           : 30, 30, 30, 30, 30, 30, 30, 30, 30, 28, 28, 28, 28, 30, 28, 30
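Each eye-opening line carries one value per lane (16 lanes on this 16X link), so a script can locate the weakest lane directly. The sketch below parses a pasted copy of the example values; `parse_lanes` is a hypothetical helper, and treating the list position as the lane number is an assumption based on the [Grade0 ... Grade15] template above.

```python
EYE_SAMPLE = """\
Physical Grade            : 57279, 56340, 59340, 61824, 55140, 60501, 61530, 57392, 61573, 58930, 62752, 60421, 57188, 59796, 60066, 60847
Height Eye Opening [mV]   : 292, 288, 314, 325, 278, 310, 319, 299, 316, 318, 343, 323, 310, 311, 335, 318
Phase Eye Opening [psec]  : 30, 30, 30, 30, 30, 30, 30, 30, 30, 28, 28, 28, 28, 30, 28, 30
"""


def parse_lanes(text: str, label: str) -> list:
    """Return the per-lane integers of the line named `label` (hypothetical helper)."""
    for line in text.splitlines():
        name, sep, values = line.partition(":")
        if sep and name.strip() == label:
            return [int(v) for v in values.split(",")]
    raise KeyError(label)


grades = parse_lanes(EYE_SAMPLE, "Physical Grade")
heights = parse_lanes(EYE_SAMPLE, "Height Eye Opening [mV]")

# Index of the lane with the lowest physical grade.
worst_lane = min(range(len(grades)), key=grades.__getitem__)
print(worst_lane, grades[worst_lane], heights[worst_lane])  # 4 55140 278
```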





    mlxlink -d [device] --port_type PCIE --margin

Gen3

Gen3 Eye Grade Figure of Merit (FOM)

    0 < Eye Grade < 700          FAIL
    700 < Eye Grade < 2300       Gray area
    2300 < Eye Grade             PASS

Gen4

Gen4 Eye Margin FOM

    0 < Eye Grade < 150          FAIL
    150 < Eye Grade < 400        Gray area
    400 < Eye Grade              PASS
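The Gen3 and Gen4 thresholds above translate into a simple lookup. The sketch below is illustrative only: `classify_margin` is a hypothetical name, and because the tables use strict inequalities, the handling of values that fall exactly on a boundary (700, 2300, 150, 400) is an assumption; this sketch reports them as "Gray area".

```python
# (fail_below, pass_above) per PCIe generation, taken from the tables above.
MARGIN_THRESHOLDS = {"Gen3": (700, 2300), "Gen4": (150, 400)}


def classify_margin(generation: str, eye_grade: int) -> str:
    """Classify an eye grade as FAIL, Gray area, or PASS (hypothetical helper)."""
    fail_below, pass_above = MARGIN_THRESHOLDS[generation]
    if eye_grade < fail_below:
        return "FAIL"
    if eye_grade > pass_above:
        return "PASS"
    # Assumption: exact boundary values are treated as gray area.
    return "Gray area"


print(classify_margin("Gen3", 1500))  # Gray area
print(classify_margin("Gen4", 401))   # PASS
```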

This test feature allows error injection over PCIe links. It is used to verify that the system can handle PCIe errors, which rarely occur in regular usage.

The ConnectX-7 device includes testability features that can be configured to act as an error injection ‘exerciser’ in order to test other components in the system. This is supported when the ConnectX-7 is used as a PCIe switch.

Note: This is a PCIe-related feature that should be run over PCIe links only (--port_type PCIE) with a specific depth, PCIe index and node (DPN).

If the DPN is not provided, the tool uses the default values 0, 0 and 0, respectively.

The mapping between a BDF and its DPN can be found by running mlxlink with the --show_links flag (see example below).

| ID | Error Type | Description | Unit | Additional Parameters (--errors_parameters) | Advanced Error Reporting Flag Set by This Error |
|---|---|---|---|---|---|
| 0 | ABORT | Cancels the current pending error, if one exists. | NA | NA | NA |
| 1 | BAD_DLLP_LCRC | Flips a bit in the LCRC of the next "error_duration" DLLPs that are transmitted through the port. | Packets | NA | Bad DLLP Status |
| 2 | BAD_TLP_LCRC | Flips a bit in the LCRC of the next "error_duration" TLPs that are transmitted through the port. The packets are VDM TLPs that are sent by the port to the destination BDF ("dbdf"). | Packets | NA | Bad TLP Status |
| 3 | BAD_TLP_ECRC | Flips a bit in the ECRC of the next "error_duration" TLPs that are transmitted through the port. The packets are VDM TLPs that are sent by the port to the destination BDF ("dbdf"). | Packets | NA | ECRC Error Status |
| 4 | ERR_MSG | Sends an error signaling message to the RC. | Packets | Parameter 0: message type (0 - Correctable, 1 - Nonfatal, 2 - Fatal) | ERR_COR Received / Non-Fatal Error Messages Received / Fatal Error Messages Received |
| 5 | MALFORMED_TLP | Sends "error_duration" PM_ACTIVE_STATE_NACK messages to the destination BDF ("dbdf") with TC=1 instead of 0. | Packets | NA | Malformed TLP Status |
| 6 | POISONED_TLP | Sends "error_duration" VDMs with data to the destination BDF ("dbdf") with EP = 1. | Packets | NA | Poisoned TLP Received |
| 7 | UNEXPECTED_CPL | Sends "error_duration" completions to the destination BDF ("dbdf") with a 0xff tag. | Packets | NA | Unexpected Completion Status |
| 8 | ACS_VIOLATION | Sends "error_duration" VDMs to the destination BDF ("dbdf") with source_bdf=0. | Packets | NA | ACS Violation Status |
| 100 | SURPRISE_LINK_DOWN | Sets a port state to DETECT. | NA | NA | Surprise Down Error Status |
| 101 | RECEIVER_ERROR | Sends a clock instead of data for "error_duration" usecs. A value of 0 in "error_duration" means that this error must be toggled by the firmware as fast as possible. | uSec | NA | Receiver Error Status |

The following values should be provided in the error injection command line. Some values may be optional according to the error type.

| Input | Command Line Flag | Description | Obligatory | Default |
|---|---|---|---|---|
| Error Type | --error_type | Error type according to the table above. | Yes | - |
| Error Duration | --error_duration | The minimal number of packets with this error that will be sent, or the minimal amount of time that this error state would be applied. | No | 1 |
| Injection Delay | --injection_delay | Delay in microseconds before the error is applied. This allows time for the completion to return to the tool caller correctly. A higher value can be used to allow the system to get to a lower power state. | No | 0 |
| Destination BDF | --dbdf | Destination BDF. Relevant for some of the errors that require packet generation. See error table above. | No | 0:00.0 |
| Additional Parameters | --errors_parameters | Additional parameters according to the error type. See error table above. | No | 0 |

mlxlink triggers the firmware to start the error injection process when the --pcie_error_injection flag is provided together with the requested configuration parameters.

Note that the command returns immediately, but the error injection can take longer to complete (according to the error duration and injection delay inputs).

When the tool is run with --pcie_error_injection but without the parameters above, it queries the error injection state: whether it is ready to start a new error injection, or still in the middle of the previous one.
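The error types and input parameters described above can be combined into a small command builder. This is a sketch under stated assumptions rather than an official interface: `PcieErrorType` and `build_injection_cmd` are hypothetical names, the duration is passed as --error_duration as in the example command later in this section, and the function only returns the argument list without running mlxlink.

```python
from enum import IntEnum
from typing import List, Optional


class PcieErrorType(IntEnum):
    """Error IDs and names as listed in the error type table."""
    ABORT = 0
    BAD_DLLP_LCRC = 1
    BAD_TLP_LCRC = 2
    BAD_TLP_ECRC = 3
    ERR_MSG = 4
    MALFORMED_TLP = 5
    POISONED_TLP = 6
    UNEXPECTED_CPL = 7
    ACS_VIOLATION = 8
    SURPRISE_LINK_DOWN = 100
    RECEIVER_ERROR = 101


def build_injection_cmd(device: str, depth: int, pcie_index: int, node: int,
                        error_type: PcieErrorType,
                        duration: int = 1,      # default from the input table
                        delay_us: int = 0,      # default from the input table
                        dbdf: Optional[str] = None) -> List[str]:
    """Assemble an error injection command line (hypothetical helper)."""
    args = ["mlxlink", "-d", device, "--port_type", "PCIE",
            "--depth", str(depth), "--pcie_index", str(pcie_index),
            "--node", str(node),
            "--pcie_error_injection",
            "--error_type", error_type.name,
            "--error_duration", str(duration)]
    if dbdf is not None:
        # Only some error types generate packets toward a destination BDF.
        args += ["--dbdf", dbdf]
    args += ["--injection_delay", str(delay_us)]
    return args


injection_cmd = build_injection_cmd("/dev/mst/mt4129_pciconf0", 3, 0, 0,
                                    PcieErrorType.UNEXPECTED_CPL,
                                    duration=5, delay_us=500, dbdf="06:00.0")
print(" ".join(injection_cmd))
```

Keeping the builder separate from execution makes it easy to log or review the command first, which matters because the tool warns that error injection may cause PCIe bus failures or a system hang.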

Start the process by performing error injection with error type UNEXPECTED_CPL.

This example shows how to start the error injection by sending 5 unexpected completion packets. The packets (error type UNEXPECTED_CPL, ID 7) are directed from BDF 05:00.0 to BDF 06:00.0, 500 µs after the command is sent, in the following environment:

| PCIe Component | BDF |
|---|---|
| Root port | 00:01.0 |
| (exerciser) PCIe Switch Upstream port | 01:00.0 |
| (exerciser) PCIe Switch Downstream port | 05:00.0 |
| Endpoint | 06:00.0 |

To get the depth, pcie_index and node for the specific BDF 05:00.0, run mlxlink with the --show_links flag as follows:

Show Links

    mlxlink -d /dev/mst/mt4129_pciconf0 --port_type PCIE --show_links

    Valid PCIe Links
    ----------------
    Legend          : depth, pcie_index, node, port, bdf
    Link 1          : 0, 0, 0, 0, 01:00.0
    Link 2          : 3, 0, 0, 60, 05:00.0

In this case, the depth, pcie_index, and node flags for the downstream port should be 3, 0, and 0, respectively.
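The lookup from BDF to depth, pcie_index and node can also be scripted. The sketch below parses a pasted copy of the link list shown above; `dpn_by_bdf` is a hypothetical helper, and the field order is taken from the Legend line of that output. The port field is ignored because its numeric base is not stated here.

```python
LINKS_SAMPLE = """\
Valid PCIe Links
----------------
Legend          : depth ,pcie_index ,node ,port ,bdf
Link 1          : 0 ,0 ,0 ,0 ,01:00.0
Link 2          : 3 ,0 ,0 ,60 ,05:00.0
"""


def dpn_by_bdf(text: str) -> dict:
    """Map each BDF to its (depth, pcie_index, node) tuple (hypothetical helper)."""
    mapping = {}
    for line in text.splitlines():
        # Split on the first colon only; the BDF itself contains a colon.
        label, sep, value = line.partition(":")
        if not sep or not label.strip().startswith("Link "):
            continue
        depth, pcie_index, node, _port, bdf = [f.strip() for f in value.split(",")]
        mapping[bdf] = (int(depth), int(pcie_index), int(node))
    return mapping


links = dpn_by_bdf(LINKS_SAMPLE)
print(links["05:00.0"])  # (3, 0, 0)
```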

Then, the following command can be executed to start the process:

Start PCIe Error Injection

    mlxlink -d /dev/mst/mt4129_pciconf0 --port_type PCIE --depth 3 --pcie_index 0 --node 0 --pcie_error_injection --error_type UNEXPECTED_CPL --error_duration 5 --dbdf 06:00.0 --injection_delay 500

    PCIe Operational (Enabled) Info
    -------------------------------
    Depth, pcie index, node            : 3, 0, 0
    Link Speed Active (Enabled)        : 16G-Gen 4 (16G-Gen 4)
    Link Width Active (Enabled)        : 16X (16X)

    PCIe error injection might cause a PCIe bus failures or a system hang
    Do you want to continue ? yes

    Starting PCIe Error Injection...

After sending the configuration command, the progress of the process can be checked by running the tool with the --pcie_error_injection flag only (together with the same port selection):

Query PCIe Error Injection Status 1

    mlxlink -d /dev/mst/mt4129_pciconf0 --port_type PCIE --depth 3 --pcie_index 0 --node 0 --pcie_error_injection

    PCIe Operational (Enabled) Info
    -------------------------------
    Depth, pcie index, node            : 3, 0, 0
    Link Speed Active (Enabled)        : 16G-Gen 4 (16G-Gen 4)
    Link Width Active (Enabled)        : 16X (16X)

    PCIe Error Injection Info
    -------------------------
    Error Injection Status             : In progress
    Error Injection Type               : UNEXPECTED_CPL
    Error Injection Duration           : 5 Packets
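Because the injection command returns immediately, a script typically has to poll the status query until the injection is finished. The sketch below shows one way to do that. `injection_status` and `wait_until_ready` are hypothetical helpers, the `query` callable stands in for whatever runs the status command and returns its text, and the exact spelling of the final status is an assumption, so the comparison with "ready" is case-insensitive.

```python
import time
from typing import Callable


def injection_status(text: str) -> str:
    """Return the value of the 'Error Injection Status' line."""
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Error Injection Status":
            return value.strip()
    raise ValueError("no 'Error Injection Status' line found")


def wait_until_ready(query: Callable[[], str],
                     interval_s: float = 0.5,
                     attempts: int = 20) -> bool:
    """Poll `query` until the status reads 'ready' or attempts run out."""
    for _ in range(attempts):
        if injection_status(query()).lower() == "ready":
            return True
        time.sleep(interval_s)
    return False


# Canned outputs stand in for two consecutive status queries.
canned = iter(["Error Injection Status             : In progress",
               "Error Injection Status             : ready"])
became_ready = wait_until_ready(lambda: next(canned), interval_s=0)
print(became_ready)  # True
```

On a real system, `query` could wrap a `subprocess.run` call that executes the status command shown above and returns its standard output.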

Once the process is complete, the reported status changes to "ready". This means that another error injection request can be submitted: