mlxfwstress Utility
Support for new devices requires upgrading the tool to its latest version.
mlxfwstress enables/disables various firmware stress flows. It can work in multiple modes:
Enable/disable a specific set of stress types
Clear all stress types
Random mode:
  Single mode - choose one stress type in each iteration and enable/disable it
  Wild mode - choose multiple stress types in each iteration and enable/disable them
  Each time a stress type is chosen in a random iteration, the opposite operation is applied to it (e.g., a stress type that is on is turned off the next time it is chosen, and vice versa).
Toggle mode:
  Alternately turns the listed stress types on and off. Can be combined with --iterations.
  Note: To disable a stressor while in toggle mode, first stop the mlxfwstress tool, and only then disable the stressor.
Clear semaphore:
  Note: This functionality is supported on ConnectX-3 Pro adapter cards only.
# mlxfwstress [-d|--dev <DeviceName>] [-h|--help] [-v|--version] [-o|--operation <Operation>] [--rand-mode <Random mode>] [-t|--stress-type <Stress type>] [--iterations <Iterations>] [--stress-delay <Stress delay>] [--max-rand-on <Max rand on>] [--hang-type <Hang type>] [--seed <seed>] [--toggle-time <x,y>]
where:
-d|--dev <DeviceName>             Perform the operation on the specified device
-h|--help                         Show this message and exit
-v|--version                      Show the executable version and exit
-o|--operation <Operation>        Choose an operation: on, off, clear_all, random, query, clear_semaphore
--rand-mode <Random mode>         Choose a random mode: single, wild
-t|--stress-type <Stress type>    Specify a comma-separated list of stress types (see Stress Types)
--iterations <Iterations>         Specify the number of iterations
--stress-delay <Stress delay>     Specify the stress delay in seconds (can be a float). Note: Some stress flows may take more time. Recommended values: 0-1
--max-rand-on <Max rand on>       Specify the maximum time, in seconds, that a stress type may stay on in random mode. Recommended values: (0,1]. Default: 1
--hang-type <Hang type>           Specify a comma-separated list of hang types (see Hang Types)
--seed <seed>                     Specify the seed for the random number generator
--toggle-time <x,y>               Toggle time (x) and time after off (y), both in seconds (can be a float). If y is not supplied, the tool uses the same value for x and y
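The recommended ranges and the --toggle-time defaulting rule in the option descriptions can be checked in a script before starting a run. The sketch below is a hypothetical pre-flight helper (the function names are illustrative and not part of mlxfwstress); it only validates and prints values and never calls the tool.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of mlxfwstress.

# Normalize a --toggle-time value "x" or "x,y" to "x y".
# Per the option description, y defaults to x when it is omitted.
# Accepts digits with an optional fractional part (e.g. 0.5); returns 1 otherwise.
parse_toggle_time() {
    local value="$1" x y
    local num='^[0-9]+([.][0-9]+)?$'
    if [[ "$value" == *,* ]]; then
        x="${value%%,*}"
        y="${value#*,}"
    else
        x="$value"
        y="$value"
    fi
    [[ "$x" =~ $num && "$y" =~ $num ]] || return 1
    printf '%s %s\n' "$x" "$y"
}

# Warn (on stderr) when values fall outside the recommended ranges:
# --stress-delay in [0,1] and --max-rand-on in (0,1].
warn_if_outside_recommended() {
    local delay="$1" max_on="$2"
    awk -v d="$delay" -v m="$max_on" 'BEGIN {
        if (d < 0 || d > 1) print "warning: --stress-delay " d " is outside the recommended 0-1"
        if (m <= 0 || m > 1) print "warning: --max-rand-on " m " is outside the recommended (0,1]"
    }' >&2
}
```

For example, `parse_toggle_time 0.3` prints `0.3 0.3`, reflecting the documented rule that y takes the value of x when omitted. The warnings are advisory only, since the ranges above are recommendations rather than limits.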
ConnectX-4/ConnectX-4 Lx/ConnectX-5 Adapter Cards Stress Types
The following are the stress types available for ConnectX-4/ConnectX-4 Lx/ConnectX-5 adapter cards:
Category | Stress Type | Description | Notes
Transparent | PAUSE_STORM_GENERATION | Generates pause frames from the device toward the network |
 | INVALIDATE_INTERNAL_CACHE_RX_1 | Invalidates STE cache |
 | INVALIDATE_INTERNAL_CACHE_RX_2 | Invalidates qp L0 cache (RX) |
 | INVALIDATE_INTERNAL_CACHE_RX_3 | Invalidates dct L0 cache (RX) |
 | INVALIDATE_INTERNAL_CACHE_RX_4 | Invalidates scatter list cache in RX |
 | INVALIDATE_INTERNAL_CACHE_CQ | Invalidates CQC cache |
 | INVALIDATE_INTERNAL_CACHE_SX1 | Invalidates SXDC cache |
 | INVALIDATE_INTERNAL_CACHE_RX_5 | Invalidates LDB cache |
 | INVALIDATE_INTERNAL_CACHE_GENERAL_1 | Invalidates RO caches |
 | INVALIDATE_INTERNAL_CACHE_SX2 | Invalidates pkey cache (SX) |
 | INVALIDATE_INTERNAL_CACHE_SX3 | Invalidates guid cache (SX) |
 | INVALIDATE_INTERNAL_CACHE_QP | Invalidates QPC (main QP cache unit) |
Hang FW/HW | PACKET_DROP | Drops N packets on portx | This type requires the following extra flags:
ConnectX-3 Pro Adapter Cards Stress Types
The following are the stress types available for ConnectX-3 Pro adapter cards:
Stressors in the "Transparent" category that remain active for more than 100 msec may trigger resiliency flows.
Category | Stress Type | Description
Transparent | STOP_CE_INSTAGE_EQE | Stops sending EQEs created by the hardware (not the ones created by the firmware).
 | STOP_EDBH | Stops the handling of external doorbells.
 | STOP_IDBH | Stops the handling of internal doorbells.
 | STOP_QPC_MISS_MACHINE_0, STOP_QPC_MISS_MACHINE_1, STOP_QPC_MISS_MACHINE_2, STOP_QPC_MISS_MACHINE_3 | Stops reading a QPC from the ICM on a miss, blocking hardware/firmware that accesses the QPC.
 | LOCK_CEGW | Locks the CQE gateway.
 | LOCK_OBGW_TPT, LOCK_OBGW_TCU, LOCK_OBGW_SXD | Locks the OBGW (access to the host memory gateway).
 | LOCK_QPCGW_RX | Locks the QPCGW.
 | LOCK_SEMAPHORE_IPC_RX0, LOCK_SEMAPHORE_IPC_RX1, LOCK_SEMAPHORE_IPC_LDB, LOCK_SEMAPHORE_IPC_SX1 | Locks the IPC semaphore.
 | INVALIDATE_CACHES | Invalidates caches.
Performance | STOP_SXP_VL_ARB_PORT1, STOP_SXP_VL_ARB_PORT2 | Stops transmission of packets to the wire. Causes head-of-line packet drop (HLL) if enabled.
 | RX_BACKPRESSURE | Stops the RX pipe, applying back-pressure toward the wire (sends TX pauses).
 | DROP_PACKETS_TX | Drops packets on the TX side.
Turning On Stress Types
To turn on a specific stress type:
mlxfwstress -d mt4103_pciconf0 -o on -t STOP_CE_INSTAGE_EQE
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED
To turn on a set of stress types:
mlxfwstress -d mt4103_pciconf0 -o on -t STOP_CE_INSTAGE_EQE,STOP_QPC_MISS_MACHINE_3,LOCK_SEMAPHORE_IPC_RX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED
Turning ON stress type: stop_qpc_miss_machine_3 -PASSED
Turning ON stress type: lock_semaphore_ipc_rx1 -PASSED
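When the list of stress types lives in a script, -t expects it as a single comma-separated argument. A minimal sketch, assuming bash; `join_by_comma` and `stress_cmd` are hypothetical helpers, and `stress_cmd` only prints the command line so it can be reviewed before it is run against a device.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of mlxfwstress).

# Join the arguments with commas, the format -t/--stress-type expects.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Print (do not run) an on/off command for a device and a list of stress types.
stress_cmd() {
    local dev="$1" op="$2"; shift 2
    printf 'mlxfwstress -d %s -o %s -t %s\n' "$dev" "$op" "$(join_by_comma "$@")"
}
```

For example, `stress_cmd mt4103_pciconf0 on STOP_CE_INSTAGE_EQE STOP_QPC_MISS_MACHINE_3 LOCK_SEMAPHORE_IPC_RX1` prints the same command as the set example above.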
To turn on all the available stress types:
mlxfwstress -d mt4119_pciconf0 -t ALL -o on
Random seed: [1587969653]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_CQ -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_GENERAL_1 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_QP -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_1 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_2 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_3 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_4 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_5 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX1 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX2 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX3 -PASSED
Turning Off Stress Types
To turn off a specific stress type:
mlxfwstress -d mt4103_pciconf0 -o off -t STOP_CE_INSTAGE_EQE
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF stress type: stop_ce_instage_eqe -PASSED
To turn off a set of stress types:
mlxfwstress -d mt4103_pciconf0 -o off -t STOP_CE_INSTAGE_EQE,STOP_QPC_MISS_MACHINE_3,LOCK_SEMAPHORE_IPC_RX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF stress type: stop_ce_instage_eqe -PASSED
Turning OFF stress type: stop_qpc_miss_machine_3 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx1 -PASSED
Querying the Stress Types
To query the state of all stress types:
mlxfwstress -d mt4117_pciconf0 -o query -t ALL
-------------------------------------------------
Operation: [QUERY]
-------------------------------------------------
Querying stress type: INVALIDATE_INTERNAL_CACHE_CQ -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_GENERAL_1 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_QP -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_1 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_2 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_3 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_4 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_5 -NOT SUPPORTED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX1 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX2 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX3 -ENABLED
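Query output has one "Querying <stress|hang> type: <NAME> -<STATE>" line per type, where the state can be ENABLED or NOT SUPPORTED, as in the example above. The hypothetical helpers below (not part of mlxfwstress) turn that output into "NAME STATE" pairs, or into a plain list of enabled types, assuming the format shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of mlxfwstress) for the query output format
# "Querying <stress|hang> type: <NAME> -<STATE>".

# Print "NAME STATE" for every query line on stdin.
query_states() {
    sed -nE 's/^Querying (stress|hang) type: ([^ ]+) -(.*)$/\2 \3/p'
}

# Print only the names whose state is ENABLED.
enabled_types() {
    query_states | awk '$2 == "ENABLED" { print $1 }'
}
```

Usage: `mlxfwstress -d mt4117_pciconf0 -o query -t ALL | enabled_types`. The same helpers also accept the output of a hang-type query.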
ConnectX-4/ConnectX-4 Lx/ConnectX-5 Adapter Cards Hang Types
The following are the hang types available for ConnectX-4/ConnectX-4 Lx/ConnectX-5 adapter cards:
Category | Hang Type | Description | Notes
Hang FW/HW | FFSER | Initialize FaultInjector object |
 | STOP_RX_PER_PRIO | | This type requires the following extra flags:
mlxfwstress -d mt4115_pciconf0 -o on --hang-type STOP_RX_PER_PRIO --extra %STOP_RX_PER_PRIO[0x00100FF]
Random seed: [1588056318]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: STOP_RX_PER_PRIO -PASSED
To turn on this hang type, the command must be executed in the following format:
Example:
mlxfwstress -d mt4115_pciconf0 -o on --hang-type STOP_RX_PER_PRIO --extra %STOP_RX_PER_PRIO[0x000100FF]
Output:
Random seed: [1573642282]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: STOP_RX_PER_PRIO -PASSED
Turning On Hang Types
To turn on a specific hang type:
mlxfwstress -d mt4103_pciconf0 -o on --hang-type HANG_SX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: Sx1 -PASSED
To turn on a set of hang types:
mlxfwstress -d mt4103_pciconf0 -o on --hang-type HANG_SX1,HANG_RX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: Sx1 -PASSED
Turning ON hang type: Rx1 -PASSED
Turning Off Hang Types
To turn off a specific hang type:
mlxfwstress -d mt4103_pciconf0 -o off --hang-type HANG_SX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
To turn off a set of hang types:
mlxfwstress -d mt4103_pciconf0 -o off --hang-type HANG_SX1,HANG_RX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
Turning OFF hang type: Rx1 -PASSED
Querying the Hang Types
To query the state of all hang types:
mlxfwstress -d mt4103_pciconf0 -o query --hang-type ALL
-------------------------------------------------
Operation: [QUERY]
-------------------------------------------------
Querying hang type: Sx1 -ENABLED
Querying hang type: Rx1 -ENABLED
Querying hang type: Tx -ENABLED
Querying hang type: Rx -ENABLED
To clear all stress/hang types:
mlxfwstress -d mt4103_pciconf0 -o clear_all
-------------------------------------------------
Operation: [CLEAR_ALL]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
Turning OFF hang type: Rx1 -PASSED
Turning OFF hang type: Tx -PASSED
Turning OFF hang type: Rx -PASSED
Turning OFF stress type: stop_ce_instage_eqe -PASSED
Turning OFF stress type: stop_sxp_vl_arb_port1 -PASSED
Turning OFF stress type: stop_sxp_vl_arb_port2 -PASSED
Turning OFF stress type: stop_edbh -PASSED
Turning OFF stress type: stop_idbh -PASSED
Turning OFF stress type: stop_qpc_miss_machine_0 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_1 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_2 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_3 -PASSED
Turning OFF stress type: lock_cegw -PASSED
Turning OFF stress type: lock_obgw_tpt -PASSED
Turning OFF stress type: lock_obgw_tcu -PASSED
Turning OFF stress type: lock_obgw_sxd -PASSED
Turning OFF stress type: lock_qpcgw_rx -PASSED
Turning OFF stress type: lock_semaphore_ipc_sx1 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx0 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx1 -PASSED
Turning OFF stress type: lock_semaphore_ipc_ldb -PASSED
Turning OFF stress type: invalidate_caches -PASSED
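A stress or hang type enabled with -o on remains enabled until it is turned off (the query examples above report such types as ENABLED). When a test script drives a device, a wrapper can issue -o clear_all afterwards even if the test fails or is interrupted. This is a sketch assuming bash; `with_cleanup` and the `MLXFWSTRESS` override variable are hypothetical, the latter only there to allow a dry run with `echo`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of mlxfwstress): runs a command, then always
# issues "-o clear_all" on the given device, even if the command fails.
# MLXFWSTRESS may be set (e.g. to "echo") for a dry run.
with_cleanup() {
    local dev="$1"; shift
    local rc=0
    # On Ctrl+C or SIGTERM: clear all stress/hang types, then exit with 130.
    trap '"${MLXFWSTRESS:-mlxfwstress}" -d "$dev" -o clear_all; exit 130' INT TERM
    "$@" || rc=$?
    trap - INT TERM
    "${MLXFWSTRESS:-mlxfwstress}" -d "$dev" -o clear_all
    return "$rc"
}
```

Usage: `with_cleanup mt4103_pciconf0 ./my_traffic_test.sh`, where `./my_traffic_test.sh` stands for whatever test you run while the stress types are enabled. The wrapper returns the test's exit status, so a failing test is not masked by a successful cleanup.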
To clear the semaphore:
mlxfwstress -d mt4103_pciconf0 -o clear_semaphore
-------------------------------------------------
Operation: [CLEAR_SEMAPHORE]
-------------------------------------------------
Semaphore was cleared successfully
There are two random modes you can choose from:
Single - given a set of stress types, in each iteration one stress type is chosen and toggled ON/OFF according to its current state
Wild - given a set of stress types, in each iteration a random subset of stress types is chosen and each one is toggled ON/OFF according to its current state
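The two modes differ only in how many stress types are picked per iteration; in both, a picked type flips to the opposite of its current state. The bash model below illustrates that toggling rule. It is a hypothetical simulation, not the mlxfwstress implementation, and the wild-mode subset selection (each type picked independently with probability 1/2) is an assumption made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical model of the two random modes; NOT the mlxfwstress
# implementation. Prints "<iteration> <stress type> <new state>" per toggle.
#   single: exactly one stress type is chosen per iteration
#   other:  treated as wild; assumption - each stress type is chosen
#           independently with probability 1/2 (an iteration may toggle none)
simulate_random() {
    local mode="$1" iterations="$2" seed="$3"; shift 3
    local -a names=("$@") state=() chosen=()
    local n=${#names[@]} i j
    [ "$n" -gt 0 ] || return 1
    RANDOM="$seed"                     # seed bash's PRNG for repeatable runs
    for ((j = 0; j < n; j++)); do state[j]=OFF; done
    for ((i = 1; i <= iterations; i++)); do
        for ((j = 0; j < n; j++)); do chosen[j]=0; done
        if [ "$mode" = single ]; then
            chosen[RANDOM % n]=1
        else
            for ((j = 0; j < n; j++)); do chosen[j]=$((RANDOM % 2)); done
        fi
        for ((j = 0; j < n; j++)); do
            ((chosen[j])) || continue
            # Apply the opposite operation to the current state.
            if [ "${state[j]}" = OFF ]; then state[j]=ON; else state[j]=OFF; fi
            printf '%d %s %s\n' "$i" "${names[j]}" "${state[j]}"
        done
    done
}
```

With a single stress type, single mode must alternate ON, OFF, ON, ... on every iteration, which matches the single-mode example output in this section.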
Setting the Random Mode for the Stress Types
To set the Single Mode:
mlxfwstress -d mt4103_pciconf0 -o random --rand-mode single -t STOP_CE_INSTAGE_EQE --stress-delay 0.2 --iterations 10
-------------------------------------------------
Operation: [RANDOM]
-------------------------------------------------
#############################################
Random:
Iterations delay: 0.2 [sec]
Iterations number: 10
Max on time: 1 [sec]
#############################################
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [2]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 201 [ms]
RANDOM ITERATION: [4]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [5]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
RANDOM ITERATION: [6]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 201 [ms]
RANDOM ITERATION: [7]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
RANDOM ITERATION: [8]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 201 [ms]
RANDOM ITERATION: [9]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
Turning OFF stress type: stop_ce_instage_eqe
RANDOM ITERATION: [10]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
=======================================================
Turning off all stress types after random:
Turning OFF stress type: stop_ce_instage_eqe
As seen in the example above, after the specified number of iterations, the tool turns off all the stress types.
The default value for stress-delay is 1 second.
If the number of iterations is not specified, the tool runs until the user stops it with Ctrl+C, after which it turns off all the stress types.
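Random runs print the seed they used (see the "Random seed: [...]" line in the -t ALL example). Assuming that passing the same value back through --seed, with the same mode and stress-type list, repeats the same random choices (the option description does not state this explicitly), a saved log can be used to re-run a sequence. The helper below is hypothetical and simply extracts the first seed from a log on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mlxfwstress): prints the first seed found
# in a saved log on stdin, taken from a line of the form "Random seed: [N]".
extract_seed() {
    sed -nE 's/^Random seed: \[([0-9]+).*$/\1/p' | head -n 1
}
```

Usage: `mlxfwstress -d mt4103_pciconf0 -o random --rand-mode single -t STOP_CE_INSTAGE_EQE --iterations 10 --seed "$(extract_seed < run1.log)"`, where `run1.log` is a hypothetical file holding the output of an earlier run.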
To set the Wild Mode:
mlxfwstress -d mt4103_pciconf0 -o random --rand-mode wild -t ALL --stress-delay 0.2 --max-rand-on 1 --iterations 5
-------------------------------------------------
Operation: [RANDOM]
-------------------------------------------------
#############################################
Random:
Iterations delay: 0.2 [sec]
Iterations number: 5
Max on time: 1 [sec]
#############################################
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 0 [ms]
[stop_edbh]: [ON] , duration since last operation: 0 [ms]
[stop_idbh]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_0]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_3]: [ON] , duration since last operation: 0 [ms]
[lock_cegw]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 0 [ms]
[lock_qpcgw_rx]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_sx1]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [2]
[stop_sxp_vl_arb_port1]: [ON] , duration since last operation: 0 [ms]
[stop_edbh]: [OFF], duration since last operation: 203 [ms]
[stop_idbh]: [OFF], duration since last operation: 203 [ms]
[stop_qpc_miss_machine_3]: [OFF], duration since last operation: 202 [ms]
[lock_cegw]: [OFF], duration since last operation: 202 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tcu]: [OFF], duration since last operation: 203 [ms]
[lock_semaphore_ipc_rx0]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_rx1]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_ldb]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 406 [ms]
[stop_sxp_vl_arb_port2]: [OFF], duration since last operation: 406 [ms]
[stop_edbh]: [ON] , duration since last operation: 203 [ms]
[stop_idbh]: [ON] , duration since last operation: 203 [ms]
[stop_qpc_miss_machine_0]: [OFF], duration since last operation: 406 [ms]
[stop_qpc_miss_machine_2]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tpt]: [OFF], duration since last operation: 203 [ms]
[lock_obgw_sxd]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_sx1]: [OFF], duration since last operation: 405 [ms]
[lock_semaphore_ipc_ldb]: [OFF], duration since last operation: 203 [ms]
RANDOM ITERATION: [4]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 203 [ms]
[stop_edbh]: [OFF], duration since last operation: 202 [ms]
[stop_idbh]: [OFF], duration since last operation: 202 [ms]
[stop_qpc_miss_machine_1]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_3]: [ON] , duration since last operation: 406 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 202 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 406 [ms]
[lock_obgw_sxd]: [OFF], duration since last operation: 203 [ms]
[lock_semaphore_ipc_sx1]: [ON] , duration since last operation: 203 [ms]
[lock_semaphore_ipc_rx1]: [OFF], duration since last operation: 406 [ms]
[invalidate_caches]: [ON] , duration since last operation: 0 [ms]
Turning OFF stress type: stop_sxp_vl_arb_port1
Turning OFF stress type: stop_sxp_vl_arb_port2
Turning OFF stress type: stop_qpc_miss_machine_1
Turning OFF stress type: stop_qpc_miss_machine_2
Turning OFF stress type: stop_qpc_miss_machine_3
Turning OFF stress type: lock_obgw_tpt
Turning OFF stress type: lock_obgw_tcu
Turning OFF stress type: lock_qpcgw_rx
Turning OFF stress type: lock_semaphore_ipc_sx1
Turning OFF stress type: lock_semaphore_ipc_rx0
Turning OFF stress type: invalidate_caches
RANDOM ITERATION: [5]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 202 [ms]
[stop_idbh]: [ON] , duration since last operation: 322 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 202 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 202 [ms]
[lock_qpcgw_rx]: [ON] , duration since last operation: 202 [ms]
[invalidate_caches]: [ON] , duration since last operation: 202 [ms]
=======================================================
Turning off all stress types after random:
Turning OFF stress type: stop_sxp_vl_arb_port2
Turning OFF stress type: stop_idbh
Turning OFF stress type: lock_obgw_tpt
Turning OFF stress type: lock_obgw_tcu
Turning OFF stress type: lock_qpcgw_rx
Turning OFF stress type: invalidate_caches
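In the random-mode logs, each "[name]: [ON|OFF]" entry records the new state of one stress type, and the closing block turns off whatever is still on. Replaying a saved log lists the stress types that are still enabled at its end, which is a quick way to confirm that the final "Turning OFF" lines covered everything. This awk sketch assumes the log format shown above and is not part of mlxfwstress.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mlxfwstress): replays a random-mode log on
# stdin and prints, sorted, the stress types still ON at the end of the log.
# "[name]: [ON] ..." / "[name]: [OFF], ..." lines set the state, and
# "Turning OFF stress type: name" lines force it to OFF.
still_on() {
    awk '
        $0 ~ /^\[[a-z0-9_]+]: \[(ON|OFF)]/ {
            name = substr($1, 2, length($1) - 3)   # "[name]:" -> "name"
            state[name] = ($2 ~ /OFF/) ? "OFF" : "ON"
            next
        }
        /^Turning OFF stress type: / { state[$5] = "OFF" }
        END { for (n in state) if (state[n] == "ON") print n }
    ' | sort
}
```

For a complete wild-mode log such as the one above, `still_on` is expected to print nothing, because the "Turning off all stress types after random:" block turns off every type left enabled.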
ConnectX-3/ConnectX-3 Pro Adapter Cards Hang Types
The following are the hang types available for ConnectX-3/ConnectX-3 Pro adapter cards:
Category | Hang Type | Description | Notes
Hang FW/HW | HANG_SX1 | |
 | HANG_RX1 | |
 | HANG_TX | |
 | HANG_RX | |
 | ALL | | Hang types that require extra flags are not supported when running with the 'ALL' option.