Date: 2023-10-12
mlxfwstress Utility
The tool is currently supported on Windows platforms only.
New devices are supported only after the tool is upgraded to its latest version.
mlxfwstress enables and disables various firmware stress flows. It can work in multiple modes:
- Enable/disable a specific set of stress types
- Clear all stress types
- Random mode:
  - Single mode: choose one stress type in each iteration and enable/disable it
  - Wild mode: choose multiple stress types in each iteration and enable/disable them
  Each time a stress type is chosen in a random iteration, the opposite operation is performed on it (e.g., if a stress type is turned on, in the next iteration it will be turned off, and vice versa).
- Toggle mode: alternately turns the list of stress types on and off. Can be used with iterations.
  Warning: To disable a stressor while in toggle mode, you must first stop the mlxfwstress tool, and only then disable the stressor.
- Clear semaphore
  Note: This functionality is supported in ConnectX-3 Pro adapter cards only.
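The difference between the two random modes can be illustrated with a small simulation. This is only a conceptual sketch of the behavior described above, not the tool's implementation: the function name `random_iteration` is hypothetical, and the way a subset is drawn in wild mode (a uniformly chosen size, then a random sample) is an assumption, since the page does not say how the tool picks it.

```python
import random


def random_iteration(state, mode, rng):
    """Flip the chosen stress types in place and return the names chosen.

    state maps a stress type name to True (on) or False (off).
    """
    names = sorted(state)
    if mode == "single":
        # Single mode: exactly one stress type per iteration.
        chosen = [rng.choice(names)]
    elif mode == "wild":
        # Wild mode: a non-empty random subset (selection policy is assumed).
        size = rng.randint(1, len(names))
        chosen = sorted(rng.sample(names, size))
    else:
        raise ValueError(f"unknown random mode: {mode!r}")
    for name in chosen:
        # The opposite operation is applied to whatever was chosen.
        state[name] = not state[name]
    return chosen


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed, in the spirit of --seed
    state = {"STOP_EDBH": False, "STOP_IDBH": False, "LOCK_CEGW": False}
    for iteration in range(1, 4):
        chosen = random_iteration(state, "wild", rng)
        print(iteration, chosen, state)
```

Because each chosen type is flipped relative to its current state, a type that is picked twice in a row is turned on and then off again, which matches the ON/OFF alternation visible in the random-mode logs.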
mlxfwstress Synopsis
# mlxfwstress [-d|--dev <DeviceName>] [-h|--help] [-v|--version] [-o|--operation <Operation>] [--rand-mode <Random mode>] [-t|--stress-type <Stress type>] [--iterations <Iterations>] [--stress-delay <Stress delay>] [--max-rand-on <Max rand on>] [--hang-type <Hang type>] [--seed <seed>] [--toggle-time <x,y>]
where:
-d|--dev <DeviceName>
    Perform operation for a specified device.
-h|--help
    Show this message and exit.
-v|--version
    Show the executable version and exit.
-o|--operation <Operation>
    Choose operation: on, off, clear_all, random, query, clear_semaphore.
--rand-mode <Random mode>
    Choose a random mode: single, wild.
-t|--stress-type <Stress type>
    Specify a list of stress types separated by commas. (See Stress Types.)
--iterations <Iterations>
    Specify the number of iterations.
--stress-delay <Stress delay>
    Specify the stress delay in seconds (can be float).
    Note: Some stress flows may take more time.
    Recommended values: 0-1
--max-rand-on <Max rand on>
    Specify the maximal time, in seconds, that a stress is allowed to be on in random mode.
    Recommended values: (0,1]
    Default: 1
--hang-type <Hang type>
    Specify a list of hang types separated by commas. (See Hang Types.)
--seed <seed>
    Specify the seed for the random number generator.
--toggle-time <x,y>
    Toggle times: x after on, y after off, both in seconds (can be float). If y is not supplied, the tool uses the value of x for both.
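When the tool is driven from a test script, it can help to assemble the argument list in one place and reject obviously invalid values before anything reaches the device. The following is a minimal sketch under stated assumptions: `build_args` and its parameter names are hypothetical, only flags listed in the synopsis above are emitted, and the page does not state which operation `--toggle-time` is meant to accompany, so the sketch simply appends it when given.

```python
OPERATIONS = {"on", "off", "clear_all", "random", "query", "clear_semaphore"}
RAND_MODES = {"single", "wild"}


def build_args(device, operation, stress_types=None, hang_types=None,
               rand_mode=None, iterations=None, stress_delay=None,
               max_rand_on=None, seed=None, toggle_time=None):
    """Return an argv list for mlxfwstress using only documented flags."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    args = ["mlxfwstress", "-d", device, "-o", operation]
    if rand_mode is not None:
        if rand_mode not in RAND_MODES:
            raise ValueError(f"unsupported random mode: {rand_mode!r}")
        args += ["--rand-mode", rand_mode]
    if stress_types:
        # Stress types are passed as one comma-separated value.
        args += ["-t", ",".join(stress_types)]
    if hang_types:
        args += ["--hang-type", ",".join(hang_types)]
    if iterations is not None:
        args += ["--iterations", str(int(iterations))]
    if stress_delay is not None:
        args += ["--stress-delay", str(stress_delay)]
    if max_rand_on is not None:
        args += ["--max-rand-on", str(max_rand_on)]
    if seed is not None:
        args += ["--seed", str(seed)]
    if toggle_time is not None:
        x, y = toggle_time
        # When y is omitted the tool is documented to reuse x, so pass x only.
        args += ["--toggle-time", str(x) if y is None else f"{x},{y}"]
    return args


if __name__ == "__main__":
    print(build_args("mt4103_pciconf0", "random",
                     stress_types=["STOP_CE_INSTAGE_EQE"],
                     rand_mode="single", iterations=10, stress_delay=0.2))
```

The returned list can be handed to `subprocess.run` directly, which avoids shell quoting issues with the comma-separated type lists.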
ConnectX-4/ConnectX-4 Lx/ConnectX-5 Adapter Cards Stress Types
The following are the stress types available for ConnectX-4/ConnectX-4 Lx/ConnectX-5 adapter cards:
| Category | Stress Type | Description | Notes |
|---|---|---|---|
| Transparent | PAUSE_STORM_GENERATION | Generates pause frames from the device toward the network | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_1 | Invalidates STE cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_2 | Invalidates qp L0 cache (RX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_3 | Invalidates dct L0 cache (RX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_4 | Invalidates scatter list cache in RX | |
| Transparent | INVALIDATE_INTERNAL_CACHE_CQ | Invalidates CQC cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_SX1 | Invalidates SXDC cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_5 | Invalidates LDB cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_GENERAL_1 | Invalidates RO caches | |
| Transparent | INVALIDATE_INTERNAL_CACHE_SX2 | Invalidates pkey cache (SX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_SX3 | Invalidates guid cache (SX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_QP | Invalidates QPC (main QP cache unit) | |
| Hang FW/HW | PACKET_DROP | Drops N packets on portx | This type requires the following extra flags: |
ConnectX-3 Pro Adapter Cards Stress Types
The following are the stress types available for ConnectX-3 Pro adapter cards:
Note: Stressors in the "Transparent" category that are active for more than 100 msec may cause resiliency.
| Category | Stress Type | Description |
|---|---|---|
| Transparent | STOP_CE_INSTAGE_EQE | Stops sending EQEs created by the hardware (not the ones created by the firmware). |
| Transparent | STOP_EDBH | Stops the handling of external doorbells. |
| Transparent | STOP_IDBH | Stops the handling of internal doorbells. |
| Transparent | STOP_QPC_MISS_MACHINE_0, STOP_QPC_MISS_MACHINE_1, STOP_QPC_MISS_MACHINE_2, STOP_QPC_MISS_MACHINE_3 | Stops reading a QPC from the ICM on a miss, blocking hardware/firmware that accesses the QPC. |
| Transparent | LOCK_CEGW | Locks the CQE gateway. |
| Transparent | LOCK_OBGW_TPT, LOCK_OBGW_TCU, LOCK_OBGW_SXD | Locks the OBGW (access to the host memory gateway). |
| Transparent | LOCK_QPCGW_RX | Locks QPCGW. |
| Transparent | LOCK_SEMAPHORE_IPC_RX0, LOCK_SEMAPHORE_IPC_RX1, LOCK_SEMAPHORE_IPC_LDB, LOCK_SEMAPHORE_IPC_SX1 | Locks the IPC semaphore. |
| Transparent | INVALIDATE_CACHES | Invalidates caches. |
| Performance | STOP_SXP_VL_ARB_PORT1, STOP_SXP_VL_ARB_PORT2 | Stops transmission of packets to the wire. Causes head-of-line packet drop (HLL) if enabled. |
| Performance | RX_BACKPRESSURE | Stops the RX pipe, applying back-pressure to the wire by sending TX pauses. |
| Performance | DROP_PACKETS_TX | Drops packets on the TX side. |
Turning On Stress Types
To turn on a specific stress type:
mlxfwstress -d mt4103_pciconf0 -o on -t STOP_CE_INSTAGE_EQE
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED
To turn on a set of stress types:
mlxfwstress -d mt4103_pciconf0 -o on -t STOP_CE_INSTAGE_EQE,STOP_QPC_MISS_MACHINE_3,LOCK_SEMAPHORE_IPC_RX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED
Turning ON stress type: stop_qpc_miss_machine_3 -PASSED
Turning ON stress type: lock_semaphore_ipc_rx1 -PASSED
To turn on all the available stress types:
mlxfwstress -d mt4119_pciconf0 -t ALL -o on
Random seed: [1587969653]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_CQ -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_GENERAL_1 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_QP -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_1 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_2 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_3 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_4 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_5 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX1 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX2 -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX3 -PASSED
Turning Off Stress Types
To turn off a specific stress type:
mlxfwstress -d mt4103_pciconf0 -o off -t STOP_CE_INSTAGE_EQE
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF stress type: stop_ce_instage_eqe -PASSED
To turn off a set of stress types:
mlxfwstress -d mt4103_pciconf0 -o off -t STOP_CE_INSTAGE_EQE,STOP_QPC_MISS_MACHINE_3,LOCK_SEMAPHORE_IPC_RX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF stress type: stop_ce_instage_eqe -PASSED
Turning OFF stress type: stop_qpc_miss_machine_3 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx1 -PASSED
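Since every stress type that is turned on must eventually be turned off again, a test harness may want to pair the two commands so that the `off` call runs even if the test in between fails. The sketch below is one way to do that and is not part of the tool: `stress_enabled` and its `run` parameter are hypothetical names, and the only mlxfwstress flags used are `-d`, `-o on`, `-o off`, and `-t`, as shown in the examples above.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def stress_enabled(device, stress_types, run=subprocess.run):
    """Turn the given stress types on, and always turn them off afterwards."""
    types = ",".join(stress_types)
    base = ["mlxfwstress", "-d", device]
    # check=True makes subprocess.run raise if the tool exits with an error.
    run(base + ["-o", "on", "-t", types], check=True)
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        run(base + ["-o", "off", "-t", types], check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(args, check):
        print(" ".join(args))

    with stress_enabled("mt4103_pciconf0", ["STOP_CE_INSTAGE_EQE"], run=show):
        print("... traffic test would run here ...")
```

Passing a replacement for `run` keeps the sketch usable on a machine without the tool installed, which is also how it can be exercised in unit tests.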
Querying the Stress Types
To query the state of all stress types:
mlxfwstress -d mt4117_pciconf0 -o query -t ALL
-------------------------------------------------
Operation: [QUERY]
-------------------------------------------------
Querying stress type: INVALIDATE_INTERNAL_CACHE_CQ -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_GENERAL_1 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_QP -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_1 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_2 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_3 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_4 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_5 -NOT SUPPORTED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX1 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX2 -ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX3 -ENABLED
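Query output in the format shown above is easy to post-process, for example to list which stress types a device reports as not supported. The following is a sketch that assumes the line format stays exactly as in the example; `group_by_status` is a hypothetical helper, and the only status strings known from this page are `ENABLED` and `NOT SUPPORTED`.

```python
import re
from collections import defaultdict

# Matches lines such as "Querying stress type: NAME -ENABLED".
QUERY_LINE = re.compile(r"^Querying (?:stress|hang) type:\s*(\S+?)\s*-(.+?)\s*$")


def group_by_status(output):
    """Map each reported status to the list of type names that have it."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = QUERY_LINE.match(line.strip())
        if match:
            name, status = match.groups()
            groups[status].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = (
        "Operation: [QUERY]\n"
        "Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_4 -ENABLED\n"
        "Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_5 -NOT SUPPORTED\n"
    )
    print(group_by_status(sample))
```

Lines that do not match the pattern, such as the separator and `Operation:` lines, are ignored.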
ConnectX-4/ConnectX-4 Lx/ConnectX-5 Adapter Cards Hang Types
The following are the hang types available for ConnectX-4/ConnectX-4 Lx/ConnectX-5 adapter cards:
| Category | Stress Type | Description | Notes |
|---|---|---|---|
| Hang FW/HW | STOP_RX_PER_PRIO | | This type requires the following extra flags: |
mlxfwstress -d mt4115_pciconf0 -o on --hang-type STOP_RX_PER_PRIO --extra %STOP_RX_PER_PRIO[0x00100FF]
Random seed: [1588056318]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: STOP_RX_PER_PRIO -PASSED
To turn on this hang type, the command must be executed in the following format:
Example:
mlxfwstress -d mt4115_pciconf0 -o on --hang-type STOP_RX_PER_PRIO --extra %STOP_RX_PER_PRIO[0x000100FF]
Output:
Random seed: [1573642282]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: STOP_RX_PER_PRIO -PASSED
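The `--extra` value in the examples above follows the pattern `%<hang type>[<hex value>]`. A small helper can build that string consistently. This is a sketch only: `format_extra` is a hypothetical name, the meaning of the hexadecimal value is not described on this page, and the default width of eight hex digits is an assumption taken from the second example.

```python
def format_extra(hang_type, value, width=8):
    """Return an --extra argument such as %STOP_RX_PER_PRIO[0x000100FF]."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Zero-padded, upper-case hex, as in the examples on this page.
    return f"%{hang_type}[0x{value:0{width}X}]"


if __name__ == "__main__":
    extra = format_extra("STOP_RX_PER_PRIO", 0x000100FF)
    print(["mlxfwstress", "-d", "mt4115_pciconf0", "-o", "on",
           "--hang-type", "STOP_RX_PER_PRIO", "--extra", extra])
```

Building the argument as a single list element avoids having the `%` and brackets reinterpreted by a command shell.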
Turning On Hang Types
To turn on a specific hang type:
mlxfwstress -d mt4103_pciconf0 -o on --hang-type HANG_SX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: Sx1 -PASSED
To turn on a set of hang types:
mlxfwstress -d mt4103_pciconf0 -o on --hang-type HANG_SX1,HANG_RX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: Sx1 -PASSED
Turning ON hang type: Rx1 -PASSED
Turning Off Hang Types
To turn off a specific hang type:
mlxfwstress -d mt4103_pciconf0 -o off --hang-type HANG_SX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
To turn off a set of hang types:
mlxfwstress -d mt4103_pciconf0 -o off --hang-type HANG_SX1,HANG_RX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
Turning OFF hang type: Rx1 -PASSED
Querying the Hang Types
To query the state of all hang types:
mlxfwstress -d mt4103_pciconf0 -o query --hang-type ALL
-------------------------------------------------
Operation: [QUERY]
-------------------------------------------------
Querying hang type: Sx1 -ENABLED
Querying hang type: Rx1 -ENABLED
Querying hang type: Tx -ENABLED
Querying hang type: Rx -ENABLED
Clearing all Stress/Hang Types
To clear all stress/hang types:
mlxfwstress -d mt4103_pciconf0 -o clear_all
-------------------------------------------------
Operation: [CLEAR_ALL]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
Turning OFF hang type: Rx1 -PASSED
Turning OFF hang type: Tx -PASSED
Turning OFF hang type: Rx -PASSED
Turning OFF stress type: stop_ce_instage_eqe -PASSED
Turning OFF stress type: stop_sxp_vl_arb_port1 -PASSED
Turning OFF stress type: stop_sxp_vl_arb_port2 -PASSED
Turning OFF stress type: stop_edbh -PASSED
Turning OFF stress type: stop_idbh -PASSED
Turning OFF stress type: stop_qpc_miss_machine_0 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_1 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_2 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_3 -PASSED
Turning OFF stress type: lock_cegw -PASSED
Turning OFF stress type: lock_obgw_tpt -PASSED
Turning OFF stress type: lock_obgw_tcu -PASSED
Turning OFF stress type: lock_obgw_sxd -PASSED
Turning OFF stress type: lock_qpcgw_rx -PASSED
Turning OFF stress type: lock_semaphore_ipc_sx1 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx0 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx1 -PASSED
Turning OFF stress type: lock_semaphore_ipc_ldb -PASSED
Turning OFF stress type: invalidate_caches -PASSED
Clearing the Semaphore
To clear the semaphore:
mlxfwstress -d mt4103_pciconf0 -o clear_semaphore
-------------------------------------------------
Operation: [CLEAR_SEMAPHORE]
-------------------------------------------------
Semaphore was cleared successfully
Random Operation
There are two random modes you can choose from:
- Single: given a set of stress types, one stress type is chosen in each iteration and toggled ON/OFF according to its current state
- Wild: given a set of stress types, a random subset of stress types is chosen in each iteration and toggled ON/OFF according to their current state
Setting the Random Mode for the Stress Types
To set the Single Mode:
mlxfwstress -d mt4103_pciconf0 -o random --rand-mode single -t STOP_CE_INSTAGE_EQE --stress-delay 0.2 --iterations 10
-------------------------------------------------
Operation: [RANDOM]
-------------------------------------------------
#############################################
Random:
Iterations delay: 0.2 [sec]
Iterations number: 10
Max on time: 1 [sec]
#############################################
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [2]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 201 [ms]
RANDOM ITERATION: [4]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [5]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
RANDOM ITERATION: [6]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 201 [ms]
RANDOM ITERATION: [7]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
RANDOM ITERATION: [8]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 201 [ms]
RANDOM ITERATION: [9]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
Turning OFF stress type: stop_ce_instage_eqe
RANDOM ITERATION: [10]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
=======================================================
Turning off all stress types after random:
Turning OFF stress type: stop_ce_instage_eqe
As seen in the example above, after the specified number of iterations, the tool turns off all the stress types.
The default value for stress-delay is 1 second.
If no number of iterations is supplied, the user is expected to stop the tool with Ctrl+C, after which the tool turns off all the stress types.
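The single-mode log above reports, for every operation, the time elapsed since the previous operation on that stress type, which should stay close to the configured `--stress-delay`. A short script can extract those values and flag outliers. This is a sketch that assumes the log format shown above; `parse_random_log` and `off_schedule_events` are hypothetical helpers, and the comparison is only meaningful when a single stress type is toggled, because in wild mode a duration can span several iterations.

```python
import re

# Either an iteration header or a per-type operation line. Whitespace is
# matched loosely so that values wrapped onto the next line still parse.
TOKEN = re.compile(
    r"RANDOM ITERATION:\s*\[\s*(?P<iter>\d+)\s*\]"
    r"|\[(?P<name>\w+)\]:\s*\[(?P<op>ON|OFF)\]\s*,\s*"
    r"duration since last operation:\s*(?P<ms>\d+)\s*\[ms\]"
)


def parse_random_log(text):
    """Return (iteration, stress_type, operation, duration_ms) tuples."""
    events = []
    iteration = None
    for match in TOKEN.finditer(text):
        if match.group("iter") is not None:
            iteration = int(match.group("iter"))
        else:
            events.append((iteration, match.group("name"),
                           match.group("op"), int(match.group("ms"))))
    return events


def off_schedule_events(events, delay_ms, tolerance_ms):
    """Return events whose duration deviates from delay_ms by more than tolerance_ms.

    A duration of 0 marks the first operation on a type and is skipped.
    """
    return [e for e in events
            if e[3] > 0 and abs(e[3] - delay_ms) > tolerance_ms]


if __name__ == "__main__":
    log = (
        "RANDOM ITERATION: [1]\n"
        "[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]\n"
        "RANDOM ITERATION: [2]\n"
        "[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]\n"
    )
    events = parse_random_log(log)
    print(events)
    print(off_schedule_events(events, delay_ms=200, tolerance_ms=10))
```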
To set the Wild Mode:
mlxfwstress -d mt4103_pciconf0 -o random --rand-mode wild -t ALL --stress-delay 0.2 --max-rand-on 1 --iterations 5
-------------------------------------------------
Operation: [RANDOM]
-------------------------------------------------
#############################################
Random:
Iterations delay: 0.2 [sec]
Iterations number: 5
Max on time: 1 [sec]
#############################################
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 0 [ms]
[stop_edbh]: [ON] , duration since last operation: 0 [ms]
[stop_idbh]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_0]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_3]: [ON] , duration since last operation: 0 [ms]
[lock_cegw]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 0 [ms]
[lock_qpcgw_rx]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_sx1]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [2]
[stop_sxp_vl_arb_port1]: [ON] , duration since last operation: 0 [ms]
[stop_edbh]: [OFF], duration since last operation: 203 [ms]
[stop_idbh]: [OFF], duration since last operation: 203 [ms]
[stop_qpc_miss_machine_3]: [OFF], duration since last operation: 202 [ms]
[lock_cegw]: [OFF], duration since last operation: 202 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tcu]: [OFF], duration since last operation: 203 [ms]
[lock_semaphore_ipc_rx0]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_rx1]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_ldb]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 406 [ms]
[stop_sxp_vl_arb_port2]: [OFF], duration since last operation: 406 [ms]
[stop_edbh]: [ON] , duration since last operation: 203 [ms]
[stop_idbh]: [ON] , duration since last operation: 203 [ms]
[stop_qpc_miss_machine_0]: [OFF], duration since last operation: 406 [ms]
[stop_qpc_miss_machine_2]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tpt]: [OFF], duration since last operation: 203 [ms]
[lock_obgw_sxd]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_sx1]: [OFF], duration since last operation: 405 [ms]
[lock_semaphore_ipc_ldb]: [OFF], duration since last operation: 203 [ms]
RANDOM ITERATION: [4]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 203 [ms]
[stop_edbh]: [OFF], duration since last operation: 202 [ms]
[stop_idbh]: [OFF], duration since last operation: 202 [ms]
[stop_qpc_miss_machine_1]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_3]: [ON] , duration since last operation: 406 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 202 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 406 [ms]
[lock_obgw_sxd]: [OFF], duration since last operation: 203 [ms]
[lock_semaphore_ipc_sx1]: [ON] , duration since last operation: 203 [ms]
[lock_semaphore_ipc_rx1]: [OFF], duration since last operation: 406 [ms]
[invalidate_caches]: [ON] , duration since last operation: 0 [ms]
Turning OFF stress type: stop_sxp_vl_arb_port1
Turning OFF stress type: stop_sxp_vl_arb_port2
Turning OFF stress type: stop_qpc_miss_machine_1
Turning OFF stress type: stop_qpc_miss_machine_2
Turning OFF stress type: stop_qpc_miss_machine_3
Turning OFF stress type: lock_obgw_tpt
Turning OFF stress type: lock_obgw_tcu
Turning OFF stress type: lock_qpcgw_rx
Turning OFF stress type: lock_semaphore_ipc_sx1
Turning OFF stress type: lock_semaphore_ipc_rx0
Turning OFF stress type: invalidate_caches
RANDOM ITERATION: [5]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 202 [ms]
[stop_idbh]: [ON] , duration since last operation: 322 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 202 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 202 [ms]
[lock_qpcgw_rx]: [ON] , duration since last operation: 202 [ms]
[invalidate_caches]: [ON] , duration since last operation: 202 [ms]
=======================================================
Turning off all stress types after random:
Turning OFF stress type: stop_sxp_vl_arb_port2
Turning OFF stress type: stop_idbh
Turning OFF stress type: lock_obgw_tpt
Turning OFF stress type: lock_obgw_tcu
Turning OFF stress type: lock_qpcgw_rx
Turning OFF stress type: invalidate_caches
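One way to sanity-check a wild-mode run is to replay the log and compute which stress types should still be on when the run ends, then compare that set with the types listed after "Turning off all stress types after random:". The sketch below does this under the assumption that the log format matches the example above; `still_on_before_cleanup` is a hypothetical helper, and treating a mid-run "Turning OFF stress type:" line as switching that type off is an interpretation of the log rather than documented behavior.

```python
import re

STEP = re.compile(
    r"\[(?P<name>\w+)\]:\s*\[(?P<op>ON|OFF)\]\s*,"
    r"|Turning OFF stress type:\s*(?P<off>\w+)"
)
CLEANUP_MARKER = "Turning off all stress types after random:"


def still_on_before_cleanup(log_text):
    """Replay the log up to the cleanup marker and return the types left on."""
    body = log_text.split(CLEANUP_MARKER, 1)[0]
    on = set()
    for match in STEP.finditer(body):
        if match.group("off") is not None:
            on.discard(match.group("off"))   # explicit turn-off during the run
        elif match.group("op") == "ON":
            on.add(match.group("name"))
        else:
            on.discard(match.group("name"))
    return sorted(on)


if __name__ == "__main__":
    log = (
        "RANDOM ITERATION: [1]\n"
        "[stop_edbh]: [ON] , duration since last operation: 0 [ms]\n"
        "[stop_idbh]: [ON] , duration since last operation: 0 [ms]\n"
        "RANDOM ITERATION: [2]\n"
        "[stop_edbh]: [OFF], duration since last operation: 203 [ms]\n"
        "Turning off all stress types after random:\n"
        "Turning OFF stress type: stop_idbh\n"
    )
    print(still_on_before_cleanup(log))
```

Replaying the wild-mode example above by hand in the same way leaves six stress types on after iteration 5, the same six that the tool lists in its final cleanup.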
ConnectX-3/ConnectX-3 Pro Adapter Cards Hang Types
The following are the hang types available for ConnectX-3/ConnectX-3 Pro adapter cards:
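| Category | Stress Type | Description | Notes |
|---|---|---|---|
| Hang FW/HW | HANG_SX1 | | |
| Hang FW/HW | HANG_RX1 | | |
| Hang FW/HW | HANG_TX | | |
| Hang FW/HW | HANG_RX | | |
| Hang FW/HW | ALL | | Hang types that require extra flags are not supported when running with the 'ALL' option. |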