The tool is currently supported on Windows platforms only.

The tool can support new devices only once the tool is upgraded to its latest version.


mlxfwstress enables/disables various firmware stress flows. It can work in multiple modes:

  • Enable/disable a specific set of stress types
  • Clear all stress types
  • Random mode:
    • Single mode - choose one stress type in each iteration and enable/disable it
    • Wild mode - choose multiple stress types in each iteration and enable/disable them

Each time a stress type is chosen in a random iteration, the opposite operation is done on it (e.g., if a stress type is turned on, in the next iteration it will be turned off and vice versa).

  • Toggle mode:
    • Alternately turns the listed stress types on and off. Can be used with iterations.

      To disable a stressor while in toggling mode, first you must disable the mlxfwstress tool, and only after that disable the stressor.

  • Clear semaphore:
    Note: This functionality is supported in ConnectX-3 Pro adapter cards only. 

mlxfwstress Synopsis

# mlxfwstress [-d|--dev <DeviceName>] [-h|--help] [-v|--version] [-o|--operation <Operation>] [--rand-mode <Random mode>] [-t|--stress-type <Stress type>] [--iterations <Iterations>] [--stress-delay <Stress delay>] [--max-rand-on <Max rand on>] [--hang-type <Hang type>] [--seed <seed>] [--toggle-time <x,y>]

where:

-d|--dev <DeviceName>

Perform operation for a specified device

-h|--help

Show this message and exit

-v|--version

Show the executable version and exit

-o|--operation <Operation>

Choose operation: on, off, clear_all, random, query, clear_semaphore

--rand-mode <Random mode>

Choose a random mode: single, wild

-t|--stress-type <Stress type>

Specify a list of stress types separated by comma. (See Stress Types.)

--iterations <Iterations>

Specify the number of iterations.

--stress-delay <Stress delay>

Specify the stress delay in seconds (can be float). 
Note: Some stress flows may take more time.

Recommended values: 0-1

--max-rand-on <Max rand on>

Specify the maximal time a stress is allowed to be on in random mode in seconds.

Recommended values: (0,1]

Default is 1

--hang-type <Hang type>

Specify a list of hang types separated by comma. (See Hang Types.)

--seed <seed>

Specify the seed for the random number generator.

--toggle-time <x,y>

Toggle times x and y, both in seconds (can be float). If y is not supplied, the tool uses the value of x for both.
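The options above can be assembled programmatically when the tool is driven from a test script. The following is a minimal sketch, not part of the tool: the function name `build_mlxfwstress_args` and its parameters are hypothetical, and it emits only flags that appear in the synopsis.

```python
from typing import List, Optional, Sequence


def build_mlxfwstress_args(
    device: str,
    operation: str,
    stress_types: Sequence[str] = (),
    hang_types: Sequence[str] = (),
    rand_mode: Optional[str] = None,
    stress_delay: Optional[float] = None,
    max_rand_on: Optional[float] = None,
    iterations: Optional[int] = None,
    seed: Optional[int] = None,
    toggle_time: Optional[Sequence[float]] = None,
) -> List[str]:
    """Build an argument list using only flags listed in the synopsis."""
    args = ["mlxfwstress", "-d", device, "-o", operation]
    if rand_mode is not None:
        args += ["--rand-mode", rand_mode]
    if stress_types:
        # Stress types are passed as a single comma-separated value.
        args += ["-t", ",".join(stress_types)]
    if hang_types:
        # Hang types use the same comma-separated convention.
        args += ["--hang-type", ",".join(hang_types)]
    if stress_delay is not None:
        args += ["--stress-delay", str(stress_delay)]
    if max_rand_on is not None:
        args += ["--max-rand-on", str(max_rand_on)]
    if iterations is not None:
        args += ["--iterations", str(iterations)]
    if seed is not None:
        args += ["--seed", str(seed)]
    if toggle_time is not None:
        # One value (x) or two values (x,y), in seconds.
        args += ["--toggle-time", ",".join(str(v) for v in toggle_time)]
    return args


print(" ".join(build_mlxfwstress_args(
    "mt4103_pciconf0",
    "random",
    stress_types=["STOP_CE_INSTAGE_EQE"],
    rand_mode="single",
    stress_delay=0.2,
    iterations=10,
)))
```

The helper does not validate operation names or value ranges; it only joins values in the form the synopsis describes, so the resulting list can be handed to a process launcher of your choice.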

Stress Types

ConnectX-4/ConnectX-4 Lx/ConnectX-5 Adapter Cards Stress Types

The following are the stress types available for ConnectX-4/ConnectX-4 Lx/ConnectX-5 adapter cards:

| Category | Stress Type | Description | Notes |
|---|---|---|---|
| Transparent | PAUSE_STORM_GENERATION | Generates pause frames from the device toward the network | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_1 | Invalidates STE cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_2 | Invalidates qp L0 cache (RX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_3 | Invalidates dct L0 cache (RX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_4 | Invalidates scatter list cache in RX | |
| Transparent | INVALIDATE_INTERNAL_CACHE_CQ | Invalidates CQC cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_SX1 | Invalidates SXDC cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_RX_5 | Invalidates LDB cache | |
| Transparent | INVALIDATE_INTERNAL_CACHE_GENERAL_1 | Invalidates RO caches | |
| Transparent | INVALIDATE_INTERNAL_CACHE_SX2 | Invalidates pkey cache (SX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_SX3 | Invalidates guid cache (SX) | |
| Transparent | INVALIDATE_INTERNAL_CACHE_QP | Invalidates QPC (main QP cache unit) | |
| Hang FW/HW | PACKET_DROP | Drops N packets on port x | This type requires the following extra flags: num_of_packets - 8 bit (max 15); port_num - 8 bit (should be 1 or 2) |

ConnectX-3 Pro Adapter Cards Stress Types

The following are the stress types available for ConnectX-3 Pro adapter cards:

Stressors in the "Transparent" category that are active for more than 100 msec may cause resiliency.

| Category | Stress Type | Description |
|---|---|---|
| Transparent | STOP_CE_INSTAGE_EQE | Stops sending EQEs created by the hardware (not the ones created by the firmware). |
| Transparent | STOP_EDBH | Stops the handling of external doorbells. |
| Transparent | STOP_IDBH | Stops the handling of internal doorbells. |
| Transparent | STOP_QPC_MISS_MACHINE_0, STOP_QPC_MISS_MACHINE_1, STOP_QPC_MISS_MACHINE_2, STOP_QPC_MISS_MACHINE_3 | Stops reading a QPC from the ICM on a miss, blocking hardware/firmware that accesses the QPC. |
| Transparent | LOCK_CEGW | Locks the CQE gateway. |
| Transparent | LOCK_OBGW_TPT, LOCK_OBGW_TCU, LOCK_OBGW_SXD | Locks the OBGW (access to the host memory gateway). |
| Transparent | LOCK_QPCGW_RX | Locks QPCGW. |
| Transparent | LOCK_SEMAPHORE_IPC_RX0, LOCK_SEMAPHORE_IPC_RX1, LOCK_SEMAPHORE_IPC_LDB, LOCK_SEMAPHORE_IPC_SX1 | Locks the IPC semaphore. |
| Transparent | INVALIDATE_CACHES | Invalidates caches. |
| Performance | STOP_SXP_VL_ARB_PORT1, STOP_SXP_VL_ARB_PORT2 | Stops transmission of packets to the wire. Causes head-of-line packet drop (HLL) if enabled. |
| Performance | RX_BACKPRESSURE | Stops the RX pipe (back-pressure to the wire, sending TX pauses). |
| Performance | DROP_PACKETS_TX | Drops packets on the TX side. |

Turning On Stress Types

To turn on a specific stress type:

mlxfwstress -d mt4103_pciconf0 -o on -t STOP_CE_INSTAGE_EQE
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED

To turn on a set of stress types:

mlxfwstress -d mt4103_pciconf0 -o on -t STOP_CE_INSTAGE_EQE,STOP_QPC_MISS_MACHINE_3,LOCK_SEMAPHORE_IPC_RX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED
Turning ON stress type: stop_qpc_miss_machine_3 -PASSED
Turning ON stress type: lock_semaphore_ipc_rx1 -PASSED

To turn on all the available stress types:

mlxfwstress -d mt4119_pciconf0 -t ALL -o on
Random seed: [1587969653]
-------------------------------------------------
Operation:         [ON]
-------------------------------------------------
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_CQ                         -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_GENERAL_1                  -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_QP                         -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_1                       -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_2                       -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_3                       -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_4                       -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_RX_5                       -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX1                        -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX2                        -PASSED
Turning ON stress type: INVALIDATE_INTERNAL_CACHE_SX3                        -PASSED
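When many stress types are switched at once, it can help to check the result lines mechanically. The sketch below assumes the output keeps the `Turning ON|OFF <stress|hang> type: <name> -<status>` shape shown in the examples on this page; `parse_results` and `not_passed` are hypothetical helper names. Because this page only shows the `PASSED` status, any other status is simply reported as "not passed" rather than matched against a guessed failure string.

```python
import re
from typing import List, Tuple

# Matches lines such as "Turning ON stress type: stop_ce_instage_eqe -PASSED".
RESULT_LINE = re.compile(
    r"^Turning (ON|OFF) (stress|hang) type:\s*(\S+)\s*-\s*(\S.*?)\s*$"
)


def parse_results(output: str) -> List[Tuple[str, str, str, str]]:
    """Return (action, kind, name, status) for every result line found."""
    results = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results.append(match.groups())
    return results


def not_passed(output: str) -> List[str]:
    """Names of stress/hang types whose status is anything other than PASSED."""
    return [name for _, _, name, status in parse_results(output)
            if status != "PASSED"]


sample = """\
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON stress type: stop_ce_instage_eqe -PASSED
Turning ON stress type: stop_qpc_miss_machine_3 -PASSED
"""
print(parse_results(sample))
print(not_passed(sample))
```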

Turning Off Stress Types

To turn off a specific stress type:

mlxfwstress -d mt4103_pciconf0 -o off -t STOP_CE_INSTAGE_EQE
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF stress type: stop_ce_instage_eqe -PASSED

To turn off a set of stress types:

mlxfwstress -d mt4103_pciconf0 -o off -t STOP_CE_INSTAGE_EQE,STOP_QPC_MISS_MACHINE_3,LOCK_SEMAPHORE_IPC_RX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF stress type: stop_ce_instage_eqe -PASSED
Turning OFF stress type: stop_qpc_miss_machine_3 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx1 -PASSED

Querying the Stress Types

To query the state of all stress types:

mlxfwstress -d mt4117_pciconf0 -o query -t ALL
-------------------------------------------------
Operation:         [QUERY]
-------------------------------------------------
Querying stress type: INVALIDATE_INTERNAL_CACHE_CQ  		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_GENERAL_1	-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_QP  		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_1		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_2		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_3		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_4		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_5		-NOT SUPPORTED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX1 		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX2 		-ENABLED
Querying stress type: INVALIDATE_INTERNAL_CACHE_SX3 		-ENABLED
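The query output above can be turned into a name-to-state mapping, for example to skip stress types that a device reports as not supported. This is a sketch under the assumption that each line has the `Querying <stress|hang> type: <name> -<state>` shape shown on this page; `parse_query` is a hypothetical helper, and only the states that appear in the examples (`ENABLED`, `NOT SUPPORTED`) are referenced.

```python
from typing import Dict


def parse_query(output: str) -> Dict[str, str]:
    """Map each queried stress/hang type name to the state text after the dash."""
    states: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line.startswith("Querying ") or " type:" not in line:
            continue
        # Everything after the first colon is "<name><whitespace>-<state>".
        rest = line.split(":", 1)[1].strip()
        parts = rest.split(None, 1)
        if len(parts) != 2:
            continue
        name, state = parts
        states[name] = state.lstrip("-").strip()
    return states


sample = (
    "Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_4\t\t-ENABLED\n"
    "Querying stress type: INVALIDATE_INTERNAL_CACHE_RX_5\t\t-NOT SUPPORTED\n"
    "Querying hang type: Sx1 -ENABLED\n"
)
states = parse_query(sample)
unsupported = [name for name, state in states.items() if state == "NOT SUPPORTED"]
print(states)
print(unsupported)
```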

Hang Types

ConnectX-4/ConnectX-4 Lx/ConnectX-5 Adapter Cards Hang Types

The following are the hang types available for ConnectX-4/ConnectX-4 Lx/ConnectX-5 adapter cards:

| Category | Hang Type | Description | Notes |
|---|---|---|---|
| Hang FW/HW | STOP_RX_PER_PRIO | | This type requires the following extra flags: vl_mask - 16 bit; port_num - 8 bit |

mlxfwstress -d mt4115_pciconf0 -o on --hang-type STOP_RX_PER_PRIO --extra %STOP_RX_PER_PRIO[0x00100FF]
Random seed: [1588056318]
-------------------------------------------------
Operation:         [ON]
-------------------------------------------------
Turning ON hang type: STOP_RX_PER_PRIO              		-PASSED
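This page lists the two extra fields (vl_mask, 16 bit; port_num, 8 bit) but does not state how they are packed into the bracketed value. The sketch below is an assumption inferred only from the example value: it treats the low 16 bits as vl_mask and the next 8 bits as port_num, which decodes the example as vl_mask 0x00FF on port 1. The function names are hypothetical; confirm the layout against the tool before relying on it.

```python
from typing import Dict


def decode_stop_rx_per_prio(value: int) -> Dict[str, int]:
    """Split an extra value into fields.

    ASSUMED layout (not confirmed by this page):
    bits 0-15 = vl_mask, bits 16-23 = port_num.
    """
    return {"vl_mask": value & 0xFFFF, "port_num": (value >> 16) & 0xFF}


def encode_stop_rx_per_prio(vl_mask: int, port_num: int) -> int:
    """Inverse of decode_stop_rx_per_prio under the same assumed layout."""
    if not 0 <= vl_mask <= 0xFFFF:
        raise ValueError("vl_mask must fit in 16 bits")
    if not 0 <= port_num <= 0xFF:
        raise ValueError("port_num must fit in 8 bits")
    return (port_num << 16) | vl_mask


fields = decode_stop_rx_per_prio(0x000100FF)
print(fields)  # {'vl_mask': 255, 'port_num': 1}
print(hex(encode_stop_rx_per_prio(**fields)))  # 0x100ff
```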

To turn on this hang type, the command must be executed in the following format:

Example: 

mlxfwstress -d mt4115_pciconf0 -o on --hang-type STOP_RX_PER_PRIO --extra %STOP_RX_PER_PRIO[0x000100FF]
Output:
Random seed: [1573642282]
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: STOP_RX_PER_PRIO -PASSED

Turning On Hang Types

To turn on a specific hang type:

mlxfwstress -d mt4103_pciconf0 -o on --hang-type HANG_SX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: Sx1 -PASSED

To turn on a set of hang types:

mlxfwstress -d mt4103_pciconf0 -o on --hang-type HANG_SX1,HANG_RX1
-------------------------------------------------
Operation: [ON]
-------------------------------------------------
Turning ON hang type: Sx1 -PASSED
Turning ON hang type: Rx1 -PASSED

Turning Off Hang Types

To turn off a specific hang type:

mlxfwstress -d mt4103_pciconf0 -o off --hang-type HANG_SX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED

To turn off a set of hang types:

mlxfwstress -d mt4103_pciconf0 -o off --hang-type HANG_SX1,HANG_RX1
-------------------------------------------------
Operation: [OFF]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
Turning OFF hang type: Rx1 -PASSED

Querying the Hang Types

To query the state of all hang types:

mlxfwstress -d mt4103_pciconf0 -o query --hang-type ALL
-------------------------------------------------
Operation: [QUERY]
-------------------------------------------------
Querying hang type: Sx1 -ENABLED
Querying hang type: Rx1 -ENABLED
Querying hang type: Tx -ENABLED
Querying hang type: Rx -ENABLED

Clearing all Stress/Hang Types

To clear all stress/hang types:

mlxfwstress -d mt4103_pciconf0 -o clear_all
-------------------------------------------------
Operation: [CLEAR_ALL]
-------------------------------------------------
Turning OFF hang type: Sx1 -PASSED
Turning OFF hang type: Rx1 -PASSED
Turning OFF hang type: Tx -PASSED
Turning OFF hang type: Rx -PASSED
Turning OFF stress type: stop_ce_instage_eqe -PASSED
Turning OFF stress type: stop_sxp_vl_arb_port1 -PASSED
Turning OFF stress type: stop_sxp_vl_arb_port2 -PASSED
Turning OFF stress type: stop_edbh -PASSED
Turning OFF stress type: stop_idbh -PASSED
Turning OFF stress type: stop_qpc_miss_machine_0 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_1 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_2 -PASSED
Turning OFF stress type: stop_qpc_miss_machine_3 -PASSED
Turning OFF stress type: lock_cegw -PASSED
Turning OFF stress type: lock_obgw_tpt -PASSED
Turning OFF stress type: lock_obgw_tcu -PASSED
Turning OFF stress type: lock_obgw_sxd -PASSED
Turning OFF stress type: lock_qpcgw_rx -PASSED
Turning OFF stress type: lock_semaphore_ipc_sx1 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx0 -PASSED
Turning OFF stress type: lock_semaphore_ipc_rx1 -PASSED
Turning OFF stress type: lock_semaphore_ipc_ldb -PASSED
Turning OFF stress type: invalidate_caches -PASSED
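Because a stressor left on can keep affecting the device after a test ends, a test harness may want cleanup to run even when the workload fails. The following sketch wraps the `on` and `clear_all` operations shown above in a context manager. `stressors_enabled` and the injected `run` callable are hypothetical; in real use `run` could be something like `lambda argv: subprocess.run(argv, check=True)`, while the demonstration below only records the commands.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence


@contextmanager
def stressors_enabled(
    run: Callable[[List[str]], object],
    device: str,
    stress_types: Sequence[str],
) -> Iterator[None]:
    """Turn the given stress types on, and always issue clear_all on exit."""
    run(["mlxfwstress", "-d", device, "-o", "on", "-t", ",".join(stress_types)])
    try:
        yield
    finally:
        # clear_all turns off every stress and hang type on the device,
        # including ones this script did not enable.
        run(["mlxfwstress", "-d", device, "-o", "clear_all"])


# Demonstration with a recording runner instead of a real process launcher.
calls: List[List[str]] = []
with stressors_enabled(calls.append, "mt4103_pciconf0", ["STOP_EDBH", "STOP_IDBH"]):
    pass  # run the workload under stress here

for argv in calls:
    print(" ".join(argv))
```

If only the stress types enabled by the script should be cleared, the `finally` branch could issue `-o off -t <same list>` instead of `clear_all`.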

Clearing the Semaphore

To clear the semaphore:

mlxfwstress -d mt4103_pciconf0 -o clear_semaphore
-------------------------------------------------
Operation: [CLEAR_SEMAPHORE]
-------------------------------------------------
Semaphore was cleared successfully

Random Operation

There are two random modes you can choose from:

  • Single - given a set of stress types, in each iteration one stress type is chosen and toggled ON/OFF according to its current state
  • Wild - given a set of stress types, in each iteration a random subset of stress types is chosen and toggled ON/OFF according to their current state
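The difference between the two modes can be illustrated with a small model. This is only a sketch of the toggling rule described above, not the tool's implementation: the function `random_iteration` is hypothetical, the way the real tool picks the subset (and its size) is not documented here, and the model ignores the maximal on-time limit set by --max-rand-on.

```python
import random
from typing import Dict, List


def random_iteration(state: Dict[str, bool], rng: random.Random, mode: str) -> List[str]:
    """Toggle one ("single") or a random non-empty subset ("wild") of stress types.

    `state` maps a stress type name to True (ON) or False (OFF) and is
    updated in place. Returns the names that were toggled.
    """
    names = sorted(state)
    if mode == "single":
        chosen = [rng.choice(names)]
    elif mode == "wild":
        # Assumed selection rule: a uniformly sized, non-empty subset.
        chosen = rng.sample(names, rng.randint(1, len(names)))
    else:
        raise ValueError("mode must be 'single' or 'wild'")
    for name in chosen:
        # The opposite operation is applied according to the current state.
        state[name] = not state[name]
    return chosen


rng = random.Random(1234)  # fixed seed so a run can be repeated
state = {"STOP_EDBH": False, "STOP_IDBH": False, "LOCK_CEGW": False}
for iteration in range(1, 4):
    toggled = random_iteration(state, rng, "wild")
    print(iteration, sorted(toggled), state)
```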

Setting the Random Mode for the Stress Types

To set the Single Mode:

mlxfwstress -d mt4103_pciconf0 -o random --rand-mode single -t STOP_CE_INSTAGE_EQE --stress-delay 0.2 --iterations 10
-------------------------------------------------
Operation: [RANDOM]
-------------------------------------------------
#############################################
Random:
Iterations delay: 0.2 [sec]
Iterations number: 10
Max on time: 1 [sec]
#############################################
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [2]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 201 [ms]
RANDOM ITERATION: [4]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [5]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
RANDOM ITERATION: [6]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 201 [ms]
RANDOM ITERATION: [7]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
RANDOM ITERATION: [8]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 201 [ms]
RANDOM ITERATION: [9]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
Turning OFF stress type: stop_ce_instage_eqe
RANDOM ITERATION: [10]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 200 [ms]
=======================================================
Turning off all stress types after random:
Turning OFF stress type: stop_ce_instage_eqe

  • As seen in the example above, after the specified number of iterations the tool turns off all the stress types.
  • The default value for stress-delay is 1 second.
  • If no number of iterations is supplied, the tool runs until the user stops it with Ctrl+C, after which it turns off all the stress types.
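The per-iteration lines of a random run can be parsed to see how long each stress type actually stayed on. The sketch below assumes the line shapes shown in the example output above, and interprets the duration reported on an OFF line as the time since that stress type was turned ON. `parse_random_log` and `longest_on_ms` are hypothetical helper names.

```python
import re
from typing import Dict, List, Tuple

ITERATION = re.compile(r"^RANDOM ITERATION: \[(\d+)\]$")
EVENT = re.compile(
    r"^\[(\w+)\]: \[(ON|OFF)\]\s*, duration since last operation: (\d+) \[ms\]$"
)


def parse_random_log(output: str) -> List[Tuple[int, str, str, int]]:
    """Return (iteration, name, new_state, duration_ms) for each toggle line."""
    events = []
    iteration = None
    for raw in output.splitlines():
        line = raw.strip()
        match = ITERATION.match(line)
        if match:
            iteration = int(match.group(1))
            continue
        match = EVENT.match(line)
        if match and iteration is not None:
            events.append(
                (iteration, match.group(1), match.group(2), int(match.group(3)))
            )
    return events


def longest_on_ms(events: List[Tuple[int, str, str, int]]) -> Dict[str, int]:
    """Longest ON period per stress type, taken from the OFF lines."""
    longest: Dict[str, int] = {}
    for _, name, new_state, duration in events:
        if new_state == "OFF":
            longest[name] = max(longest.get(name, 0), duration)
    return longest


sample = """\
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
RANDOM ITERATION: [2]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 200 [ms]
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 201 [ms]
"""
events = parse_random_log(sample)
print(events)
print(longest_on_ms(events))
```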

To set the Wild Mode:

mlxfwstress -d mt4103_pciconf0 -o random --rand-mode wild -t ALL --stress-delay 0.2 --max-rand-on 1 --iterations 5
-------------------------------------------------
Operation: [RANDOM]
-------------------------------------------------
#############################################
Random:
Iterations delay: 0.2 [sec]
Iterations number: 5
Max on time: 1 [sec]
#############################################
 
RANDOM ITERATION: [1]
[stop_ce_instage_eqe]: [ON] , duration since last operation: 0 [ms]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 0 [ms]
[stop_edbh]: [ON] , duration since last operation: 0 [ms]
[stop_idbh]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_0]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_3]: [ON] , duration since last operation: 0 [ms]
[lock_cegw]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 0 [ms]
[lock_qpcgw_rx]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_sx1]: [ON] , duration since last operation: 0 [ms]
 
RANDOM ITERATION: [2]
[stop_sxp_vl_arb_port1]: [ON] , duration since last operation: 0 [ms]
[stop_edbh]: [OFF], duration since last operation: 203 [ms]
[stop_idbh]: [OFF], duration since last operation: 203 [ms]
[stop_qpc_miss_machine_3]: [OFF], duration since last operation: 202 [ms]
[lock_cegw]: [OFF], duration since last operation: 202 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tcu]: [OFF], duration since last operation: 203 [ms]
[lock_semaphore_ipc_rx0]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_rx1]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_ldb]: [ON] , duration since last operation: 0 [ms]
 
RANDOM ITERATION: [3]
[stop_ce_instage_eqe]: [OFF], duration since last operation: 406 [ms]
[stop_sxp_vl_arb_port2]: [OFF], duration since last operation: 406 [ms]
[stop_edbh]: [ON] , duration since last operation: 203 [ms]
[stop_idbh]: [ON] , duration since last operation: 203 [ms]
[stop_qpc_miss_machine_0]: [OFF], duration since last operation: 406 [ms]
[stop_qpc_miss_machine_2]: [ON] , duration since last operation: 0 [ms]
[lock_obgw_tpt]: [OFF], duration since last operation: 203 [ms]
[lock_obgw_sxd]: [ON] , duration since last operation: 0 [ms]
[lock_semaphore_ipc_sx1]: [OFF], duration since last operation: 405 [ms]
[lock_semaphore_ipc_ldb]: [OFF], duration since last operation: 203 [ms]
 
RANDOM ITERATION: [4]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 203 [ms]
[stop_edbh]: [OFF], duration since last operation: 202 [ms]
[stop_idbh]: [OFF], duration since last operation: 202 [ms]
[stop_qpc_miss_machine_1]: [ON] , duration since last operation: 0 [ms]
[stop_qpc_miss_machine_3]: [ON] , duration since last operation: 406 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 202 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 406 [ms]
[lock_obgw_sxd]: [OFF], duration since last operation: 203 [ms]
[lock_semaphore_ipc_sx1]: [ON] , duration since last operation: 203 [ms]
[lock_semaphore_ipc_rx1]: [OFF], duration since last operation: 406 [ms]
[invalidate_caches]: [ON] , duration since last operation: 0 [ms]
 
Turning OFF stress type: stop_sxp_vl_arb_port1
Turning OFF stress type: stop_sxp_vl_arb_port2
Turning OFF stress type: stop_qpc_miss_machine_1
Turning OFF stress type: stop_qpc_miss_machine_2
Turning OFF stress type: stop_qpc_miss_machine_3
Turning OFF stress type: lock_obgw_tpt
Turning OFF stress type: lock_obgw_tcu
Turning OFF stress type: lock_qpcgw_rx
Turning OFF stress type: lock_semaphore_ipc_sx1
Turning OFF stress type: lock_semaphore_ipc_rx0
Turning OFF stress type: invalidate_caches
 
RANDOM ITERATION: [5]
[stop_sxp_vl_arb_port2]: [ON] , duration since last operation: 202 [ms]
[stop_idbh]: [ON] , duration since last operation: 322 [ms]
[lock_obgw_tpt]: [ON] , duration since last operation: 202 [ms]
[lock_obgw_tcu]: [ON] , duration since last operation: 202 [ms]
[lock_qpcgw_rx]: [ON] , duration since last operation: 202 [ms]
[invalidate_caches]: [ON] , duration since last operation: 202 [ms]
=======================================================
Turning off all stress types after random:
 
Turning OFF stress type: stop_sxp_vl_arb_port2
Turning OFF stress type: stop_idbh
Turning OFF stress type: lock_obgw_tpt
Turning OFF stress type: lock_obgw_tcu
Turning OFF stress type: lock_qpcgw_rx
Turning OFF stress type: invalidate_caches

ConnectX-3/ConnectX-3 Pro Adapter Cards Hang Types

The following are the hang types available for ConnectX-3/ConnectX-3 Pro adapter cards:

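| Category | Hang Type | Description | Notes |
|---|---|---|---|
| Hang FW/HW | HANG_SX1 | | |
| Hang FW/HW | HANG_RX1 | | |
| Hang FW/HW | HANG_TX | | |
| Hang FW/HW | HANG_RX | | |
| Hang FW/HW | ALL | | |

Hang types that require extra flags are not supported when running with the 'ALL' option.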