date 2023-09-05

Congestion Control Added support for configuring CC buffer thresholds according to the switch capabilities.

Performance Improved performance of routing calculation for routers.

Unicast Route Rebalancing Added the option to avoid unicast route rebalancing when HBF is enabled on all SLs.

Port GUID Added the option to write the destination port GUID when logging direct routed SMPs.

Systemd Added support for Systemd.

v5.15.0

Programmable Congestion Control Extended IBCC to support programmable congestion control. Feature control parameters: ppcc_algo_dir - Path to directory with PPCC algorithm profile files

Fast Recovery Added Fast Recovery support to configure a policy on switches for reporting ports as unhealthy, and to support isolating the unhealthy ports.

Feature control parameters: fast_recovery_enabled - Enable Fast Recovery feature

0 - SM will not send configuration, and will not isolate ports reported by switches (default)

1 - Disable the feature

2 - Enable the feature

fast_recovery_conf_file - Path to fast recovery policy file
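fast_recovery_enabled follows a three-value convention that also appears later in these notes (vs_key_enable, mlnx_congestion_control): 0 leaves the devices untouched, 1 actively disables the feature, and 2 enables it. The sketch below shows one way a configuration tool might interpret such a value. The names TriState, parse_tristate and sm_sends_configuration are hypothetical, and the assumption that "disable" also means the SM pushes settings is an interpretation, not documented SM behavior.

```python
from enum import IntEnum


class TriState(IntEnum):
    """Hypothetical model of the 0/1/2 convention used by fast_recovery_enabled."""
    IGNORE = 0   # SM sends no configuration (default)
    DISABLE = 1  # SM explicitly disables the feature on the devices
    ENABLE = 2   # SM configures and enables the feature


def parse_tristate(raw: str) -> TriState:
    """Convert a configuration value such as "2" into a TriState; reject anything else."""
    value = int(raw.strip(), 0)
    try:
        return TriState(value)
    except ValueError:
        raise ValueError(f"expected 0, 1 or 2, got {raw!r}") from None


def sm_sends_configuration(state: TriState) -> bool:
    # Assumption: only IGNORE leaves switches untouched; DISABLE and ENABLE both push settings.
    return state is not TriState.IGNORE
```

Keeping "ignore" distinct from "disable" lets an administrator hand control of a feature to another tool without the SM overwriting it on each sweep.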

General Added support for running SM with a topology specification file. Feature control parameters: topo_config_enabled - Enable/disable the feature

topo_config_file - Path to topology specification file

General Added support for generating an SM performance report. The report file is created in the SM logs directory with the name "opensm-perflog.json". Feature control parameters: enable_performance_logging - Enable/disable the feature (enabled by default)

osm_perflog_dump_limit - Size limit of the performance log file (20MB by default)

General Added port label for supporting NDR switches and CAs.

General Added support for additional predefined port groups: ALL_ANS - All aggregation nodes

ALL_SWITCH_TO_SWITCH - All switch ports connected to other switches

ALL_SWITCH_TO_CA - All switch ports connected to CAs

General Limited the number of simultaneous SL2VL and VLArb MADs sent per device.

General Enabled dumping FLID ranges to the opensm-router.dump file

Routing Added support for running an SM root detection algorithm when the root GUIDs file is invalid.

Routing Improved FLID routing calculation time.

v5.14.0

General Updated the subnet configuration flow as follows: Switch-to-switch links are updated prior to updating routing tables

ARLFT tables are set prior to setting the AR Group tables

Routing tables of switches with topology changes are updated prior to tables of switches without topology changes

General Added support for multithreaded SL2VL calculation.

Routers Added support for AR over routers (FLIDs).

DFP Routing Engine Added support for persistent AR groups for DF+ 1 (dfp) routing engine.

UPDN and AR UPDN Routing Engine Improved UPDN minhop tables calculation times.

v5.13.0

SL-to-VL Mapping Table Added support for port masked optimized SLtoVLMappingTable programming.

DFP2 Routing Engine Added support for PFRN on DFP2 routing engine.

PKey Validation Traps Added support for suppressing multicast PKey validation traps.

General Updated the default value of drop_event_subscriptions to TRUE.

General Updated the default value of drop_subscr_on_report_fail to TRUE.

General Enabled sending LFT/ARLFT entry for SMLID last.

MTU/Rate Enabled MTU and Rate calculation for router PathRecords according to route.

v5.12.0

Self Healing Network with PFRN Self healing network with PFRN is now at GA level for AR_UPDN routing engine.

Hash Based Forwarding (HBF) Hash Based Forwarding (HBF) is now at GA level

Adaptive Routing Engine Improved routing calculation time for AR_UPDN routing engine.

Adaptive Routing A dedicated AR group ID per leaf is now also assigned when SHIELD is disabled.

General Added the option to avoid initializing links marked for port resets.

General Updated root_guid_file parameter description.

General Removed vPorts with LID required set to 0 from guid2lid.

DragonFly+ Topologies Improved root detection algorithm for DragonFly+ topologies to support leaf switches without hosts.

v5.11.0

Self healing network with PFRN [Beta] PFRN is used for fast link fault recovery. If a link fails or disconnects, switches send messages to the peer switches to update the routing tables.

This feature is supported only on the ar_updn and ar_ftree routing engines, and only if all fabric switches support PFRN (NVIDIA Quantum and NVIDIA Quantum-2 switch systems only).

Feature control parameters: pfrn_sl - SL for PFRN messages (default SL 0).

pfrn_mask_clear_timeout - Time-out since the last PFRN message received by the switch for an AR group, after which unused port masks will be cleared. The value is in multiples of 30 seconds. (default 180)

pfrn_mask_force_clear_timeout - Time-out since the last mask clear operation, after which unused port masks are cleared by the switch. The value is in multiples of 240 seconds. (default 720)

To disable PFRN, set shield_mode to 2.
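Because both PFRN timeouts are expressed in units rather than seconds, the defaults are easy to misread. The arithmetic below converts them using the units stated in the parameter descriptions; the function name is hypothetical.

```python
# Units taken from the parameter descriptions above.
PFRN_MASK_CLEAR_UNIT_S = 30         # pfrn_mask_clear_timeout unit, in seconds
PFRN_MASK_FORCE_CLEAR_UNIT_S = 240  # pfrn_mask_force_clear_timeout unit, in seconds


def pfrn_timeouts_seconds(mask_clear: int = 180, force_clear: int = 720) -> tuple[int, int]:
    """Return (mask_clear, force_clear) timeouts in seconds for the given parameter values."""
    return mask_clear * PFRN_MASK_CLEAR_UNIT_S, force_clear * PFRN_MASK_FORCE_CLEAR_UNIT_S


clear_s, force_s = pfrn_timeouts_seconds()
print(clear_s / 3600, force_s / 3600)  # 1.5 48.0
```

With the defaults, unused port masks are cleared 1.5 hours after the last PFRN message for an AR group, and force-cleared 48 hours after the last clear operation.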

Multiport high availability Allows SM to fail over to another port in case of an SM link failure. It requires configuring more than one port GUID in the guid parameter. Feature control parameter: enable_sm_port_failover - Enable or disable failover (default FALSE).

Hash Based Forwarding (HBF) Allows selection of the switch outgoing port for statically routed packets based on the packet's parameters (ECMP-like). With the dfp2 routing engine, non-minhop routes will be used for static routing as well as for Adaptive Routing. Feature control parameters: hbf_sl_mask - SLs supporting HBF (default 0).

hbf_hash_type - Hash function for HBF: 0 - CRC (default), 1 - XOR.

hbf_seed_type - Hash seed type: 0 - Seed (default). 1 - Random.

hbf_seed - Hash seed, 32 bit number: 0xffffffff - Use switch GUID for seed (default). 0x0-0xfffffffe - Specific seed value.

hbf_hash_fields - Fields of packet for hash calculation (default 0x40F00C0F).

hbf_weights - Weights ratio between ports of different AR sub-groups: auto - SM/routing engine decision (default). <sg0>,<sg1>,<sg2> - User defined weights for subgroup 0 to subgroup 2.
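The two HBF parameters with special values, hbf_weights ("auto" or three subgroup weights) and hbf_seed (0xffffffff meaning "use the switch GUID"), can be validated before they reach the SM. The sketch below is illustrative only: parse_hbf_weights and seed_uses_switch_guid are hypothetical helpers, and seed_uses_switch_guid is only meaningful when hbf_seed_type is 0.

```python
from typing import Optional, Tuple

# hbf_seed default: each switch derives its hash seed from its own GUID.
USE_SWITCH_GUID_SEED = 0xFFFFFFFF


def parse_hbf_weights(raw: str) -> Optional[Tuple[int, int, int]]:
    """Parse an hbf_weights value.

    "auto" -> None (the SM/routing engine decides);
    "<sg0>,<sg1>,<sg2>" -> user-defined weights for AR subgroups 0, 1 and 2.
    """
    raw = raw.strip()
    if raw == "auto":
        return None
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 'auto' or three comma-separated weights, got {raw!r}")
    sg0, sg1, sg2 = (int(p) for p in parts)
    return sg0, sg1, sg2


def seed_uses_switch_guid(hbf_seed: int) -> bool:
    """True when hbf_seed selects the per-switch GUID seed rather than a fixed value."""
    return hbf_seed == USE_SWITCH_GUID_SEED


print(parse_hbf_weights("auto"))   # None
print(parse_hbf_weights("4,2,1"))  # (4, 2, 1)
```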

SA response time Improved SA response time for multicast join requests during routing calculation.

Persistent mapping Added support for persistent mapping between AR group ID and the destination switch GUID.

Switch SMA response MADs Switch SMA response MADs are now routed using PLFT0 to overcome a firmware limitation in dfp2.

SM ports table Added SM ports table to SMDB.

Log message verbosity Changed verbosity of log message when toggling ports to INFO.

Port state report Added the option to report to the log when failing to update port state from ARM to ACTIVE.

Virtualization traps Added details for virtualization traps to the log file.

Statistics dump file per SM Enabled statistics dump file per SM port by default.

Asymmetric flow algorithm for trees Enabled asymmetric flow algorithm for trees (ar_updn and ar_ftree) by default.

v5.10.0

Adaptive Timeout SL Mask Added support for Adaptive Timeout SL mask. Feature control parameter: adaptive_timeout_sl_mask - mask of SLs on which adaptive timeout is enabled. (Default 0xFFFF)
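SL masks such as adaptive_timeout_sl_mask (and hbf_sl_mask above) are 16-bit values. The sketch below decodes one into a list of SLs, assuming bit i selects SL i; sls_in_mask is a hypothetical helper.

```python
def sls_in_mask(mask: int) -> list[int]:
    """List the SLs (0-15) selected by a 16-bit SL mask, assuming bit i selects SL i."""
    if not 0 <= mask <= 0xFFFF:
        raise ValueError(f"SL mask must fit in 16 bits, got {mask:#x}")
    return [sl for sl in range(16) if mask & (1 << sl)]


print(sls_in_mask(0xFFFF))  # default: every SL from 0 to 15
print(sls_in_mask(0x0005))  # [0, 2]
```

Under this reading, the default 0xFFFF enables adaptive timeout on all 16 SLs, while the hbf_sl_mask default of 0 enables HBF on none.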

IB Router QoS Extended the QoS policy file to support subnet prefixes and port GIDs for inter-subnet QoS.

This improvement enables the definition of SL/rate/MTU/packet-life for cross-subnet paths.

For further information, refer to the doc/QoS_management_in_OpenSM.txt document.

Routing Engine Added a new root detection algorithm in the UPDN and ar_updn routing engines. Feature control parameters: find_roots_color_algorithm - enables/disables the feature. (Default is TRUE)

max_cas_on_spine - sets the maximum number of CAs on a switch to allow considering it as a spine instead of a leaf by the routing algorithm.

Routing Engine Changed the default routing engine to be ar_updn instead of minhop.

Vendor Specific (VS) Key Added support for Vendor Specific (VS) key. The following are the parameters related to the feature: vs_key_enable - enables VS key configuration: 0 - ignore, 1 - disable, 2 - enable

vs_key_lease_period - the lease period used for VS keys, in seconds.

vs_key_ci_protect_bits - the protection level for the key: 1 - protected 0 - unprotected (The response Key Info exposes the key).

vs_max_outstanding_mads - the maximum number of outstanding VS MADs in the network at once.

key_mgr_seed - used by the key manager for VS key configuration.

SA Response Time Improved SA response time during routing calculation. Feature control parameter: enable_queries_during_routing - enables SA queries during routing calculation. (Default is TRUE)

Report Duplicated GUIDs Added support for reporting duplicated GUIDs to UFM.

Switch Reboot Added the option to report switch reboots to UFM.

Long Transaction Timeout Enabled the option to use long transaction timeout for PI for port 0 MADs.

SMDB Dump File Added subnet prefix to SMDB dump file.

SM Binding Port Information Added SM binding port information to the MAD details in the timeout message, dumped to the SM log file.

OpenSM Start Time Added SM start time to the SMDB dump file.

Dump MAD Statistics per SM Port Added the option to dump MAD statistics per SM port. Feature control parameter: osm_stats_dump_per_sm_port - enables/disables the feature. (Default is FALSE)

Adaptive Routing (AR) Group IDs Made the process of selecting AR (Adaptive Routing) group IDs deterministic in each run of the SM on the same fabric.

v5.9.1

Link Speed Added support for NDR InfiniBand link speed in SM.

Configuration File Validation Added a new command line option "--validate_conf_files" that makes SM validate the configuration files and then exit.

Note: This version of the tool supports validating only the partition file.

Persistent Multicast (MC) Trees This capability enables reading the MulticastForwardingTable (MFT) tables upon SM startup/fail-over to ensure the new MASTER SM does not break multicast routing. To enable/disable it, use the "get_mft_tables" parameter (default TRUE).

DragonFly+ Topologies Added SHIELD support for dfp2 routing engine for DragonFly+ topologies.

SM Allowed GUIDs List This new capability enables the user to specify the list of GUIDs allowed to run SM in the fabric. When the list is provided, the master SM will avoid handover to ports that are not specified in the list. To enable this feature, use the "allowed_sm_guids" parameter. When it is set to "(null)", the feature is disabled.

Limiting the Number of VLs for Long Distance Links This new capability enables the user to set the maximum operational VL per port in a device configuration file, specified by the "device_configuration_file" parameter in the OpenSM configuration file. For more information, see doc/device_configuration.md.

Send ClientReregister after Subnet Configuration This new capability enables the user to send ClientReregister after subnet configuration to prevent the hosts from sending SA requests to the SM before the SM is ready to respond to them.

This feature can be controlled using the following parameters:

client_rereg_mode - Controls when ClientReregister is sent.

Supported values: 0 - Do not send client re-registration.

1 - Send client re-registration during LID assignment (previous default behavior).

2 - [Default] Send client re-registration after routing and QoS configuration from the link manager. The new parameter replaces the deprecated "no_clients_rereg" parameter.

kDOR Generalized Hypercube Engine Added kDOR Generalized Hypercube engine.

General Added the option to print a summary of AR and DragonFly+ supported switches to the log

Improved performance of NR lookup by LID

Changed the verbosity of port group creation messages to be in INFO level

Added new statistics counters to opensm-statistics.dump

Added the option to consider affinity when calculating number of cores

v5.8.1

Asymmetric trees The feature is applicable to ar_updn and ar_ftree routing engines. It reduces congestion in asymmetric tree topologies with missing uplinks on leaf switches.

To enable/disable the feature, use the ar_tree_asymmetric_flow parameter. The supported values are: 0 - Disable the feature (default).

1 - Enable the feature using single AR subgroup.

Note: Recommended for asymmetric tree topologies with 1000-2000 leaf switches.

2 - Enable the feature using two AR subgroups.

Note: Recommended for asymmetric tree topologies with fewer than 1000 leaf switches.
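The sizing notes above can be condensed into a small helper. It applies only to asymmetric tree topologies routed with ar_updn or ar_ftree, and the function name is hypothetical. Above 2000 leaf switches the notes give no recommendation, so the sketch returns None rather than guessing.

```python
from typing import Optional


def recommended_asymmetric_flow(num_leaf_switches: int) -> Optional[int]:
    """Suggest an ar_tree_asymmetric_flow value from the sizing notes above.

    2 (two AR subgroups) below 1000 leaf switches, 1 (single AR subgroup) for
    1000-2000 leaf switches, None when the notes give no recommendation.
    """
    if num_leaf_switches < 0:
        raise ValueError("leaf switch count must be non-negative")
    if num_leaf_switches < 1000:
        return 2
    if num_leaf_switches <= 2000:
        return 1
    return None
```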

Selecting LID for Master SM This feature prevents SM LID changes upon fail-over.

To set the LID for master SM, use the master_sm_lid parameter. The supported values are: 0 - Disable the feature (default).

1-0xBFFF - LID to set to SM port when in MASTER state.
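The accepted range for master_sm_lid is the InfiniBand unicast LID range: 0xC000-0xFFFE are multicast LIDs and 0xFFFF is the permissive LID, so neither can be assigned to an SM port. A minimal validation sketch (validate_master_sm_lid is a hypothetical helper):

```python
def validate_master_sm_lid(value: int) -> int:
    """Check a master_sm_lid value: 0 disables the feature, 0x1-0xBFFF is a unicast LID."""
    if value == 0 or 0x0001 <= value <= 0xBFFF:
        return value
    # 0xC000-0xFFFE are multicast LIDs and 0xFFFF is the permissive LID in InfiniBand.
    raise ValueError(f"master_sm_lid must be 0 or in 0x1-0xBFFF, got {value:#x}")
```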

Root GUIDs file for Dragonfly+ Routing Engines This feature enables using a root GUIDs file with the Dragonfly+ topology routing engines (dfp and dfp2).

To specify the file with the GUIDs of the topology's root switches, use the root_guid_file parameter.

Dragonfly+ Routing Engine Added a new routing engine (dfp2) for Dragonfly+ topologies. This engine supports Dragonfly+ topologies with any kind of tree topology islands. If the topology contains an island with more than 2 tree levels, the root GUIDs file, including the root switches of all Dragonfly+ islands, should be provided.

To use the new dfp2 routing engine, specify it in the routing_engine parameter.

Maximum Operational VLs for CAs, Routers and Switches This feature enables the user to configure different max_op_vls values for CAs, routers and switches.

To set the maximum operational VLs per device type, use the following parameters: max_op_vls_ca - Maximum operational VLs for CAs. When 0, use value max_op_vls. (default 0)

max_op_vls_rtr - Maximum operational VLs for routers. When 0, use value max_op_vls. (default 0)

max_op_vls_sw - Maximum operational VLs for switches. When 0, use value max_op_vls. (default 0)
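The three per-type parameters share one fallback rule: a value of 0 (the default) means "use max_op_vls". A sketch of that resolution, with the per-type values held in a dictionary; DeviceType and effective_max_op_vls are hypothetical names.

```python
from enum import Enum


class DeviceType(Enum):
    CA = "max_op_vls_ca"
    ROUTER = "max_op_vls_rtr"
    SWITCH = "max_op_vls_sw"


def effective_max_op_vls(device: DeviceType, max_op_vls: int, overrides: dict[DeviceType, int]) -> int:
    """Resolve max_op_vls for one device type.

    A per-type value of 0 (or an absent one) falls back to the subnet-wide max_op_vls.
    """
    override = overrides.get(device, 0)
    return override if override != 0 else max_op_vls


overrides = {DeviceType.CA: 2, DeviceType.ROUTER: 0}  # max_op_vls_sw left unset
print(effective_max_op_vls(DeviceType.CA, 3, overrides))      # 2
print(effective_max_op_vls(DeviceType.ROUTER, 3, overrides))  # 3
print(effective_max_op_vls(DeviceType.SWITCH, 3, overrides))  # 3
```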

“VL packing” for Dragonfly+ and KDOR Routing Engines. Added support for “VL packing” for Dragonfly+ and KDOR routing engines. This feature reduces the maximum operational VLs for CAs to half of subnet max_op_vls when using dfp/dfp2/kdor_hc routing engines.

To enable/disable the feature, use the enable_vl_packing parameter. The following is an example of “VL packing”: enable_vl_packing set to TRUE

max_op_vls set to 3 (Enable 4 VLs)

max_op_vls_ca set to 2 (Use 2 VLs for CAs)

Support SRP target on HCAs with Socket-Direct architecture/Virtual Machines. Enabled returning PortInfoRecord and NodeRecord for virtual ports and reporting virtual port capability changes. To enable/disable the feature (default TRUE), use the enable_virt_rec_ext parameter.

General Improved balancing of direct routes calculated for multi-port

Set HCA-grp in port groups parser to be an optional parameter

Added support for router alias GUIDs configuration for virtual ports

Added NDR speed port info capability bit to ib_types.h

Avoided sending client reregister to vport index 0

Extended rpg_byte_reset to 19 bits in Congestion Control

Enabled crashd by default

Added reporting of auxiliary port state changes

OpenSM now overrides the attributes of IPoIB multicast groups loaded from the SADB with those of the broadcast group

Limited the number of PortInfo and MEPI MADs sent per device simultaneously

Configured switch AR SL mask according to the ar_sl_mask configuration parameter

Updated GeneralInfo device IDs list to include NVIDIA Quantum-2 and future ConnectX family devices

Updated the man page with AR routing engines

Avoided exiting OpenSM when failing to bind to auxiliary port

v5.7.2

General Added support for MCMR join/leave requests with default subnet prefix

Set the enabled AR SL mask on switches according to ar_sl_mask.

v5.7.1

Multiple ports Allows MLNXSM to use multiple ports for sending Subnet configuration MADs.

Feature control parameters: guid - Comma separated list of MLNXSM port GUIDs.

The first port GUID specifies the primary port, which is used for Subnet Management (discovery, traps) and Subnet Administration.

Additional port GUIDs are used for sending subnet configuration (SMP Set MADs).

Configuration file example: guid 0x10001,0x10002
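The role split above (first GUID for SM/SA, remaining GUIDs only for configuration MADs) can be illustrated by parsing the example value. This is a tool-side sketch; parse_guid_list is hypothetical and is not how MLNXSM itself reads its configuration.

```python
def parse_guid_list(value: str) -> tuple[int, list[int]]:
    """Split a comma-separated guid value into (primary_port, extra_config_ports).

    The first GUID is the primary port (discovery, traps, SA); the rest are used
    only for sending subnet configuration SMP Set MADs.
    """
    guids = [int(item.strip(), 16) for item in value.split(",") if item.strip()]
    if not guids:
        raise ValueError("guid list is empty")
    return guids[0], guids[1:]


primary, extra = parse_guid_list("0x10001,0x10002")
print(hex(primary), [hex(g) for g in extra])  # 0x10001 ['0x10002']
```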

Extend router selection algorithm Supports specifying hash function, seed and additional hash function arguments for router selection during path records calculation.

Feature control parameters: rtr_selection_function - Hash function to be used by router selection algorithm.

Supported values - crc32 (default).

rtr_selection_seed - Seed for router selection algorithm. (default 0)

rtr_selection_algo_parameters - Comma separated list of parameters for router selection algorithm.

Supported values: sgid, dgid. (default sgid, dgid)
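These notes do not spell out how the hash result picks a router. The sketch below shows one plausible CRC32-based scheme over (sgid, dgid); the byte layout, the use of rtr_selection_seed as the CRC32 start value, and the modulo reduction are all assumptions, and select_router and the router names are hypothetical.

```python
import ipaddress
import zlib


def select_router(sgid: str, dgid: str, routers: list[str], seed: int = 0) -> str:
    """Choose one router for an inter-subnet path from a CRC32 hash of (sgid, dgid).

    Assumptions (illustration only): GIDs are given in IPv6 notation, hashed as
    16-byte values in sgid-then-dgid order, the seed is used as the CRC32 start
    value, and the hash is reduced modulo the number of candidate routers.
    """
    if not routers:
        raise ValueError("no candidate routers")
    data = ipaddress.IPv6Address(sgid).packed + ipaddress.IPv6Address(dgid).packed
    digest = zlib.crc32(data, seed & 0xFFFFFFFF)
    return routers[digest % len(routers)]


routers = ["rtr-a", "rtr-b", "rtr-c"]  # hypothetical router names
choice = select_router("fe80::1", "fec0::2", routers)
# The same GID pair and seed always map to the same router.
assert choice == select_router("fe80::1", "fec0::2", routers)
```

Hashing on both GIDs keeps a given source/destination pair on one router while spreading different pairs across all routers.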

LMC for routers and number of LIDs allowed per router for inter-subnet path records Feature control parameters: lids_per_rtr - Defines number of Router LIDs to be used for inter-subnet path records.

When set to 0, MLNXSM will use the number of LIDs per router according to the global LMC. (default 0)

When set to non-zero, MLNXSM will set LMC for router ports according to the value of this parameter (minimal N such that 2^N >= lids_per_rtr).

If global LMC is not zero, lids_per_rtr is ignored.

When lids_per_rtr is set to a non-zero value, the updn, ar_updn, or chain with updn routing engines should be used.
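The LMC rule above ("minimal N such that 2^N >= lids_per_rtr") is a ceiling of log2, which integer bit length computes exactly. A sketch of the rules as stated, including the precedence of a non-zero global LMC; router_lmc is a hypothetical helper. The LMC field is 3 bits wide in InfiniBand, so values above 128 LIDs are rejected.

```python
def router_lmc(lids_per_rtr: int, global_lmc: int = 0) -> int:
    """Return the LMC applied to router ports under the lids_per_rtr rules above.

    A non-zero global LMC wins (lids_per_rtr is ignored), and lids_per_rtr == 0 also
    falls back to the global LMC. Otherwise the result is the minimal N with
    2**N >= lids_per_rtr.
    """
    if lids_per_rtr < 0:
        raise ValueError("lids_per_rtr must be non-negative")
    if global_lmc != 0 or lids_per_rtr == 0:
        return global_lmc
    lmc = (lids_per_rtr - 1).bit_length()  # ceil(log2(lids_per_rtr))
    if lmc > 7:
        raise ValueError("LMC is a 3-bit field; at most 128 LIDs per port")
    return lmc
```

For example, lids_per_rtr = 3 gives LMC 2, so each router port receives 4 LIDs.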

Congestion Control Feature control parameters: mlnx_congestion_control - Enable/disable Mellanox Congestion Control.

Supported values:

0 - Do not configure congestion control (default).

1 - Disable congestion control on the subnet.

2 - Configure congestion control according to policy file.

congestion_control_policy_file - Path to congestion control policy file. For additional information, please review congestion_control.md file provided with MLNXSM.

LIDs range in Routing Chains Replaces path-bit qualifier in routing chain configuration by min-path-bit and max-path-bit qualifiers. (path-bit is still supported for backward compatibility). Example of usage: min-path-bit: 1

max-path-bit: 3

Controlling maximum number of MADs on wire per destination Feature control parameters: max_wire_smps_per_device - Number of MADs on the wire per device. (default 2)

Configuring service keys for service names Added the option to map service names to service keys.

Feature control parameters:

service_name2key_map_file - Path to service name to service key map file.

The file maps each service name to a service key, which is specified in IPv6 format.

For example, to map service name <SERVICE NAME> to service key 0::1, add the following line to the file:

<SERVICE NAME> 0::1
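Since service keys are written in IPv6 notation, a standard IPv6 parser can validate them and turn them into 128-bit integers. The sketch below reads the file format described above; parse_service_key_map is hypothetical, and the handling of blank lines, '#' comments and single-token names is an assumption.

```python
import ipaddress


def parse_service_key_map(text: str) -> dict[str, int]:
    """Parse service_name2key_map_file content into {service name: 128-bit service key}.

    Assumed format (illustration only): one "<name> <key>" pair per line, with the
    key in IPv6 notation; blank lines and lines starting with '#' are skipped.
    """
    mapping: dict[str, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<name> <key>', got {line!r}")
        name, key = fields
        # ipaddress validates the IPv6 notation and converts it to an integer key.
        mapping[name] = int(ipaddress.IPv6Address(key))
    return mapping


print(parse_service_key_map("my_service 0::1\n"))  # {'my_service': 1}
```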

General Disabled the option to send PortInfo MADs to switch ports that did not change their state from the previous sweep.

Enabled Adaptive Routing for all SLs on switches.

Set a limit on the SMInfo dispatcher queue.

Improved performance of missing routes calculation for trees.

Improved performance of ar_updn and ar_ftree routing engines.

Improved performance of inter-subnet path record calculation.

Added logging of the number of link resets performed by MLNXSM at the end of the heavy sweep.

Disabled creating the subnet LST file by default.

Feature control parameters: enable_lst_file - Controls dumping the subnet LST file of the topology

If set to TRUE, the LST file is created after a heavy sweep. (default FALSE)

Removed LMC support from verbosity bypass.

Enabled empty port groups file in routing chains.

Aligned index table columns in SMDB file.

UPDN LID tracking - Added the option to give precedence to exit ports leading to a switch with a lower total number of routes over exit ports leading to a switch with fewer routes to the switch of the destination LID.

Feature control parameters: updn_lid_tracking_prefer_total_routes

If set to TRUE, enables the feature. (default FALSE)

UPDN LID tracking - Improved the routing algorithm for better route utilization and route balancing.

UPDN LID tracking - Updated routing engine to support LMC.