MLNXSM (InfiniBand Subnet Manager) Utility Release Notes v5.11.0

Changes and New Features History

v5.10.0

Adaptive Timeout SL Mask

Added support for Adaptive Timeout SL mask.

Feature control parameter:

  • adaptive_timeout_sl_mask - a mask of the SLs for which adaptive timeout is enabled. (Default 0xFFFF)
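
As an illustration of how such a mask can be read, the sketch below lists the SLs selected by a 16-bit mask. It assumes that bit i of the mask corresponds to SL i, which these notes do not state explicitly; the function name is hypothetical.

```python
def sls_in_mask(mask: int) -> list:
    """Return the SLs (0-15) whose bit is set in a 16-bit SL mask.

    Assumption: bit i of the mask corresponds to SL i.
    """
    if not 0 <= mask <= 0xFFFF:
        raise ValueError("SL mask must fit in 16 bits")
    return [sl for sl in range(16) if mask & (1 << sl)]


print(sls_in_mask(0xFFFF))  # the default mask selects all 16 SLs
print(sls_in_mask(0x0003))  # only SL 0 and SL 1
```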

IB Router QoS

Extended the QoS policy file to support subnet prefixes and port GIDs for inter-subnet QoS.
This improvement enables the definition of SL/rate/MTU/packet-life for cross-subnet paths.
For further information, refer to the doc/QoS_management_in_OpenSM.txt document.

Routing Engine

Added a new root detection algorithm in UPDN and ar_updn routing engines.

Feature control parameters:

  • find_roots_color_algorithm - enables/disables the feature. (Default is TRUE)

  • max_cas_on_spine - sets the maximum number of CAs on a switch to allow considering it
    as a spine instead of a leaf by the routing algorithm.

Routing Engine

Changed the default routing engine to be ar_updn instead of minhop.

Vendor Specific (VS) Key

Added support for Vendor Specific (VS) key.

The following are the parameters related to the feature:

  • vs_key_enable - enables VS key configuration:

    • 0 - ignore

    • 1 - disable

    • 2 - enable

  • vs_key_lease_period - the lease period used for VS keys in [sec].

  • vs_key_ci_protect_bits - the protection level for the key:

    • 1 - protected

    • 0 - unprotected (The response Key Info exposes the key).

  • vs_max_outstanding_mads - the maximum number of outstanding VS MADs in the network at once.

  • key_mgr_seed - used by the key manager for VS key configuration.

SA Response Time

Improved SA response time during routing calculation.

Feature control parameter:

  • enable_queries_during_routing - enables SA queries during routing calculation. (Default is TRUE)

Report Duplicated GUIDs

Added support for reporting duplicated GUIDs to UFM.

Switch Reboot

Added the option to report switch reboots to UFM.

Long Transaction Timeout

Enabled the option to use long transaction timeout for PI for port 0 MADs.

SMDB Dump File

Added subnet prefix to SMDB dump file.

SM Binding Port Information

Added SM binding port information to the MAD details in the timeout message, dumped to the SM log file.

OpenSM Start Time

Added SM start time to the SMDB dump file.

Dump MAD Statistics per SM Port

Added the option to dump MAD statistics per SM port.

Feature control parameter:

  • osm_stats_dump_per_sm_port - enables/disables the feature. (Default is FALSE)

Adaptive Routing (AR) Group IDs

Made the process of selecting AR (Adaptive Routing) group IDs deterministic in each run of the SM on the same fabric.

v5.9.1

Link Speed

Added support for NDR InfiniBand link speed in SM.

Configuration File Validation

Added a new command line option "--validate_conf_files" that makes SM validate the configuration files and then exit.
Note: This version supports validation of the partition file only.

Persistent Multicast (MC) Trees

This capability enables reading the MulticastForwardingTables upon SM startup/fail-over to ensure that the new MASTER SM does not break multicast routing.

To enable/disable it, use the "get_mft_tables" parameter (default TRUE).

DragonFly+ Topologies

Added SHIELD support for dfp2 routing engine for DragonFly+ topologies.

SM Allowed GUIDs List

This new capability enables the user to specify the list of GUIDs allowed to run SM in the fabric. When the list is provided, the master SM will avoid handover to ports that are not specified in the list.

To enable this feature, use the "allowed_sm_guids" parameter. When set to "(null)", the feature is disabled.

Limiting the Number of VLs for Long Distance Links

This new capability enables the user to set the maximum operational VL per port by a new file specified by the "device_configuration_file" parameter in the OpenSM configuration file.

To provide per-port configuration, use the "device_configuration_file" parameter.

For more information, see doc/device_configuration.md.

Send ClientReregister after Subnet Configuration

This new capability enables the user to send ClientReregister after subnet configuration, which prevents hosts from sending SA requests to the SM before the SM is ready to respond to them.

Feature control parameter:

  • client_rereg_mode - controls when ClientReregister is sent. Supported values:

    • 0 - Do not send client re-registration.

    • 1 - Send client re-registration during LID assignment (previous default behavior).

    • 2 - Send client re-registration after routing and QoS configuration from link manager (default).

The new parameter replaces the deprecated "no_clients_rereg" parameter.
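
The three modes can be summarized in a small sketch. The class name, function name and stage labels below are hypothetical and only mirror the descriptions above; they are not taken from the SM sources.

```python
from enum import IntEnum


class ClientReregMode(IntEnum):
    """Values of client_rereg_mode as listed above (hypothetical class)."""
    DISABLED = 0             # never send client re-registration
    ON_LID_ASSIGNMENT = 1    # previous default behavior
    AFTER_SUBNET_CONFIG = 2  # default: after routing and QoS configuration


def sends_rereg(mode: int, stage: str) -> bool:
    """Return True if re-registration is sent at the given stage.

    `stage` is a hypothetical label: "lid_assignment" or "post_config".
    """
    mode = ClientReregMode(mode)  # raises ValueError for unsupported values
    if mode is ClientReregMode.ON_LID_ASSIGNMENT:
        return stage == "lid_assignment"
    if mode is ClientReregMode.AFTER_SUBNET_CONFIG:
        return stage == "post_config"
    return False


print(sends_rereg(2, "post_config"))
print(sends_rereg(2, "lid_assignment"))
```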

kDOR Generalized Hypercube Engine

Added kDOR Generalized Hypercube engine.

General

  • Added the option to print a summary of AR and DragonFly+ supported switches to the log

  • Improved performance of NR lookup by LID

  • Changed the verbosity of port group creation messages to be in INFO level

  • Added new statistics counters to opensm-statistics.dump

  • Added the option to consider affinity when calculating number of cores

v5.8.1

Asymmetric trees

The feature is applicable to ar_updn and ar_ftree routing engines. It reduces congestion in asymmetric tree topologies with missing uplinks on leaf switches.
To enable/disable the feature, use the ar_tree_asymmetric_flow parameter. The supported values are:

  • 0 - Disable the feature (default).

  • 1 - Enable the feature using single AR subgroup.
    Note: Recommended for asymmetric tree topologies with 1000-2000 leaf switches.

  • 2 - Enable the feature using two AR subgroups.
    Note: Recommended for asymmetric tree topologies with fewer than 1000 leaf switches.
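
The recommendations above can be expressed as a simple selection rule. This is a sketch only: the function name is hypothetical, and because these notes give no recommendation for fabrics with more than 2000 leaf switches, the sketch falls back to the default value 0 in that case as an assumption.

```python
def suggest_ar_tree_asymmetric_flow(num_leaf_switches: int) -> int:
    """Suggest an ar_tree_asymmetric_flow value from the leaf switch count."""
    if num_leaf_switches < 0:
        raise ValueError("leaf switch count cannot be negative")
    if num_leaf_switches < 1000:
        return 2  # two AR subgroups
    if num_leaf_switches <= 2000:
        return 1  # single AR subgroup
    return 0      # assumption: no documented recommendation, keep the default


for leaves in (500, 1500, 3000):
    print(leaves, suggest_ar_tree_asymmetric_flow(leaves))
```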

Selecting LID for Master SM

This feature prevents SM LID changes upon fail-over.
To set the LID for master SM, use the master_sm_lid parameter. The supported values are:

  • 0 - Disable the feature (default).

  • 1-0xBFFF - LID to set to SM port when in MASTER state.
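
A minimal sketch of validating a master_sm_lid value against the range above. The function name is hypothetical, and accepting both decimal and 0x-prefixed hexadecimal input is an assumption about how the value is written.

```python
MAX_MASTER_SM_LID = 0xBFFF  # upper bound of the supported range


def parse_master_sm_lid(text: str):
    """Return None when the feature is disabled (0), otherwise the LID."""
    lid = int(text, 0)  # base 0 accepts "16" as well as "0x10"
    if lid == 0:
        return None
    if not 1 <= lid <= MAX_MASTER_SM_LID:
        raise ValueError(f"master_sm_lid out of range: {text}")
    return lid


print(parse_master_sm_lid("0"))
print(parse_master_sm_lid("0x10"))
```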

Root GUIDs file for Dragonfly+ Routing Engines

This feature enables a root GUIDs file for the Dragonfly+ topology routing engines (dfp and dfp2).
To set the file with the GUIDs of the root switches of the topology, use the root_guid_file parameter.

Dragonfly+ Routing Engine

Added a new routing engine (dfp2) for Dragonfly+ topologies. This engine supports Dragonfly+ topologies with any kind of tree topology islands. If the topology contains an island with more than 2 tree levels, the root GUIDs file, including the root switches of all Dragonfly+ islands, should be provided.
To select the new dfp2 routing engine, use the routing_engine parameter.

Maximum Operational VLs for CAs, Routers and Switches

This feature enables the user to configure different max_op_vls for CAs, Routers and Switches.
To set the maximum operational VLs per device type, use the following parameters:

  • max_op_vls_ca - Maximum operational VLs for CAs. When 0, the value of max_op_vls is used. (default 0)

  • max_op_vls_rtr - Maximum operational VLs for routers. When 0, the value of max_op_vls is used. (default 0)

  • max_op_vls_sw - Maximum operational VLs for switches. When 0, the value of max_op_vls is used. (default 0)
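
The fallback rule can be sketched as follows. The dictionary representation of the configuration and the node type labels are hypothetical; only the parameter names come from these notes.

```python
PER_TYPE_PARAM = {
    "ca": "max_op_vls_ca",
    "router": "max_op_vls_rtr",
    "switch": "max_op_vls_sw",
}


def effective_max_op_vls(node_type: str, conf: dict) -> int:
    """Return the per-type value, or max_op_vls when the per-type value is 0."""
    per_type = conf.get(PER_TYPE_PARAM[node_type], 0)  # per-type default is 0
    return per_type if per_type != 0 else conf["max_op_vls"]


conf = {"max_op_vls": 3, "max_op_vls_ca": 2}
print(effective_max_op_vls("ca", conf))      # per-type value is set
print(effective_max_op_vls("switch", conf))  # falls back to max_op_vls
```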

“VL packing” for Dragonfly+ and KDOR Routing Engines

Added support for “VL packing” for Dragonfly+ and KDOR routing engines. This feature reduces the maximum operational VLs for CAs to half of subnet max_op_vls when using dfp/dfp2/kdor_hc routing engines.
To enable/disable the feature, use the enable_vl_packing parameter.

The following is an example of “VL packing”:

  • enable_vl_packing set to TRUE

  • max_op_vls set to 3 (Enable 4 VLs)

  • max_op_vls_ca set to 2 (Use 2 VLs for CAs)
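
The values in this example are encodings rather than VL counts. The sketch below assumes the standard InfiniBand OperationalVLs encoding (1 = VL0, 2 = VL0-VL1, 3 = VL0-VL3, 4 = VL0-VL7, 5 = VL0-VL14), which matches the "3 enables 4 VLs" and "2 uses 2 VLs" annotations above. The table and function names are hypothetical.

```python
# Number of data VLs for each OperationalVLs encoding (assumed standard encoding).
VLS_BY_ENCODING = {1: 1, 2: 2, 3: 4, 4: 8, 5: 15}


def num_vls(max_op_vls: int) -> int:
    """Translate a max_op_vls encoding into the number of VLs it enables."""
    return VLS_BY_ENCODING[max_op_vls]


subnet_vls = num_vls(3)  # max_op_vls set to 3
ca_vls = num_vls(2)      # max_op_vls_ca set to 2
print(subnet_vls, ca_vls)
```

With these settings the CAs use half as many VLs as the rest of the subnet, which is the reduction that VL packing describes.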

Support SRP target on HCAs with Socket-Direct architecture/Virtual Machines.

Enabled returning PortInfoRecord and NodeRecord for virtual ports and reporting virtual port capability changes. To enable/disable the feature (default TRUE), use the enable_virt_rec_ext parameter.

General

  • Improved balancing of direct routes calculated for multi-port

  • Set HCA-grp in port groups parser to be an optional parameter

  • Added support for router alias GUIDs configuration for virtual ports

  • Added NDR speed port info capability bit to ib_types.h

  • Avoided sending client reregister to vport index 0

  • Extended rpg_byte_reset to 19 bits in Congestion Control

  • Enabled crashd by default

  • Added reporting of auxiliary port state changes

  • OpenSM now overrides the attributes of IPoIB multicast groups loaded from the SADB with broadcast group

  • Limited the number of PortInfo and MEPI MADs sent per device simultaneously

  • Configured switch AR SL mask according to the ar_sl_mask configuration parameter

  • Updated GeneralInfo device IDs list to include NVIDIA Quantum-2 and future ConnectX family devices

  • Updated the man page with AR routing engines

  • Avoided exiting OpenSM when failing to bind to auxiliary port

v5.7.2

General

  • Added support for MCMR join/leave requests with default subnet prefix

  • Set Enabled AR SL mask on switches according to ar_sl_mask.

v5.7.1

Multiple ports

Allows MLNXSM to use multiple ports for sending Subnet configuration MADs.
Feature control parameters:

  • guid - Comma separated list of MLNXSM port GUIDs.
    The first port GUID specifies the primary port, which is used for Subnet Management (discovery, traps) and Subnet Administration.
    Additional port GUIDs are used for sending subnet configuration (SMP Set MADs).
    Configuration file example: guid 0x10001,0x10002
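
A minimal sketch of splitting such a value into the primary port GUID and the additional port GUIDs. The function name is hypothetical, and the parsing is an illustration only, not the SM's own parser.

```python
def split_sm_port_guids(value: str):
    """Split a comma separated GUID list into (primary, additional)."""
    guids = [int(part.strip(), 16) for part in value.split(",") if part.strip()]
    if not guids:
        raise ValueError("at least one port GUID is required")
    # The first GUID is the primary port; the rest send subnet configuration MADs.
    return guids[0], guids[1:]


primary, additional = split_sm_port_guids("0x10001,0x10002")
print(hex(primary), [hex(g) for g in additional])
```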

Extend router selection algorithm

Supports specifying hash function, seed and additional hash function arguments for router selection during path records calculation.
Feature control parameters:

  • rtr_selection_function - Hash function to be used by router selection algorithm.
    Supported values: crc32 (default).

  • rtr_selection_seed - Seed for router selection algorithm. (default 0)

  • rtr_selection_algo_parameters - Comma separated list of parameters for router selection algorithm.
    Supported values: sgid, dgid. (default sgid, dgid)
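
These notes do not describe how the hash value is turned into a router choice. The sketch below is therefore only an illustration of hash-based selection: it assumes that the selected fields are concatenated, hashed with CRC32 starting from the seed, and reduced modulo the number of candidate routers. The function name and the 16-byte GUID arguments are hypothetical.

```python
import zlib


def select_router(sgid: bytes, dgid: bytes, num_routers: int,
                  seed: int = 0, params=("sgid", "dgid")) -> int:
    """Pick a router index from a CRC32 hash over the chosen GID fields."""
    fields = {"sgid": sgid, "dgid": dgid}
    data = b"".join(fields[name] for name in params)
    # Assumption: the seed is the CRC32 starting value and the result is
    # mapped to a router by a simple modulo.
    return zlib.crc32(data, seed) % num_routers


sgid = bytes(15) + b"\x01"
dgid = bytes(15) + b"\x02"
print(select_router(sgid, dgid, num_routers=4))
```

The same inputs always yield the same index, so a given source/destination pair keeps using the same router between runs.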

LMC for routers and number of LIDs allowed per router for inter-subnet path records

Feature control parameters:

  • lids_per_rtr - Defines number of Router LIDs to be used for inter-subnet path records.
    When set to 0, MLNXSM will use number of LIDs per router according to global LMC. (default 0)
    When set to non-zero, MLNXSM will set LMC for router ports according to the value of this parameter (minimal N such that 2^N >= lids_per_rtr).
    If global LMC is not zero, lids_per_rtr is ignored.
    When lids_per_rtr is set to non-zero value, updn/ar_updn/chain with updn routing engines should be used.
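
The "minimal N such that 2^N >= lids_per_rtr" rule can be worked through with a short sketch. The function name is hypothetical; a value of 0 is rejected here because in that case the global LMC applies instead.

```python
def lmc_for_lids_per_rtr(lids_per_rtr: int) -> int:
    """Return the minimal N such that 2**N >= lids_per_rtr."""
    if lids_per_rtr < 1:
        raise ValueError("lids_per_rtr must be non-zero for this rule to apply")
    return (lids_per_rtr - 1).bit_length()


for lids in (1, 2, 3, 4, 5, 8):
    print(lids, lmc_for_lids_per_rtr(lids))  # LMC values: 0, 1, 2, 2, 3, 3
```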

Congestion Control

Feature control parameters:

  • mlnx_congestion_control - Enable/Disable Mellanox Congestion Control.
    Supported values:
    0 - Do not configure congestion control (default).
    1 - Disable congestion control on the subnet.
    2 - Configure congestion control according to policy file.

  • congestion_control_policy_file - Path to congestion control policy file.

For additional information, refer to the congestion_control.md file provided with MLNXSM.

LIDs range in Routing Chains

Replaces path-bit qualifier in routing chain configuration by min-path-bit and max-path-bit qualifiers. (path-bit is still supported for backward compatibility).

Example of usage:

  • min-path-bit: 1

  • max-path-bit: 3

Controlling maximum number of MADs on wire per destination

Feature control parameters:

  • max_wire_smps_per_device - Number of MADs on the wire per device. (default 2)

Configuring service keys to service name

Enables mapping service names to service keys.
Feature control parameters:

  • service_name2key_map_file - Path to the service name to service key map file.
    The file maps each service name to a service key, which is specified in IPv6 format.
    For example, to map service name <SERVICE NAME> to service key 0::1, add the following line to the file:
    <SERVICE NAME> 0::1
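
A minimal sketch of reading such a file, assuming one mapping per line with the key as the last whitespace separated token. The function name is hypothetical, and comment or quoting rules are not covered because these notes do not describe them.

```python
import ipaddress


def parse_service_key_map(text: str) -> dict:
    """Map each service name to its 16-byte service key."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        name, key_text = line.rsplit(None, 1)  # key is the last token
        mapping[name] = ipaddress.IPv6Address(key_text).packed
    return mapping


keys = parse_service_key_map("my_service 0::1\n")
print(keys["my_service"].hex())
```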

General

  • Disabled the option to send PortInfo MADs to switch ports that did not change their state from the previous sweep.

  • Enabled Adaptive Routing for all SLs on switches.

  • Set limit to SMInfo dispatcher queue.

  • Improved performance of missing routes calculation for trees.

  • Improved performance of ar_updn and ar_ftree routing engines.

  • Improved performance of inter-subnet path record calculation.

  • Added logging of the number of link resets performed by MLNXSM at the end of a heavy sweep.

  • Disabled creating the subnet LST file by default.
    Feature control parameters:

    • enable_lst_file - Controls dumping the subnet LST file of the topology.
      If set to TRUE, the LST file is created after a heavy sweep. (default FALSE)

  • Removed LMC support from verbosity bypass.

  • Enabled empty port groups file in routing chains.

  • Aligned index table columns in SMDB file.

  • UPDN LID tracking - Added the option to give precedence to exit ports leading to switch with lower total number of routes over exit ports leading to switch with less routes to the switch of the destination LID.
    Feature control parameters:

    • updn_lid_tracking_prefer_total_routes
      If set to TRUE, enable the feature. (default FALSE)

  • UPDN LID tracking - Improved routing algorithm to improve routing utilization and routes balancing.

  • UPDN LID tracking - Updated routing engine to support LMC.

Default Configuration Changes

  • Changed the default values of:

    • max_topologies_per_sw from 1 to 4

    • scatter_ports from 0 (disabled) to 8

    • log_flash from FALSE to TRUE

  • Disabled dumping subnet LST file by default.

Parameter Name                          Status    Type      Description
--------------------------------------  --------  --------  -----------

5.10.0
adaptive_timeout_sl_mask                New       Number    Defines an adaptive timeout SL mask of the port. Default 0xFFFF
routing_engine                          Update    String    Changed default value from (null) to ar_updn
find_roots_color_algorithm              New       Boolean   Finds roots using a coloring algorithm for tree-based topologies. Default is TRUE.
max_cas_on_spine                        New       Number    The maximum number of CAs on a switch to allow considering it as a spine instead of a leaf by the routing algorithm.
hm_num_traps                            Update    Number    Changed default value from 250 to 60.
hm_num_traps_period_secs                Update    Number    Changed default value from 60 to 90 seconds.

5.9.1
allowed_sm_guids                        New       String    Define list of allowed SM port GUIDs
device_configuration_file               New       String    Path to device configuration file
client_rereg_mode                       New       Number    Control sending ClientReregister to devices
max_rate_enum                           New       Number    Define maximal supported rate in SA records
gmp_traps_threads_num                   New       Number    Number of threads for processing GMP traps
get_mft_tables                          New       Boolean   Enable/Disable reading MFT tables on first master sweep
routing_engine                          Update    String    Support kdor-ghc for Generalized Hypercube routing engine
mepi_cache_enabled                      Update    Boolean   Changed default from FALSE to TRUE
no_clients_rereg                        Update    Boolean   Deprecated by client_rereg_mode
use_original_extended_sa_rates_only     Update    Boolean   Deprecated by max_rate_enum
dfp_down_up_turns_mode                  Update    Number    Changed default from 0 to 2 (disable down/up turns)
routing_threads_num                     Update    Number    Changed default value from 1 to 0
force_link_speed_ext                    Update    Number    Support NDR speeds

5.8.1
max_wire_smps                           Update    Number    Changed default from 4 to 16
max_wire_smps2                          Update    Number    Changed default from 4 to 16
max_smps_timeout                        Update    Number    Changed default from 600000 to 300000 milliseconds
max_msg_fifo_timeout                    Update    Number    Changed default from 10000 to 5000 milliseconds
transaction_timeout                     Update    Number    Changed default from 200 to 100 milliseconds
enable_crashd                           Update    Boolean   Changed default from FALSE to TRUE
routing_engine                          Update    Text      Support dfp2 routing engine
master_sm_lid                           New       LID       LID for local SM when in MASTER state
enable_virt_rec_ext                     New       Boolean   Enable PortInfoRecord/NodeRecord for virtual ports/nodes
ar_tree_asymmetric_flow                 New       Number    AR Asymmetric trees max flow algorithm
max_op_vls_ca                           New       Number    max_op_vl for CAs
max_op_vls_sw                           New       Number    max_op_vl for switches
max_op_vls_rtr                          New       Number    max_op_vl for routers
enable_vl_packing                       New       Boolean   Enable VL packing

5.7.2
ar_sl_mask                              Existing  Number    Modified behavior: Parameter controls AR SL mask both in switches and HCAs

5.7.1
enable_lst_file                         New       Boolean   Control dumping subnet LST file of the topology
lids_per_rtr                            New       Number    Control number of LIDs per router of inter-subnet path record
max_wire_smps_per_device                New       Number    Control maximum number of MADs on wire per device
service_name2key_map_file               New       Path      Path to service name to service key map file
rtr_selection_function                  New       String    Hash function to be used by router selection algorithm
rtr_selection_seed                      New       Number    Seed for router selection algorithm
rtr_selection_algo_parameters           New       String    Comma-separated list of parameters for router selection algorithm
updn_lid_tracking_prefer_total_routes   New       Boolean   Control UPDN LID tracking exit port selection criteria
mlnx_congestion_control                 New       Number    Control Mellanox Congestion Control enablement
congestion_control_policy_file          New       Path      Path to Congestion Control policy file
guid                                    Update    List      Changed the type from GUID to a comma-separated list of GUIDs
scatter_ports                           Update    Number    Changed default value from 0 (disabled) to 8
log_flash                               Update    Boolean   Changed default value from FALSE to TRUE
max_topologies_per_sw                   Update    Number    Changed default value from 1 to 4

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.