Bug Fixes

MLNXSM (InfiniBand Subnet Manager) Utility Release Notes v5.19

Version

Description

5.19.0

  • Fixed an issue in root detection so that roots are identified on topologies with two switches

  • Fixed an issue with setting transaction_retries to 0

  • Fixed an issue with not resending RouterLIDTable MADs after timeouts

  • Fixed a logging issue when handling invalid MC join requests

5.18.0

  • Fixed an issue with supporting DF+ topologies with more than 341 leaf switches in dfp2 routing engine

  • Fixed an issue with assigning empty AR groups between AR group 0

  • Fixed an issue in root detection algorithm related to selecting non-optimal roots

  • Fixed an issue of not resending routing tables to switches after switch restarts

  • Fixed an issue with incorrect port states in SMDB file of sub-topologies

5.17.0

  • Fixed the process of marking peers of isolated ports as flapping ports

  • Fixed root detection algorithm to support additional cases of missing links

  • Fixed the process of sending N2N MADs to isolated devices

  • Fixed the process of sending CC MADs to isolated hosts

  • Fixed an issue that caused repeated heavy sweeps when isolating an HCA with two ports that share the same node GUID

  • Fixed NDR port labels

  • Fixed an issue related to SM not resending MEPI after timeout

5.16.0

  • Fixed a race condition when parsing the configuration file that could result in an SM crash

  • Fixed an issue with reconfiguring a switch after it was removed from the held-back list

  • Fixed a redundant heavy sweep when suspecting a port is unhealthy

  • Fixed the handling of switches with all ports marked as 'no_discover'

  • Fixed an issue with writing 'unknown vendor' to screen

5.15.0

  • Fixed the reporting of PortInfo validation failure for rebooted HCA ports

  • Fixed the writing of VS and CC MADs in debug verbosity

  • Fixed an issue related to sending SL2VL MADs for unhealthy ports

  • Fixed an issue related to initiating a heavy sweep when reporting a new unhealthy port

5.14.0

  • Fixed a crash that occurred when Incremental Multicast Routing was enabled

  • Fixed an issue related to PLFT2 being enabled on DF+2 when "dive-ins" are not permitted

  • Marked ports that did not respond to the NI as unhealthy instead of the entire node

  • Fixed root detection algorithm in trees with missing links

  • Fixed root detection algorithm in DF+ with roots without leaves

5.13.0

  • Fixed an issue related to removing ServiceRecords when the port is disconnected.

  • Set threads affinity according to the scheduler affinity.

  • Enabled dumping SMDB file when ucast cache feature is enabled.

  • Fixed event reporting to untrusted subscribers.

  • Fixed an issue related to creating service records with P_Key 0.

  • Fixed the handling of routers marked as "unhealthy" when using fat-tree routing engine.

  • Fixed an issue related to creating a dump files directory when it did not exist on startup.

5.12.0

  • Fixed a crash that occurred when drop_subscr_on_report_fail was enabled.

  • Fixed a case that caused FRN to fail when there were isolated/heldback switches.

  • Fixed a memory leak that occurred when the list of routing engines was changed during runtime.

  • Fixed an issue that prevented ports in INIT state from being directly activated.

  • Fixed an issue that prevented activating virtual ports on the first master sweep when running with --once.

  • Fixed a memory leak when parsing QoS policy file with errors.

  • Fixed an issue that prevented the incrementation of outstanding AN2AN/VS/CC MADs when no response was expected.

  • Added support for routers in FTREE routing engine.

  • Fixed an issue related to AR LFT in trees that had entries with FREE state and empty group 0.

5.11.0

  • Fixed unconditional jump on uninitialized value in dfp2 when ar_sl_mask is set to 0.

  • Fixed a case of duplicated LIDs when persistent SM LID feature is enabled.

  • Fixed invalidating ucast cache when discovering faulty switch.

  • Fixed a crash when detecting two ports of the same node with different port GUID but the same port number.

  • Fixed the type of traps 1310 and 1311 (duplicate GUIDs) to 'security'.

  • Fixed reporting trap 1312 to UFM.

5.10.0

  • Fixed a crash that occurred during a race between the LFT record get query and routing configuration.

  • Fixed the non-generic notices statistics counters in the dump file.

  • Fixed the process of postponing the isolation and reporting of noisy ports.

  • Fixed an issue related to selecting held-back/isolated switches as roots for multicast trees.

  • Fixed an issue that caused unresponsive links to remain in Active state.

  • Fixed an issue that affected the writing of invalid AN2AN links to SMDB dump file.

  • Fixed the IPoIB traffic loss after changing the subnet prefix and loading the MC groups from SADB upon SM restart.

  • Fixed the SM build on Debian with libibumad from rdma-core.

  • Fixed the way port capability changes are handled during runtime.

  • Fixed an endianness issue in error log message 0F29.

  • Fixed an incorrect log message when enabling SHARP on the device.

  • Fixed the statistics counters race condition with SM multi port.

  • Fixed rewriting of the statistics file when the existing file had a different header than the current one. If the previous header differs from the current one, a backup of the old file is created alongside the updated statistics file.
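
The header check described in the last 5.10.0 item can be sketched roughly as follows. This is only an illustration of the described behavior, not the SM's actual implementation: the function name `ensure_stats_header`, the `.bak` backup suffix, and the one-line text header are all assumptions made for the example.

```python
import os
import shutil


def ensure_stats_header(path, header, backup_suffix=".bak"):
    """Start a fresh statistics file if the existing header differs.

    Hypothetical helper: returns True when the old file was backed up,
    False when the existing file was kept or no file existed.
    """
    backed_up = False
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            existing = f.readline().rstrip("\n")
        if existing == header:
            # Same header: keep appending to the existing file.
            return False
        # Different header: preserve the old file before rewriting.
        shutil.move(path, path + backup_suffix)
        backed_up = True
    # Create the updated statistics file with the current header.
    with open(path, "w", encoding="utf-8") as f:
        f.write(header + "\n")
    return backed_up
```

In this sketch the old data is never overwritten: a header mismatch moves the previous file aside and a new file is started, while a matching header leaves the file untouched.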

5.9.1

  • Fixed a crash when isolating the switch using:

    • the "held_back_sw_guid" file while running SM with updn/ar_updn

    • a GUIDs order file with a port group that includes HCAs connected to a held-back switch

  • Fixed an issue that resulted in breaking routing for virtual port LIDs upon failover/restart

  • Fixed an issue that caused ar_ftree to create routing between IO nodes that was not credit-loop free

  • Fixed an issue that caused the discovery stage to continue during the subnet configuration stage

  • Fixed an issue in which MEPI was not retrieved after a switch reset

  • Fixed multicast group leak when handling leave of SendOnlyFullMembers of multicast groups

  • Fixed a leak when spoofing notice 144 for virtual ports

5.8.1

  • Enabled SA requests with default subnet prefix in GRH on subnet with non-default subnet prefix

  • Fixed a crash when processing virtual ports after aborted heavy sweep

  • Fixed a wrong direct route for GeneralInfo MADs after coming out of standby

  • Fixed a crash in UPDN LID tracking that happened when multithreading was enabled

  • Fixed file descriptor leakage when running with crashd

  • Fixed an issue that resulted in setting default pkey at index 0 on invalid partitions.conf

  • Fixed an issue that prevented setting ar_sl_mask on hosts when running with armgr plugin

  • Freed alias GUIDs resources when deleting virtual port object

  • Fixed checking 2x link width capability

  • Enabled handling MCMemberRecord request with default subnet prefix on subnet with non-default subnet prefix

5.7.1

  • Fixed memory overflow upon virtual ports removal from the Subnet when using Adaptive Routing.

  • Fixed handling ‘;’ and ‘:’ in node names in port groups policy file parser.

  • Fixed missing routes-to-routers after recovery of the routing engine in Dragonfly+.

  • Fixed port_search_order usage when LMC is enabled.

  • Fixed SA LinkRecords and MultipathRecords LMC support.

  • Fixed partition checking for LinkRecord and PortInfoRecord queries.

  • Fixed dedicated groups calculation for switches with ANs when FRN is enabled.

  • Fixed router support in port groups.

  • Fixed an issue that prevented SADB dumping when updating service records.

© Copyright 2024, NVIDIA. Last updated on May 4, 2024.