Bug Fixes

Version / Description

5.17.0

  • Fixed the process of marking peers of isolated ports as flapping ports
  • Fixed root detection algorithm to support additional cases of missing links
  • Fixed the process of sending N2N MADs to isolated devices
  • Fixed the process of sending CC MADs to isolated hosts
  • Fixed an issue that caused repeated heavy sweeps when isolating an HCA with two ports that share the same node GUID
  • Fixed NDR port labels

5.16.0

  • Fixed a race condition when parsing the configuration file that could result in an SM crash
  • Fixed an issue with reconfiguring a switch after it was removed from the held-back list
  • Fixed a redundant heavy sweep when suspecting a port is unhealthy
  • Fixed the handling of switches with all ports marked as 'no_discover'
  • Fixed an issue with writing 'unknown vendor' to the screen

5.15.0

  • Fixed the reporting of PortInfo validation failure for rebooted HCA ports
  • Fixed the writing of VS and CC MADs in debug verbosity
  • Fixed an issue related to sending SL2VL MADs for unhealthy ports
  • Fixed an issue related to initiating a heavy sweep when reporting a new unhealthy port

5.14.0

  • Fixed a crash that occurred when Incremental Multicast Routing was enabled
  • Fixed an issue related to PLFT2 being enabled on DF+2 when "dive-ins" are not permitted
  • Marked ports that did not respond to the NI as unhealthy instead of the entire node
  • Fixed root detection algorithm in trees with missing links
  • Fixed root detection algorithm in DF+ with roots without leaves

5.13.0

  • Fixed an issue related to removing ServiceRecords when the port is disconnected.
  • Set thread affinity according to the scheduler affinity.
  • Enabled dumping SMDB file when ucast cache feature is enabled.
  • Fixed event reporting to untrusted subscribers.
  • Fixed an issue related to creating service records with P_Key 0.
  • Fixed the handling of routers marked as "unhealthy" when using fat-tree routing engine.
  • Fixed an issue related to creating a dump files directory when it did not exist on startup.

5.12.0

  • Fixed a crash that occurred when drop_subscr_on_report_fail was enabled.
  • Fixed a case that caused FRN to fail when there were isolated/heldback switches.
  • Fixed a memory leak when changing the list of routing engines during runtime.
  • Fixed an issue that prevented ports in INIT state from being activated directly.
  • Fixed an issue that prevented activating virtual ports on the first master sweep when running with --once.
  • Fixed a memory leak when parsing a QoS policy file with errors.
  • Fixed an issue that prevented the count of outstanding AN2AN/VS/CC MADs from being incremented when no response was expected.
  • Added support for routers in FTREE routing engine.
  • Fixed an issue related to AR LFT in trees that had entries with FREE state and empty group 0.

5.11.0

  • Fixed an unconditional jump on an uninitialized value in dfp2 when ar_sl_mask is set to 0.
  • Fixed a case of duplicated LIDs when the persistent SM LID feature is enabled.
  • Fixed invalidation of the ucast cache when discovering a faulty switch.
  • Fixed a crash when detecting two ports of the same node with different port GUIDs but the same port number.
  • Fixed the type of traps 1310 and 1311 (duplicate GUIDs) to be 'security'.
  • Fixed reporting trap 1312 to UFM.

5.10.0

  • Fixed a crash that occurred during a race between the LFT record get query and routing configuration.
  • Fixed the non-generic notices statistics counters in the dump file.
  • Fixed the process of postponing isolation and reporting of noisy ports.
  • Fixed an issue related to selecting held-back/isolated switches as roots for multicast trees.
  • Fixed an issue that caused unresponsive links to remain in Active state.
  • Fixed an issue that affected the writing of invalid AN2AN links to SMDB dump file.
  • Fixed the IPoIB traffic loss after changing the subnet prefix and loading the MC groups from SADB upon SM restart.
  • Fixed the SM build on Debian with libibumad from rdma-core.
  • Fixed the way port capability changes are handled during runtime.
  • Fixed an endianness issue in error log message 0F29.
  • Fixed an incorrect log message when enabling SHARP on the device.
  • Fixed the statistics counters race condition with SM multi-port.
  • Fixed rewriting of the statistics file when the existing file had a different header than the current one. If the previous header differs from the current one, a backup of the old file is created alongside the updated statistics file.

5.9.1

  • Fixed a crash that occurred when isolating a switch using:

    • the "held_back_sw_guid" file while running SM with updn/ar_updn
    • a GUIDs order file with a port group that includes HCAs connected to a held-back switch
  • Fixed an issue that resulted in breaking routing for virtual port LIDs upon failover/restart
  • Fixed an issue that caused ar_ftree to create routing between IO nodes that was not credit-loop-free
  • Fixed an issue that caused the discovery stage to continue during the subnet configuration stage
  • Fixed an issue in which MEPI was not retrieved after a switch reset
  • Fixed a multicast group leak when handling the leave of SendOnlyFullMembers of multicast groups
  • Fixed a leak when spoofing notice 144 for virtual ports

5.8.1

  • Enabled SA requests with default subnet prefix in GRH on subnet with non-default subnet prefix
  • Fixed a crash when processing virtual ports after an aborted heavy sweep
  • Fixed a wrong direct route for GeneralInfo MADs after coming out of standby
  • Fixed a crash in UPDN LID tracking that happened when multithreading was enabled
  • Fixed a file descriptor leak when running with crashd
  • Fixed an issue that resulted in setting default pkey at index 0 on invalid partitions.conf
  • Fixed an issue that prevented setting ar_sl_mask on hosts when running with armgr plugin
  • Freed alias GUIDs resources when deleting virtual port object
  • Fixed checking 2x link width capability
  • Enabled handling MCMemberRecord request with default subnet prefix on subnet with non-default subnet prefix

5.7.1

  • Fixed memory overflow upon virtual ports removal from the Subnet when using Adaptive Routing.
  • Fixed handling of ‘;’ and ‘:’ in node names in the port groups policy file parser.
  • Fixed missing routes to routers after recovery of the routing engine in Dragonfly+.
  • Fixed port_search_order usage when LMC is enabled.
  • Fixed SA LinkRecords and MultipathRecords LMC support.
  • Fixed partition checking for LinkRecord and PortInfoRecord queries.
  • Fixed dedicated groups calculation for switches with ANs when FRN is enabled.
  • Fixed router support in port groups.
  • Fixed an issue that prevented SADB dumping when updating service records.

© Copyright 2023, NVIDIA. Last updated on Nov 7, 2023.