Known Issue History

Ref #

Issue

N/A

Description: Running the UFM Fabric Health Report (via the UFM Web UI or REST API) triggers ibdiagnet to use the SLRG register, which might cause the firmware of some switches and HCAs to get stuck and leave the HCA ports in the "Init" state.

Keywords: UFM Fabric Health Report; SLRG; Stuckness

Discovered in Release: 1.5.0

3511410

Description: Collecting a system dump for a DGX host does not work because the sshpass utility is missing.

Workaround: Install the sshpass utility on the DGX host.

Keywords: System Dump, DGX, sshpass utility
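
Before collecting the dump, it may help to confirm that sshpass is actually on the PATH of the DGX host. The following is a minimal sketch; `have_tool` is a hypothetical helper name and not part of UFM, and the package manager mentioned in the comment is an assumption about the host OS.

```shell
# Sketch: report whether a required utility is available on PATH.
# have_tool is a hypothetical helper, not a UFM command.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool sshpass; then
    echo "sshpass is installed"
else
    echo "sshpass is missing; install it with the host's package manager"
    # Assumption: on a Debian/Ubuntu-based host this would be
    #   sudo apt-get install -y sshpass
fi
```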

3432385

Description: UFM does not support an HDR switch configured in hybrid split mode, where some of the ports are split and some are not.

Workaround: Configure either all or none of the HDR switch ports as split. UFM operates properly in both cases.

Keywords: HDR Switch, Ports, Hybrid Split Mode

3461658

Description: After the upgrade from UFM Enterprise Appliance v1.4.0 GA to UFM Enterprise Appliance v1.4.1 FUR, the network fast recovery path in opensm.conf is not automatically updated and remains set to a null value (fast_recovery_conf_file (null)).

Workaround: To enable the network fast recovery feature in UFM, set the path of the current fast recovery configuration file (/opt/ufm/files/conf/opensm/fast_recovery.conf) in the opensm.conf file located at /opt/ufm/files/conf/opensm before starting UFM.

Keywords: Network fast recovery, Missing, Configuration
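
The workaround above can be sketched as two small shell helpers: one that shows the current setting and one that rewrites it. This is an illustration only. `show_fast_recovery` and `set_fast_recovery` are hypothetical names, and the sketch assumes the parameter is stored as a single `key value` line, as in the `fast_recovery_conf_file (null)` example. Back up opensm.conf before editing it.

```shell
# Sketch: inspect and set fast_recovery_conf_file in an opensm.conf-style file.
# Both function names are hypothetical helpers, not UFM commands.

# Print the current fast_recovery_conf_file line of the given file.
show_fast_recovery() {
    conf="${1:-/opt/ufm/files/conf/opensm/opensm.conf}"
    grep -E '^[[:space:]]*fast_recovery_conf_file[[:space:]]' "$conf"
}

# Replace the value on the fast_recovery_conf_file line; other lines are kept.
set_fast_recovery() {
    conf="$1"
    value="$2"
    tmp="$(mktemp)" || return 1
    sed -E "s|^([[:space:]]*fast_recovery_conf_file)[[:space:]].*|\1 ${value}|" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example usage (run before starting UFM):
#   set_fast_recovery /opt/ufm/files/conf/opensm/opensm.conf \
#       /opt/ufm/files/conf/opensm/fast_recovery.conf
#   show_fast_recovery
```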

N/A

Description: Upgrading the UFM Enterprise Appliance SW while upgrading the UFM Enterprise Appliance OS is not supported.

Workaround: Do not use the --appliance-sw-upgrade flag while upgrading the UFM Enterprise Appliance OS. Instead, upgrade the UFM Enterprise Appliance SW separately, as described in Software Upgrade.

Keywords: SW Upgrade; OS Upgrade, --appliance-sw-upgrade

3473600

Description: The UFM Enterprise service is enabled while upgrading the UFM Enterprise Appliance SW in HA mode.

Workaround: Disable the UFM Enterprise service after the upgrade in HA mode by running the following command:

systemctl disable ufm-enterprise.service

Keywords: SW Upgrade, HA Mode

3361160

Description: Upgrading UFM Enterprise Appliance from version 1.3.0, 1.2.0, or 1.1.0 results in cleanup of the UFM historical telemetry database (due to a schema change). This means that new telemetry data will be stored based on the new schema.

Workaround: To preserve the historical telemetry database data when upgrading from UFM Enterprise Appliance version 1.3.0, 1.2.0, or 1.1.0, perform the upgrade in two phases: first upgrade to UFM Enterprise Appliance v1.2.0, and then upgrade to the latest UFM version (UFM v1.3.0 or newer). Note that the upgrade may take longer depending on the size of the historical telemetry database.

Keywords: UFM Historical Telemetry Database, Cleanup, Upgrade

3346321

Description: In some cases, when multiport SM is configured in UFM, a failover to the secondary node might be triggered instead of a failover to the local available port.

Workaround: N/A

Keywords: Multiport SM, Failover, Secondary port

N/A

Description: Enabling a port on a managed switch fails if that port was not disabled in a persistent way (this may occur for ports that were disabled in versions prior to UFM Enterprise Appliance v1.3.0).

Workaround: Set "persistent_port_operation=false" in gv.cfg to use non-persistent (legacy) disabling or enabling of the port. A UFM restart is required.

Keywords: Disable, Enable, Port, Persistent
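
To confirm which value is in effect before restarting UFM, the setting can be read back from gv.cfg. The sketch below assumes gv.cfg holds `key = value` lines; `get_cfg_value` is a hypothetical helper, and the file path in the usage comment is an assumption that is not stated in this document.

```shell
# Sketch: read the value of a key from a key=value style config file.
# get_cfg_value is a hypothetical helper, not a UFM command.
get_cfg_value() {
    # $1 = config file, $2 = key name
    # Prints the value of the last matching "key = value" line, if any.
    sed -n -E "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*(.*)/\1/p" "$1" | tail -n 1
}

# Example usage (path is an assumption):
#   get_cfg_value /opt/ufm/files/conf/gv.cfg persistent_port_operation
```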

3346321

Description: Failover to another port (multi-port SM) does not work as expected when UFM is deployed as a Docker container.

Workaround: Failover to another port (multi-port SM) works properly on UFM bare-metal deployments.

Keywords: Failover to another port, Multi-port SM

348587

Description: Replacement of defective nodes in the HA cluster does not work when the PCS version is 0.9.x.

Workaround: N/A

Keywords: Defected Node, HA Cluster, pcs version
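
Whether a node falls under this issue can be checked from the installed pcs version. The following is a minimal sketch: `is_affected_pcs` is a hypothetical helper, and it assumes `pcs --version` prints a plain version string such as 0.9.169.

```shell
# Sketch: decide whether a pcs version string belongs to the 0.9.x series.
# is_affected_pcs is a hypothetical helper, not a pcs or UFM command.
is_affected_pcs() {
    case "$1" in
        0.9.*) return 0 ;;   # 0.9.x: affected by this issue
        *)     return 1 ;;   # any other series
    esac
}

# Only query pcs when it is installed on this host.
if command -v pcs >/dev/null 2>&1; then
    if is_affected_pcs "$(pcs --version)"; then
        echo "pcs 0.9.x detected: replacement of HA nodes is affected"
    else
        echo "pcs version is not 0.9.x"
    fi
fi
```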

3336769

Description: UFM-HA: If the back-to-back interface is disabled or disconnected, the HA cluster will enter a split-brain state, and the "ufm_ha_cluster status" command will stop functioning properly.

Workaround: To resolve the issue:

  1. Connect or enable the back-to-back interface

  2. Run

    pcs cluster start --all

  3. Follow the instructions in Split-Brain Recovery in HA Installation.

Keywords: HA, Back-to-back Interface

N/A

Description: Running UFM software with an external UFM-SM is no longer supported.

Workaround: N/A

Keywords: External UFM-SM

© Copyright 2023, NVIDIA. Last updated on Nov 21, 2023.