Known Issue History

NVIDIA UFM Enterprise Appliance Software User Manual v1.8.0


Description: In congestion control, the cc-policy.conf file is not updated when the container version is upgraded, even if the user has made no changes to it.

Keywords: Congestion Control, cc-policy.conf, Upgrade, Container

Workaround: On the host, run the command:

docker exec -it ufm cp /opt/ufm/skeleton/conf/opensm/cc-policy.conf /opt/ufm/files/conf/opensm/cc-policy.conf

Discovered in Release: 1.7.0


Description: After UFM startup, an empty temporary folder is created under /tmp every 10 minutes (due to the periodic telemetry status check).

Keywords: Empty folder, temporary, /tmp

Workaround: Change the instances_sessions_compatibility_interval parameter in gv.cfg to 30 minutes.
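
The edit can be scripted. The following is a minimal sketch rather than an official procedure: the helper name set_gv_param is hypothetical, and it assumes that the parameter appears in gv.cfg as a "name = value" line, that its value is given in minutes, and that gv.cfg is located at /opt/ufm/files/conf/gv.cfg. Verify each assumption on your system first.

```shell
#!/bin/sh
# set_gv_param FILE NAME VALUE
# Hypothetical helper: rewrites the value of an existing "NAME = ..." line in FILE.
# Returns 1 when NAME is not present, rather than guessing where to add it.
set_gv_param() {
    file=$1 name=$2 value=$3
    # Require an existing, uncommented assignment for NAME.
    grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file" || return 1
    # Keep a backup copy next to the original before editing.
    cp -p "$file" "$file.bak" || return 1
    # Replace only the value part of the matching line.
    sed -E "s|^([[:space:]]*${name}[[:space:]]*=[[:space:]]*).*|\1${value}|" "$file.bak" > "$file"
}

# Example (path and unit are assumptions; not executed here):
# set_gv_param /opt/ufm/files/conf/gv.cfg instances_sessions_compatibility_interval 30
```

The helper leaves the previous file as gv.cfg.bak so the change can be reverted by copying the backup over the edited file.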

Discovered in Release: v1.6.0


Description: Changes to the mtu_limit parameter in the [MngNetwork] section of gv.cfg do not take effect after a single UFM restart.

Keywords: mtu_limit, MngNetwork, gv.cfg, UFM restart

Workaround: Restart UFM twice for the changes to take effect.

Discovered in Release: v1.6.0


Description: The Logs API temporarily returns an empty response when the SM log file contains messages from both the previous year (2023) and the current year (2024).

Keywords: Logs API, Empty response, Logs file

Workaround: N/A (the issue resolves automatically once the problematic SM log file, which includes messages from both 2023 and 2024, is rotated)

Discovered in Release: v1.6.0


Description: After the UFM Enterprise Appliance is remanufactured from an ISO file, as described in Appendix - Deploying UFM Appliance from an ISO File, the HA services fail to start when the host is rebooted or power cycled in High-Availability (HA) mode.

Workaround: Change the crontab option in the UFM Enterprise Appliance by running crontab -e from the OS shell as root:


@reboot /usr/sbin/netplan apply



@reboot sleep 240 && /sbin/ip link set up dev idrac
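
The crontab can also be updated without an interactive editor. The manual lists the two entries without stating whether they are added or replace existing ones; the sketch below assumes they are added to root's crontab alongside any existing entries. The helper add_cron_entry is a hypothetical name, not part of the appliance.

```shell
#!/bin/sh
# add_cron_entry ENTRY
# Reads an existing crontab on stdin and writes it to stdout,
# appending ENTRY only when an identical line is not already present.
add_cron_entry() {
    entry=$1
    existing=$(cat)
    # Pass existing entries through unchanged.
    [ -n "$existing" ] && printf '%s\n' "$existing"
    # Append ENTRY unless an exact, whole-line match already exists.
    printf '%s\n' "$existing" | grep -Fxq -- "$entry" || printf '%s\n' "$entry"
}

# Example, run as root on the appliance (not executed here):
# crontab -l 2>/dev/null \
#   | add_cron_entry '@reboot /usr/sbin/netplan apply' \
#   | add_cron_entry '@reboot sleep 240 && /sbin/ip link set up dev idrac' \
#   | crontab -
```

Because the helper skips entries that already exist, the example pipeline can be run more than once without duplicating lines.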

Keywords: Reboot; HA; Power Cycle

Discovered in Release: 1.6.0


Description: Running the UFM Fabric Health Report (via the UFM Web UI or REST API) triggers ibdiagnet to use the SLRG register, which might cause the firmware of some switches and HCAs to hang and leave the HCA ports in the “Init” state.

Keywords: UFM Fabric Health Report; SLRG; Stuckness

Discovered in Release: 1.5.0


Description: Collecting a system dump for a DGX host fails because the sshpass utility is missing.

Workaround: Install the sshpass utility on the DGX host.
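
A minimal sketch of the installation step, assuming the DGX host runs an apt-based OS and that the current user has sudo rights (both are assumptions, not stated in this manual). The helper ensure_tool is a hypothetical name; it runs the given install command only when the tool is not already on the PATH.

```shell
#!/bin/sh
# ensure_tool NAME INSTALL_CMD...
# Succeeds immediately when NAME is found on the PATH;
# otherwise runs INSTALL_CMD and returns its exit status.
ensure_tool() {
    tool=$1
    shift
    command -v "$tool" >/dev/null 2>&1 && return 0
    "$@"
}

# Example on the DGX host (package manager is an assumption; not executed here):
# ensure_tool sshpass sudo apt-get install -y sshpass
```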

Keywords: System Dump, DGX, sshpass utility


Description: UFM does not support HDR switches configured in hybrid split mode, where some of the ports are split and some are not.

Workaround: Configure either all or none of the HDR switch ports as split; UFM operates properly in both cases.

Keywords: HDR Switch, Ports, Hybrid Split Mode


Description: After upgrading from UFM Enterprise Appliance v1.4.0 GA to UFM Enterprise Appliance v1.4.1 FUR, the network fast recovery path in opensm.conf is not automatically updated and keeps a null value (fast_recovery_conf_file (null)).

Workaround: To enable the network fast recovery feature in UFM, set fast_recovery_conf_file in the opensm.conf file (located at /opt/ufm/files/conf/opensm) to the path of the current fast recovery configuration file (/opt/ufm/files/conf/opensm/fast_recovery.conf) before starting UFM.
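
Before starting UFM, the current value can be checked with a small script. This is a sketch only: fast_recovery_is_set is a hypothetical helper, and it assumes opensm.conf stores the option as a whitespace-separated name and value on one line, as in the fast_recovery_conf_file (null) example in the description.

```shell
#!/bin/sh
# fast_recovery_is_set CONF
# Succeeds when CONF contains a fast_recovery_conf_file line whose value
# is neither empty nor the literal "(null)"; fails otherwise.
fast_recovery_is_set() {
    awk '$1 == "fast_recovery_conf_file" { v = $2 }
         END { exit !(v != "" && v != "(null)") }' "$1"
}

# Example (not executed here):
# fast_recovery_is_set /opt/ufm/files/conf/opensm/opensm.conf \
#   || echo "fast_recovery_conf_file is not set; edit opensm.conf before starting UFM"
```

The check only reads the file; the path itself is still set by editing opensm.conf as described in the workaround.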

Keywords: Network fast recovery, Missing, Configuration


Description: Upgrading the UFM Enterprise Appliance SW while upgrading the UFM Enterprise Appliance OS is not supported.

Workaround: Do not use the --appliance-sw-upgrade flag while upgrading the UFM Enterprise Appliance OS. Instead, upgrade the UFM Enterprise Appliance SW separately, as described in Software Upgrade.

Keywords: SW Upgrade, OS Upgrade, --appliance-sw-upgrade


Description: The UFM Enterprise service becomes enabled when the UFM Enterprise Appliance SW is upgraded in HA mode.

Workaround: Disable the UFM Enterprise service after the upgrade in HA mode by running the following command:


systemctl disable ufm-enterprise.service

Keywords: SW Upgrade, HA Mode


Description: Upgrading UFM Enterprise Appliance from versions 1.3.0, 1.2.0, and 1.1.0 cleans up the UFM historical telemetry database (due to a schema change). New telemetry data is then stored according to the new schema.

Workaround: To preserve the historical telemetry database data while upgrading from UFM Enterprise Appliance versions 1.3.0, 1.2.0, and 1.1.0, perform the upgrade in two phases. First, upgrade to UFM Enterprise Appliance v1.2.0, and then upgrade to the latest UFM version (UFM v1.3.0 or newer). Note that the upgrade may take longer depending on the size of the historical telemetry database.

Keywords: UFM Historical Telemetry Database, Cleanup, Upgrade


Description: In some cases, when multiport SM is configured in UFM, a failover to the secondary node might be triggered instead of a failover to the locally available port.

Workaround: N/A

Keywords: Multiport SM, Failover, Secondary port


Description: Enabling a port on a managed switch fails if that port was not disabled persistently (this may occur for ports that were disabled in versions earlier than UFM Enterprise Appliance v1.3.0).

Workaround: Set “persistent_port_operation=false” in gv.cfg to use non-persistent (legacy) disabling or enabling of the port. A UFM restart is required.

Keywords: Disable, Enable, Port, Persistent


Description: Failover to another port (multi-port SM) does not work as expected when UFM is deployed as a Docker container.

Workaround: Failover to another port (multi-port SM) works properly on UFM bare-metal deployments.

Keywords: Failover to another port, Multi-port SM


Description: Replacing defective nodes in the HA cluster does not work when the PCS version is 0.9.x.

Workaround: N/A

Keywords: Defected Node, HA Cluster, pcs version


Description: UFM-HA: If the back-to-back interface is disabled or disconnected, the HA cluster will enter a split-brain state, and the “ufm_ha_cluster status” command will stop functioning properly.

Workaround: To resolve the issue:

  1. Connect or enable the back-to-back interface

  2. Run:


    pcs cluster start --all

  3. Follow instructions in Split-Brain Recovery in HA Installation.

Keywords: HA, Back-to-back Interface


Description: Running UFM software with an external UFM-SM is no longer supported.

Workaround: N/A

Keywords: External UFM-SM

© Copyright 2024, NVIDIA. Last updated on May 8, 2024.