NVIDIA UFM Enterprise User Manual v6.15.1

Known Issues History

Ref #	Issue
N/A	Description: Running the UFM Fabric Health Report (via the UFM Web UI or REST API) triggers ibdiagnet to use the SLRG register, which might cause the firmware of some switches and HCAs to hang and leave the HCA ports in the "Init" state.
	Keywords:
	Discovered in Release: 6.14.0
3538640	Description: Fixed ALM plugin log rotate function.
	Keywords: ALM, Plugin, Log rotate
	Discovered in Release: 6.13.0
3532191	Description: Fixed UFM hanging (database is locked) after corrective restart of UFM health.
	Keywords: Hanging, Database, Locked
	Discovered in Release: 6.13.0
3555583	Description: Resolved REST API links' inability to return hostname for computer nodes.
	Keywords: REST API, Links, Hostname, Computer Nodes
	Discovered in Release: 6.12.1
3549795	Description: Fixed ufm_ha_cluster status to show DRBD sync status.
	Keywords: ufm_ha_cluster, DRBD, Sync Status
	Discovered in Release: 6.13.0
3549793	Description: Fixed UFM HA installation failure.
	Keywords: HA, Installation
	Discovered in Release: 6.13.0
3547517	Description: Fixed UFM logs REST API returning empty result when SM logs exist on the disk.
	Keywords: Logs, SM logs, Empty
	Discovered in Release: 6.11.0
3546178	Description: Fixed SHARP jobs failure when SHARP reservation feature is enabled.
	Keywords: SHARP, Jobs, Reservation
	Discovered in Release: 6.13.0
3541477	Description: Fixed UFM module temperature alerting on wrong thresholds.
	Keywords: Module Temperature, Alert Threshold
	Discovered in Release: 6.13.0
3191419	Description: Fixed UFM default session API returning port counter values as NULL.
	Keywords: Null, Port Counter, Value, API
	Discovered in Release: 6.9.0
3560659	Description: Fixed [MngNetwork] mtu_limit in gv.cfg not being updated properly when restarting UFM.
	Keywords: mtu_limit, gv.cfg, Update, UFM restart
	Discovered in Release: 6.13.1
3534374	Description: Fixed configure_ha_nodes.sh failure when deploying UFM6.13.x HA on Ubuntu22.04.
	Keywords: configure_ha_nodes.sh, HA, Ubuntu22.04
	Discovered in Release: 6.13.0
3496853	Description: Fixed daily report not being sent properly.
	Keywords: Daily Report, Failure
	Discovered in Release: 6.13.0
3469639	Description: Fixed REST RDMA server failure every couple of days, causing inability to retrieve ibdiagnet data.
	Keywords: REST RDMA, ibdiagnet
	Discovered in Release: 6.12.0
3455767	Description: Fixed incorrect combination of multiple devices in monitoring.
	Keywords: Monitoring, Incorrect combination
	Discovered in Release: 6.12.0
3511410	Description: Collecting a system dump for a DGX host does not work because the sshpass utility is missing.
	Workaround: Install the sshpass utility on the DGX.
	Keywords: System Dump, DGX, sshpass utility
3432385	Description: UFM does not support an HDR switch configured in hybrid split mode, where some of the ports are split and some are not.
	Workaround: Configure either all or none of the HDR switch ports as split; UFM operates properly in both cases.
	Keywords: HDR Switch, Ports, Hybrid Split Mode
3472330	Description: On bare-metal high availability (HA), when a UFM system dump is initiated from either the master or standby node, the collection process does not include the HA dumps (pacemaker and DRBD).
	Workaround: To extract the HA system dump on bare metal, run the following command from the master/standby nodes: `/usr/bin/vsysinfo -S all -e -f /etc/ufm/ufm-ha-sysdump.conf -O /tmp/HA_sysdump` The extracted HA system dump is stored in `/tmp/HA_sysdump.gz.tar`.
	Keywords: UFM System Dump, HA, Bare-Metal
3461658	Description: After the upgrade from UFM Enterprise v6.13.0 GA to UFM Enterprise v6.13.1 FUR, the network fast recovery path in `opensm.conf` is not automatically updated and remains with a null value (`fast_recovery_conf_file (null)`).
	Workaround: If you wish to enable the network fast recovery feature in UFM, make sure to set the appropriate path for the current fast recovery configuration file (`/opt/ufm/files/conf/opensm/fast_recovery.conf`) in the `opensm.conf` file located at `/opt/ufm/files/conf/opensm`, before starting UFM.
	Keywords: Network fast recovery, Missing, Configuration
N/A	Description: Enabling a port on a managed switch fails if that port was not disabled persistently (this may occur with ports that were disabled on UFM versions prior to v6.12.0).
	Workaround: Set "persistent_port_operation=false" in `gv.cfg` to use non-persistent (legacy) disabling or enabling of the port. A UFM restart is required.
	Keywords: Disable, Enable, Port, Persistent
3346321	Description: Failover to another port (multi-port SM) does not work as expected when UFM is deployed as a Docker container.
	Workaround: Failover to another port (multi-port SM) works properly on UFM bare-metal deployments.
	Keywords: Failover to another port, Multi-port SM
3348587	Description: Replacement of defective nodes in the HA cluster does not work when the PCS version is 0.9.x.
	Workaround: N/A
	Keywords: Defective Node, HA Cluster, pcs version
3336769	Description: UFM-HA: In case the back-to-back interface is disabled or disconnected, the HA cluster will enter a split-brain state, and the "ufm_ha_cluster status" command will stop functioning properly.
	Workaround: To resolve the issue: (1) Connect or enable the back-to-back interface. (2) Run `pcs cluster start --all`. (3) Follow the instructions in Split-Brain Recovery in HA Installation.
	Keywords: HA, Back-to-back Interface
3361160	Description: Upgrading UFM Enterprise from versions 6.8.0, 6.9.0 and 6.10.0 results in cleanup of UFM historical telemetry database (due to schema change). This means that the new telemetry data will be stored based on the new schema.
	Workaround: To preserve the historical telemetry database data while upgrading from UFM version 6.8.0, 6.9.0 and 6.10.0, perform the upgrade in two phases. First, upgrade to UFM v6.11.0, and then upgrade to the latest UFM version (UFM v6.12.0 or newer). It is important to note that the upgrade process may take longer depending on the size of the historical telemetry database.
	Keywords: UFM Historical Telemetry Database, Cleanup, Upgrade
3346321	Description: In some cases, when multiport SM is configured in UFM, a failover to the secondary node might be triggered instead of failover to the local available port
	Workaround: N/A
	Keywords: Multiport SM, Failover, Secondary port
3240664	Description: This software release does not support upgrading the UFM Enterprise version from the latest GA version (v6.11.0). Upgrade is supported from UFM Enterprise v6.9.0 and v6.10.0.
	Workaround: N/A
	Keywords: UFM Upgrade
3242332	Description: Upgrading MLNX_OFED uninstalls UFM
	Workaround: Upgrade UFM to a newer version (v6.11.0 or newer), then upgrade MLNX_OFED
	Keywords: MLNX_OFED, Uninstall, UFM
3237353	Description: Upgrading from UFM v6.10 removes crucial MLNX_OFED packages.
	Workaround: Reinstall MLNX_OFED/UFM
	Keywords: MLNX_OFED, Upgrade, Packages
N/A	Description: Running UFM software with external UFM-SM is no longer supported
	Workaround: N/A
	Keywords: External UFM-SM
3144732	Description: By default, a managed Ubuntu 22 host will not be able to send system dump (sysdump) to a remote host as it does not include the sshpass utility.
	Workaround: In order to allow the UFM to generate system dump from a managed Ubuntu 22 host, install the sshpass utility prior to system dump generation.
	Keywords: Ubuntu 22, sysdump, sshpass
3129490	Description: HA uninstall procedure might get stuck on Ubuntu 20.04 due to multipath daemon running on the host.
	Workaround: Stop the multipath daemon before running the HA uninstall script on Ubuntu 20.04.
	Keywords: HA uninstall, multipath daemon, Ubuntu 20.04
3147196	Description: Running the upgrade procedure on bare metal Ubuntu 18.04 in HA mode might fail.
	Workaround: For instructions on how to apply the upgrade for bare metal Ubuntu 18.04, refer to High Availability Upgrade for Ubuntu 18.04.
	Keywords: Upgrade, Ubuntu 18.04, Docker Container, failure
3145058	Description: Running upgrade procedure on UFM Docker Container in HA mode might fail.
	Workaround: For instructions on how to apply the upgrade for UFM Docker Container in HA, refer to Upgrade Container Procedure.
	Keywords: Upgrade, Docker Container, failure
3061449	Description: Upon upgrade of UFM all telemetry configurations will be overridden with the new telemetry configuration of the new UFM version.
	Workaround: If the telemetry configuration was set manually, set it up again after upgrading UFM for the changes to take effect. Apply the manual telemetry configuration to the following file right after the UFM upgrade: `/opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini`.
	Keywords: Telemetry, configuration, upgrade, override
3053455	Description: UFM “Set Node Description” action for unmanaged switches is not supported for Ubuntu18 deployments
	Workaround: N/A
	Keywords: Set Node Description, Ubuntu18
3053455	Description: UFM Installations are not supported on RHEL8.X or CentOS8.X
	Workaround: N/A
	Keywords: Install, RHEL8, CentOS8
3052660	Description: UFM monitoring mode is not working
	Workaround: To run UFM in monitoring mode, edit the telemetry configuration file `/opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini`: search for `arg_12` and set it to an empty value (`arg_12=`). Before starting UFM, also make sure that `monitoring_mode = yes` is set in gv.cfg. After a restart, UFM runs in monitoring mode.
	Keywords: Monitoring, mode
3054340	Description: Setting a non-existent log directory causes UFM to fail to start.
	Workaround: Make sure to set a valid (existing) log directory when setting this parameter (gv.cfg → log_dir).
	Keywords: Log, Dir, fail, start
-	Description: Restoring HA standby node and configuring UFM HA with external UFM-Subnet Managers are not supported on Ubuntu bare-metal deployments
	Workaround: N/A
	Keywords: HA standby node, bare-metal
2887364	Description: After upgrading to UFM6.8, if UFM has failed over to the secondary node, getting cable information for a selected port fails.
	Workaround: On the secondary UFM node, copy the following files to the /usr/bin/ folder: /usr/flint, /usr/flint_ext, /usr/mlxcables, /usr/mlxcables_ext, /usr/mlxlink, /usr/mlxlink_ext. Getting cable information on the secondary UFM node should then work.
	Keywords: upgrade, failover, cable information
2784560	Description: Intentionally stopping and restarting the master container, or rebooting the master server, damages the HA failover option.
	Workaround: Manually restart the UFM cluster.
	Keywords: UFM Container, Reboot, Failover
2872513	Description: After rebooting the master container, failover is triggered twice (once to the standby and then back again to the master container).
	Workaround: N/A
	Keywords: UFM Container, reboot, failover
2863388	Description: Getting cable information for an NDR split port fails.
	Workaround: N/A
	Keywords: Cable, NDR, Split
N/A	Description: When SM mkey per port is used, several UFM operations might fail (get cable info, get system dump, switch FW upgrade).
	Workaround: N/A
	Keywords: SM, mkey per port
2702950	Description: An Internet connection is required to download and install SQLite on the old container during the software upgrade process.
	Workaround: N/A
	Keywords: Container; upgrade
2694977	Description: Adding a large number of devices (~1000) to a group or a logical server on a large-scale setup takes ~2 minutes.
	Workaround: N/A
	Keywords: Add device; group; logical server; large scale
2710613	Description: Periodic topology compare will not report removed nodes if the last topology change included only removed nodes.
	Workaround: N/A
	Keywords: Topology comparison
2698055	Description: UFM, configured to work with telemetry for collecting historical data, is limited to work only with the configured HCA port. If this port is part of a bond interface and a failure occurs on the port, collection of telemetry data via this port stops.
	Workaround: Reconfigure telemetry with the new active port and restart it within UFM.
	Keywords: Telemetry; history; bond; failure
2705974	Description: If new ports are added after UFM startup, the default session REST API (GET /ufmRest/monitoring/session/0/data) will not include port statistics for the newly added ports.
	Workaround: Reset the main UFM. For UFM standalone – `/etc/init.d/ufmd model_restart` For UFM HA – `/etc/init.d/ufmha model_restart`
	Keywords: Default session; REST API; missing ports
2714738	Description: Intentionally stopping and restarting the master container, or rebooting the master server, damages the HA failover option.
	Workaround: Manually restart the UFM cluster.
	Keywords: UFM Container, Reboot, Failover
–	Description: UFM, when configured to work with telemetry for collecting historical data, is limited to work only with the configured HCA port. If this port is part of a bond interface and a failure occurs on it, collection of telemetry data via this port stops.
	Workaround: If the historical telemetry port is part of a bond and a failure occurs, reconfigure the telemetry with a new active port and restart it within UFM.
	Keywords: telemetry, history, bond, failure
	Discovered in release: 6.7
2459320	Description: Docker upgrade to UFM6.6.1 from UFM6.6.0 is not supported.
	Workaround: N/A
	Keywords: Docker; upgrade
	Discovered in release: 6.6.1
-	Description: SHARP Aggregation Manager over UCX is not supported.
	Workaround: N/A
	Keywords: UCX; SHARP AM
	Discovered in release: 6.6.1
2288038	Description: When a user tries to collect a system dump for a UFM Appliance host, the job completes with an error and the following summary: "Running as a none root user Please switch to root user (super user) and run again."
	Workaround: N/A
	Keywords: System dump, UFM Appliance host
	Discovered in release: 6.5.2
2100564	Description: For modular dual-management switch systems, switch information is not presented correctly if the primary management module fails and the secondary takes over.
	Workaround: To avoid corrupted switch information, it is recommended to manually set the virtual IP address (box IP address) for the switch as the managed switch IP address (manual IP address) within UFM.
	Keywords: Modular switch, dual-management, virtual IP, box IP
	Discovered in release: 6.4.1
2135272	Description: UFM does not support hosts equipped with multiple HCAs of different types (e.g. a host with ConnectX®-3 and ConnectX-4/5/6) if multi-NIC grouping is enabled (i.e. multinic_host_enabled = true).
	Workaround: All managed hosts must contain HCAs of the same type (either ConnectX-3 HCAs or ConnectX-4/5/6 HCAs).
	Keywords: Multiple HCAs
	Discovered in release: 6.4.1
2063266	Description: Firmware upgrade for managed hosts with multiple HCAs is not supported. That is, it is not possible to perform FW upgrade for a specific host HCA.
	Workaround: Running software (MLNX_OFED) upgrade on that host will automatically upgrade all the HCAs on this host with the firmware bundled as part of this software package.
	Keywords: FW upgrade, multiple HCAs
	Discovered in release: 6.4.1
-	Description: Management PKey configuration (e.g. MTU, SL) can be performed only using PKey management interface (via GUI or REST API).
	Workaround: N/A
	Keywords: PKey, Management PKey, REST API
	Discovered in release: 6.4
2092885	Description: UFM Agent is not supported for SLES15 and RHEL8/CentOS8.
	Workaround: N/A
	Keywords: UFM Agent
	Discovered in release: 6.4
-	Description: CentOS 8.0 does not support IPv6.
	Workaround: N/A
	Keywords: IPv6
	Discovered in release: 6.4
1895385	Description: QoS parameters (mtu, sl and rate_limit) change does not take effect unless OpenSM is restarted.
	Workaround: N/A
	Keywords: QoS, PKey, OpenSM
	Discovered in release: 6.3
-	Description: Logical Server Auditing feature is supported on RedHat 7.x operating systems only.
	Workaround: N/A
	Keywords: Logical Server, auditing, OS
	Discovered in release: 5.9
-	Description: Configuration from lossy to lossless requires device reset.
	Workaround: Reboot all relevant devices after changing behavior from lossy to lossless.
	Keywords: Lossy configuration
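
The two-phase upgrade described in issue 3361160 can be expressed as a small planning helper. The following is only an illustrative sketch, not part of UFM: the function name `plan_upgrade` and the constants are hypothetical, and the only facts taken from the table above are the affected source versions (6.8.0, 6.9.0, 6.10.0) and the intermediate version (6.11.0).

```python
from typing import List

# Source versions for which issue 3361160 reports that a direct upgrade
# cleans up the historical telemetry database.
TWO_PHASE_SOURCES = {"6.8.0", "6.9.0", "6.10.0"}

# Intermediate version recommended by the workaround for issue 3361160.
INTERMEDIATE = "6.11.0"


def plan_upgrade(current: str, target: str) -> List[str]:
    """Return the ordered list of versions to install (hypothetical helper).

    If the current version is one of the affected ones and the target is
    not the intermediate version itself, go through 6.11.0 first so the
    historical telemetry data is preserved.
    """
    if current in TWO_PHASE_SOURCES and target != INTERMEDIATE:
        return [INTERMEDIATE, target]
    return [target]


if __name__ == "__main__":
    # An affected version is routed through the intermediate release.
    print(plan_upgrade("6.9.0", "6.15.1"))
    # An unaffected version upgrades directly.
    print(plan_upgrade("6.13.0", "6.15.1"))
```

The helper compares exact version strings only; it does not check any other upgrade restrictions listed in this table.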