Known Issues History

NVIDIA UFM Enterprise User Manual v6.12.1

Ref #

Issue

N/A

Description: Enabling a port on a managed switch fails if that port was not disabled persistently (this can occur with ports that were disabled in UFM versions prior to v6.12.0).

Workaround: Set "persistent_port_operation=false" in gv.cfg to use non-persistent (legacy) disabling or enabling of the port. A UFM restart is required.

Keywords: Disable, Enable, Port, Persistent
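
The gv.cfg change in the workaround above could be scripted roughly as follows. This is a minimal sketch, not an official UFM tool: the function name set_cfg_key is hypothetical, the gv.cfg path in the usage comment is an assumption, and GNU sed is assumed for the -i option. The function only rewrites a key that already exists, so it never guesses which section a new key belongs in.

```shell
#!/usr/bin/env bash
# set_cfg_key FILE KEY VALUE  (hypothetical helper)
# Rewrites an existing "KEY = ..." line in an INI-style file.
# Returns non-zero if KEY is not present, instead of appending it blindly.
# Note: VALUE must not contain "|" or "&", which are special to the sed call.
set_cfg_key() {
    local file="$1" key="$2" value="$3"
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        echo "key '${key}' not found in ${file}" >&2
        return 1
    fi
    # Keep everything up to and including "=", replace the rest of the line.
    sed -i -E "s|^([[:space:]]*${key}[[:space:]]*=).*|\1 ${value}|" "$file"
}

# Example (path is an assumption; adjust to your installation):
#   set_cfg_key /opt/ufm/files/conf/gv.cfg persistent_port_operation false
# Restart UFM afterwards for the change to take effect.
```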

3346321

Description: Failover to another port (multi-port SM) does not work as expected if UFM is deployed as a Docker container.

Workaround: Use a UFM bare-metal deployment, where failover to another port (multi-port SM) works properly.

Keywords: Failover to another port, Multi-port SM

3348587

Description: Replacement of defected nodes in the HA cluster does not work when PCS version is 0.9.x

Workaround: N/A

Keywords: Defected Node, HA Cluster, pcs version
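
To tell whether a cluster node falls under this limitation, the installed PCS version can be checked. The sketch below assumes that "pcs --version" prints a plain version string such as 0.9.169; the helper name is_pcs_0_9 is hypothetical.

```shell
#!/usr/bin/env bash
# is_pcs_0_9 VERSION  (hypothetical helper)
# Returns 0 if VERSION belongs to the 0.9.x series, non-zero otherwise.
is_pcs_0_9() {
    case "$1" in
        0.9|0.9.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example, run on an HA cluster node:
#   if is_pcs_0_9 "$(pcs --version)"; then
#       echo "PCS 0.9.x detected: node replacement is affected by this issue"
#   fi
```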

3336769

Description: UFM-HA: In case the back-to-back interface is disabled or disconnected, the HA cluster will enter a split-brain state, and the "ufm_ha_cluster status" command will stop functioning properly.

Workaround: To resolve the issue:

  1. Connect or enable the back-to-back interface

  2. Run

    pcs cluster start --all

  3. Follow the instructions in Split-Brain Recovery in HA Installation.

Keywords: HA, Back-to-back Interface
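
Steps 1 and 2 of the workaround above can be guarded by a link-state check before the cluster is started. This is a sketch only: iface_is_up is a hypothetical helper, the interface name in the usage comment is a placeholder, and it assumes the driver reports "up" in the Linux sysfs operstate attribute (some drivers report "unknown" instead).

```shell
#!/usr/bin/env bash
# iface_is_up IFACE [SYSFS_ROOT]  (hypothetical helper)
# Returns 0 if the interface's operstate reads "up".
# SYSFS_ROOT defaults to /sys/class/net and exists mainly to allow testing.
iface_is_up() {
    local iface="$1" sysfs="${2:-/sys/class/net}"
    [ "$(cat "$sysfs/$iface/operstate" 2>/dev/null)" = "up" ]
}

# Example (replace eth1 with your back-to-back interface):
#   if iface_is_up eth1; then
#       pcs cluster start --all
#   else
#       echo "back-to-back interface is down; connect or enable it first" >&2
#   fi
```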

3240664

Description: This software release does not support upgrading UFM Enterprise from the latest GA version (v6.11.0). Upgrade is supported from UFM Enterprise v6.9.0 and v6.10.0.

Workaround: N/A

Keywords: UFM Upgrade

3242332

Description: Upgrading MLNX_OFED uninstalls UFM

Workaround: Upgrade UFM to a newer version (v6.11.0 or newer), then upgrade MLNX_OFED

Keywords: MLNX_OFED, Uninstall, UFM

3237353

Description: Upgrading from UFM v6.10 removes crucial MLNX_OFED packages.

Workaround: Reinstall MLNX_OFED/UFM

Keywords: MLNX_OFED, Upgrade, Packages

N/A

Description: Running UFM software with external UFM-SM is no longer supported

Workaround: N/A

Keywords: External UFM-SM

3144732

Description: By default, a managed Ubuntu 22 host cannot send a system dump (sysdump) to a remote host because it does not include the sshpass utility.

Workaround: To allow UFM to generate a system dump from a managed Ubuntu 22 host, install the sshpass utility before generating the system dump.

Keywords: Ubuntu 22, sysdump, sshpass
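
On the managed host, the presence of sshpass can be checked before a system dump is requested. The sketch below uses a hypothetical helper, have_tool; the apt-get line in the usage comment assumes the host can reach a package repository that carries the sshpass package and that the command runs as root.

```shell
#!/usr/bin/env bash
# have_tool NAME  (hypothetical helper)
# Returns 0 if NAME resolves to a command on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Example, run as root on the managed Ubuntu 22 host:
#   have_tool sshpass || { apt-get update && apt-get install -y sshpass; }
```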

3129490

Description: HA uninstall procedure might get stuck on Ubuntu 20.04 due to multipath daemon running on the host.

Workaround: Stop the multipath daemon before running the HA uninstall script on Ubuntu 20.04.

Keywords: HA uninstall, multipath daemon, Ubuntu 20.04

3147196

Description: Running the upgrade procedure on bare metal Ubuntu 18.04 in HA mode might fail.

Workaround: For instructions on how to apply the upgrade for bare metal Ubuntu 18.04, refer to High Availability Upgrade for Ubuntu 18.04.

Keywords: Upgrade, Ubuntu 18.04, Docker Container, failure

3145058

Description: Running upgrade procedure on UFM Docker Container in HA mode might fail.

Workaround: For instructions on how to apply the upgrade for UFM Docker Container in HA, refer to Upgrade Container Procedure.

Keywords: Upgrade, Docker Container, failure

3061449

Description: Upon UFM upgrade, all telemetry configurations are overridden by the telemetry configuration of the new UFM version.

Workaround: If the telemetry configuration was set manually, reapply it after upgrading UFM for the changes to take effect.
Set the manual telemetry configuration in the following file right after the UFM upgrade: /opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini.

Keywords: Telemetry, configuration, upgrade, override
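
One way to make reapplying manual settings easier is to keep a copy of the telemetry file before the upgrade and compare it afterwards. The following is a sketch under assumptions: backup_config and config_unchanged are hypothetical helpers, and the backup directory in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# backup_config FILE BACKUP_DIR  (hypothetical helper)
# Copies FILE to BACKUP_DIR/<name>.pre-upgrade, preserving mode and timestamps.
backup_config() {
    local src="$1" backup_dir="$2"
    mkdir -p "$backup_dir" && cp -p "$src" "$backup_dir/$(basename "$src").pre-upgrade"
}

# config_unchanged FILE BACKUP_DIR  (hypothetical helper)
# Returns 0 if FILE is byte-identical to its pre-upgrade copy.
config_unchanged() {
    local src="$1" backup_dir="$2"
    cmp -s "$src" "$backup_dir/$(basename "$src").pre-upgrade"
}

# Example (backup directory is a placeholder):
#   cfg=/opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini
#   backup_config "$cfg" /root/ufm-config-backup        # before the upgrade
#   config_unchanged "$cfg" /root/ufm-config-backup ||   # after the upgrade
#       diff -u "/root/ufm-config-backup/$(basename "$cfg").pre-upgrade" "$cfg"
```

The diff output shows which manually set values the upgrade replaced, so they can be set again in the new file.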

3053455

Description: UFM “Set Node Description” action for unmanaged switches is not supported for Ubuntu18 deployments

Workaround: N/A

Keywords: Set Node Description, Ubuntu18

3053455

Description: UFM Installations are not supported on RHEL8.X or CentOS8.X

Workaround: N/A

Keywords: Install, RHEL8, CentOS8

3052660

Description: UFM monitoring mode is not working

Workaround: To make UFM work in monitoring mode, edit the telemetry configuration file: /opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini

Search for arg_12 and set it to an empty value: arg_12=

Before starting UFM, make sure to set monitoring_mode = yes in gv.cfg. Restarting UFM then runs it in monitoring mode.

Keywords: Monitoring, mode
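
Before restarting UFM, both settings from the workaround above can be verified with a small check. This is a sketch: monitoring_mode_ready is a hypothetical helper, and the gv.cfg path in the usage comment is an assumption.

```shell
#!/usr/bin/env bash
# monitoring_mode_ready TELEMETRY_INI GV_CFG  (hypothetical helper)
# Returns 0 only if arg_12 is empty in the telemetry file and
# monitoring_mode is set to "yes" in gv.cfg.
monitoring_mode_ready() {
    local telemetry_ini="$1" gv_cfg="$2"
    grep -Eq '^[[:space:]]*arg_12[[:space:]]*=[[:space:]]*$' "$telemetry_ini" || return 1
    grep -Eq '^[[:space:]]*monitoring_mode[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$gv_cfg" || return 1
}

# Example (gv.cfg path is an assumption):
#   monitoring_mode_ready \
#       /opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini \
#       /opt/ufm/files/conf/gv.cfg && echo "ready to restart UFM in monitoring mode"
```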

3054340

Description: Setting a non-existent log directory causes UFM to fail to start.

Workaround: Make sure to set a valid (existing) log directory when setting this parameter (gv.cfg → log_dir).

Keywords: Log, Dir, fail, start
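
A directory check before editing log_dir avoids this start-up failure. The sketch below uses a hypothetical helper, check_log_dir, and a placeholder path; it only tests that the directory exists, which is the condition the workaround names.

```shell
#!/usr/bin/env bash
# check_log_dir DIR  (hypothetical helper)
# Returns 0 if DIR is an existing directory; otherwise prints a message
# to stderr and returns 1.
check_log_dir() {
    local dir="$1"
    if [ -d "$dir" ]; then
        return 0
    fi
    echo "log directory does not exist: $dir" >&2
    return 1
}

# Example (placeholder path):
#   check_log_dir /path/to/ufm/logs && echo "safe to set log_dir in gv.cfg"
```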

-

Description: Restoring HA standby node and configuring UFM HA with external UFM-Subnet Managers are not supported on Ubuntu bare-metal deployments

Workaround: N/A

Keywords: HA standby node, bare-metal

2887364

Description: After upgrading to UFM v6.8, if UFM fails over to the secondary node, getting cable information for a selected port fails.

Workaround: On the secondary UFM node, copy the following files to the /usr/bin/ folder:

  • /usr/flint

  • /usr/flint_ext

  • /usr/mlxcables

  • /usr/mlxcables_ext

  • /usr/mlxlink

  • /usr/mlxlink_ext

Getting cable information on the secondary UFM node should now work.

Keywords: upgrade, failover, cable information
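
The copy step in the workaround above can be done in one loop that stops at the first missing file. This is a sketch: copy_cable_tools is a hypothetical helper, and it assumes the six files are present under the source directory on the secondary node, as the listed paths suggest.

```shell
#!/usr/bin/env bash
# copy_cable_tools SRC_DIR DEST_DIR  (hypothetical helper)
# Copies the six cable-related tools from SRC_DIR to DEST_DIR.
# Returns non-zero on the first missing file or failed copy.
copy_cable_tools() {
    local src_dir="$1" dest_dir="$2" tool
    for tool in flint flint_ext mlxcables mlxcables_ext mlxlink mlxlink_ext; do
        if [ ! -f "$src_dir/$tool" ]; then
            echo "missing: $src_dir/$tool" >&2
            return 1
        fi
        cp -p "$src_dir/$tool" "$dest_dir/" || return 1
    done
}

# Example, run as root on the secondary UFM node:
#   copy_cable_tools /usr /usr/bin
```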

2784560

Description: Intentionally stopping and restarting the master container, or rebooting the master server, damages the HA failover option.

Workaround: Manually restart the UFM cluster.

Keywords: UFM Container; Reboot, Failover

2872513

Description: After rebooting the master container, failover is triggered twice (once to the standby and then back again to the master container).

Workaround: N/A

Keywords: UFM Container, reboot, failover

2863388

Description: Getting cable information for an NDR split port fails.

Workaround: N/A

Keywords: Cable, NDR, Split

N/A

Description: When SM mkey per port is used, several UFM operations might fail (get cable info, get system dump, switch FW upgrade).

Workaround: N/A

Keywords: SM, mkey per port

2702950

Description: An Internet connection is required to download and install SQLite on the old container during the software upgrade process.

Workaround: N/A

Keywords: Container; upgrade

2694977

Description: Adding a large number of devices (~1000) to a group or a logical server on a large-scale setup takes ~2 minutes.

Workaround: N/A

Keywords: Add device; group; logical server; large scale

2710613

Description: Periodic topology compare will not report removed nodes if the last topology change included only removed nodes.

Workaround: N/A

Keywords: Topology comparison

2698055

Description: UFM, configured to work with telemetry for collecting historical data, is limited to work only with the configured HCA port. If this port is part of a bond interface and a failure occurs on the port, collection of telemetry data via this port stops.

Workaround: Reconfigure telemetry with the new active port and restart it within UFM.

Keywords: Telemetry; history; bond; failure

2705974

Description: If new ports are added after UFM startup, the default session REST API (GET /ufmRest/monitoring/session/0/data) will not include port statistics for the newly added ports.

Workaround: Reset the main UFM.

  • For UFM standalone – /etc/init.d/ufmd model_restart

  • For UFM HA – /etc/init.d/ufmha model_restart

Keywords: Default session; REST API; missing ports

2714738

Description: Intentionally stopping and restarting the master container, or rebooting the master server, damages the HA failover option.

Workaround: Manually restart the UFM cluster.

Keywords: UFM Container; Reboot, Failover

Description: UFM, when configured to work with telemetry for collecting historical data, is limited to work only with the configured HCA port. If this port is part of a bond interface and a failure occurs, collection of telemetry data via this port stops.

Workaround: If the historical telemetry port is part of a bond and a failure occurs, reconfigure telemetry with a new active port and restart it within UFM.

Keywords: telemetry, history, bond, failure

Discovered in release: 6.7

2459320

Description: Docker upgrade to UFM6.6.1 from UFM6.6.0 is not supported.

Workaround: N/A

Keywords: Docker; upgrade

Discovered in release: 6.6.1

-

Description: SHARP Aggregation Manager over UCX is not supported.

Workaround: N/A

Keywords: UCX; SHARP AM

Discovered in release: 6.6.1

2288038

Description: When the user tries to collect a system dump for a UFM Appliance host, the job completes with an error and the following summary: "Running as a none root user Please switch to root user (super user) and run again."

Workaround: N/A

Keywords: System dump, UFM Appliance host

Discovered in release: 6.5.2

2100564

Description: For modular dual-management switch systems, switch information is not presented correctly if the primary management module fails and the secondary takes over.

Workaround: To avoid corrupted switch information, it is recommended to manually set the virtual IP address (box IP address) for the switch as the managed switch IP address (manual IP address) within UFM.

Keywords: Modular switch, dual-management, virtual IP, box IP

Discovered in release: 6.4.1

2135272

Description: UFM does not support hosts equipped with multiple HCAs of different types (e.g. a host with ConnectX®-3 and ConnectX-4/5/6) if multi-NIC grouping is enabled (i.e. multinic_host_enabled = true).

Workaround: All managed hosts must contain HCAs of the same type (either ConnectX-3 HCAs or ConnectX-4/5/6 HCAs).

Keywords: Multiple HCAs

Discovered in release: 6.4.1

2063266

Description: Firmware upgrade for managed hosts with multiple HCAs is not supported. That is, it is not possible to perform FW upgrade for a specific host HCA.

Workaround: Running software (MLNX_OFED) upgrade on that host will automatically upgrade all the HCAs on this host with the firmware bundled as part of this software package.

Keywords: FW upgrade, multiple HCAs

Discovered in release: 6.4.1

-

Description: Management PKey configuration (e.g. MTU, SL) can be performed only using PKey management interface (via GUI or REST API).

Workaround: N/A

Keywords: PKey, Management PKey, REST API

Discovered in release: 6.4

2092885

Description: UFM Agent is not supported for SLES15 and RHEL8/CentOS8.

Workaround: N/A

Keywords: UFM Agent

Discovered in release: 6.4

-

Description: CentOS 8.0 does not support IPv6.

Workaround: N/A

Keywords: IPv6

Discovered in release: 6.4

1895385

Description: Changes to QoS parameters (mtu, sl, and rate_limit) do not take effect unless OpenSM is restarted.

Workaround: N/A

Keywords: QoS, PKey, OpenSM

Discovered in release: 6.3

-

Description: Logical Server Auditing feature is supported on RedHat 7.x operating systems only.

Workaround: N/A

Keywords: Logical Server, auditing, OS

Discovered in release: 5.9

-

Description: Changing the configuration from lossy to lossless requires a device reset.

Workaround: Reboot all relevant devices after changing behavior from lossy to lossless.

Keywords: Lossy configuration

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.