NVIDIA UFM Enterprise User Manual v6.18.0

Known Issues History

Ref #

Issue

Rev 6.17.0

3859362

Description: The UFM TFS endpoint dashboard reports switch port TX/RX rates that reach Tbps.

Keywords: TFS, Switch Port, TX/RX

Workaround: N/A

Discovered in Release: v6.15.1

3881365

Description: The REST API malfunctions when deleting a port associated with a pkey.

Keywords: CloudX, API, Bare-Metal

Workaround: N/A

Discovered in Release: v6.15.2

3862847

Description: UFM reports wrong cable length for NDR optical cables connected to Quantum-2 NDR switch

Keywords: NDR, Optical Cables, Quantum-2, Switch

Workaround: N/A

Discovered in Release: v6.17.0

Rev 6.16.0

3791820

Description: Configuring the collection of SLVL on the secondary telemetry will result in SLVL data being sampled at a reduced rate.

Keywords: SLVL, Multi-Rate, Reduced Rate

Workaround: Edit the launch_ibdiagnet_config.ini file and restart the UFM telemetry.

  1. Edit the launch_ibdiagnet_config.ini file by running the following command:

    vi /opt/ufm/files/conf/secondary_telemetry_defaults/launch_ibdiagnet_config.ini

    Comment out the following line:

    #base_freq=1

  2. Restart UFM telemetry:

    /etc/init.d/ufmd ufm_telemetry_stop
    /etc/init.d/ufmd ufm_telemetry_start
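
The edit in step 1 can also be scripted. The following is a minimal sketch, not an NVIDIA-provided tool: the function name comment_out_base_freq is hypothetical, and it assumes the file contains an uncommented base_freq line (optionally indented). It keeps a .bak copy next to the file and leaves lines that are already commented untouched.

```shell
#!/bin/sh
# Sketch: comment out base_freq in a telemetry config file.
# comment_out_base_freq is a hypothetical helper name, not part of UFM.
comment_out_base_freq() {
    conf_file="$1"
    # Keep a backup copy before editing.
    cp "$conf_file" "$conf_file.bak" || return 1
    # Prefix an uncommented base_freq line with '#', preserving indentation.
    sed 's/^\([[:space:]]*\)\(base_freq[[:space:]]*=.*\)$/\1#\2/' \
        "$conf_file.bak" > "$conf_file"
}

# Example usage (path taken from the workaround above):
# comment_out_base_freq /opt/ufm/files/conf/secondary_telemetry_defaults/launch_ibdiagnet_config.ini
```

The telemetry restart in step 2 is still required after the file is changed.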

Discovered in Release: 6.15.0

3775405

Description: After UFM startup, an empty temporary folder is created in the /tmp folder every 10 minutes (due to the periodic telemetry status check).

Keywords: Empty folder, temporary, /tmp

Workaround: Add 'rm -f /tmp/tmp*' to crontab to run daily, or change the instances_sessions_compatibility_interval parameter in gv.cfg to 30 or 60 minutes.
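
Note that rm -f removes files but does not remove directories, so a cron job aimed at the empty folders may need a directory-aware command. The following is a minimal sketch under that assumption: the function name remove_empty_tmp_dirs is hypothetical, the tmp* pattern is taken from the workaround text, and the -maxdepth, -mindepth and -empty options assume GNU or BSD find.

```shell
#!/bin/sh
# Sketch: remove empty tmp* directories directly under a base directory.
# remove_empty_tmp_dirs is a hypothetical helper name, not part of UFM.
remove_empty_tmp_dirs() {
    base_dir="${1:-/tmp}"
    # -mindepth 1 protects the base directory itself;
    # -maxdepth 1 limits the search to its direct children;
    # -empty skips folders that still hold files.
    find "$base_dir" -mindepth 1 -maxdepth 1 -type d -name 'tmp*' -empty \
        -exec rmdir {} +
}

# Possible daily crontab entry built on the same find command (03:00 is arbitrary):
# 0 3 * * * find /tmp -mindepth 1 -maxdepth 1 -type d -name 'tmp*' -empty -exec rmdir {} +
```

Other programs may also create tmp* folders in /tmp, so it may be worth restricting the match further (for example by age) before scheduling it.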

Discovered in Release: v6.15.0

3560659

Description: Modifying the mtu_limit parameter for [MngNetwork] in gv.cfg does not accurately reflect changes upon restarting UFM.

Keywords: mtu_limit, MngNetwork, gv.cfg, UFM restart

Workaround: UFM needs to be restarted twice in order for the changes to take effect.

Discovered in Release: v6.15.0

3729822

Description: The Logs API temporarily returns an empty response when the SM log file contains messages from both the previous year (2023) and the current year (2024).

Keywords: Logs API, Empty response, Logs file

Workaround: N/A (the issue resolves automatically once the problematic SM log file, which includes messages from both 2023 and 2024, is rotated)

Discovered in Release: v6.15.0

3675071

Description: UFM stops gracefully after the b2b primary cable is physically disconnected

Keywords: UFM HA, B2B, Primary Cable Disconnection

Workaround: N/A

Discovered in Release: 6.14.1

N/A

Description: Running the UFM Fabric Health Report (via the UFM Web UI or REST API) triggers ibdiagnet to use the SLRG register, which might cause the firmware of some switches and HCAs to get stuck and leave the HCA ports in the "Init" state.

Keywords: Fabric Health Report, SLRG register, "Init" state, Switch, HCA

Discovered in Release: 6.14.0

3538640

Description: Fixed ALM plugin log rotate function.

Keywords: ALM, Plugin, Log rotate

Discovered in Release: 6.13.0

3532191

Description: Fixed UFM hanging (database is locked) after corrective restart of UFM health.

Keywords: Hanging, Database, Locked

Discovered in Release: 6.13.0

3555583

Description: Resolved REST API links' inability to return hostname for computer nodes.

Keywords: REST API, Links, Hostname, Computer Nodes

Discovered in Release: 6.12.1

3549795

Description: Fixed ufm_ha_cluster status to show DRBD sync status.

Keywords: ufm_ha_cluster, DRBD, Sync Status

Discovered in Release: 6.13.0

3549793

Description: Fixed UFM HA installation failure.

Keywords: HA, Installation

Discovered in Release: 6.13.0

3547517

Description: Fixed UFM logs REST API returning empty result when SM logs exist on the disk.

Keywords: Logs, SM logs, Empty

Discovered in Release: 6.11.0

3546178

Description: Fixed SHARP jobs failure when SHARP reservation feature is enabled.

Keywords: SHARP, Jobs, Reservation

Discovered in Release: 6.13.0

3541477

Description: Fixed UFM module temperature alerting on wrong thresholds.

Keywords: Module Temperature, Alert Threshold

Discovered in Release: 6.13.0

3191419

Description: Fixed UFM default session API returning port counter values as NULL.

Keywords: Null, Port Counter, Value, API

Discovered in Release: 6.9.0

3560659

Description: Fixed proper update in [MngNetwork] mtu_limit in gv.cfg when restarting UFM.

Keywords: mtu_limit, gv.cfg, Update, UFM restart

Discovered in Release: 6.13.1

3534374

Description: Fixed configure_ha_nodes.sh failure when deploying UFM6.13.x HA on Ubuntu22.04.

Keywords: configure_ha_nodes.sh, HA, Ubuntu22.04

Discovered in Release: 6.13.0

3496853

Description: Fixed daily report not being sent properly.

Keywords: Daily Report, Failure

Discovered in Release: 6.13.0

3469639

Description: Fixed REST RDMA server failure every couple of days, causing inability to retrieve ibdiagnet data.

Keywords: REST RDMA, ibdiagnet

Discovered in Release: 6.12.0

3455767

Description: Fixed incorrect combination of multiple devices in monitoring.

Keywords: Monitoring, Incorrect combination

Discovered in Release: 6.12.0

3511410

Description: Collect system dump for DGX host does not work due to missing sshpass utility.

Workaround: Install the sshpass utility on the DGX.

Keywords: System Dump, DGX, sshpass utility

3432385

Description: UFM does not support HDR switch configured with hybrid split mode, where some of the ports are split and some are not.

Workaround:  UFM can properly operate when all or none of the HDR switch ports are configured as split.

Keywords: HDR Switch, Ports, Hybrid Split Mode

3472330

Description: On bare-metal high availability (HA), when initiating a UFM system dump from either the master or standby node, the collection process will not include the HA dumps (pacemaker and DRBD).

Workaround:  To extract the HA system dump from bare-metal, run the following command from the master/standby nodes:

/usr/bin/vsysinfo -S all -e -f /etc/ufm/ufm-ha-sysdump.conf -O /tmp/HA_sysdump

The extracted HA system dump is stored in /tmp/HA_sysdump.gz.tar

Keywords: UFM System Dump, HA, Bare-Metal

3461658

Description: After the upgrade from UFM Enterprise v6.13.0 GA to UFM Enterprise v6.13.1 FUR, the network fast recovery path in opensm.conf is not automatically updated and remains with a null value (fast_recovery_conf_file (null))

Workaround:  If you wish to enable the network fast recovery feature in UFM, make sure to set the appropriate path for the current fast recovery configuration file (/opt/ufm/files/conf/opensm/fast_recovery.conf) in the opensm.conf file located at /opt/ufm/files/conf/opensm, before starting UFM.
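
The path update can be scripted before UFM is started. This is a minimal sketch, not an official procedure: the function name set_fast_recovery_conf is hypothetical, and it assumes opensm.conf stores the option on a single line in the form "fast_recovery_conf_file <path>", mirroring the "fast_recovery_conf_file (null)" form quoted above. The new path must not contain the characters | or &, which are special to the sed command used here.

```shell
#!/bin/sh
# Sketch: point fast_recovery_conf_file at a configuration file.
# set_fast_recovery_conf is a hypothetical helper name, not part of UFM.
set_fast_recovery_conf() {
    opensm_conf="$1"
    fr_conf="$2"
    # Keep a backup copy before editing.
    cp "$opensm_conf" "$opensm_conf.bak" || return 1
    # Replace whatever value follows the option name (for example "(null)").
    sed "s|^fast_recovery_conf_file[[:space:]].*\$|fast_recovery_conf_file $fr_conf|" \
        "$opensm_conf.bak" > "$opensm_conf"
}

# Example usage (both paths taken from the workaround above):
# set_fast_recovery_conf /opt/ufm/files/conf/opensm/opensm.conf \
#     /opt/ufm/files/conf/opensm/fast_recovery.conf
```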

Keywords:  Network fast recovery, Missing, Configuration

N/A

Description: Enabling a port for a managed switch fails in case that port is not disabled in a persistent way (this may occur in ports that were disabled on previous versions of UFM - prior to UFM v6.12.0)

Workaround: Set "persistent_port_operation=false" in gv.cfg to use non-persistent (legacy) disabling or enabling of the port. UFM restart is required.

Keywords: Disable, Enable, Port, Persistent

3346321

Description:  Failover to another port (multi-port SM) will not work as expected in case UFM was deployed as a docker container

Workaround: Failover to another port (multi-port SM) works properly on UFM Bare-metal deployments

Keywords: Failover to another port, Multi-port SM

3348587

Description: Replacement of defective nodes in the HA cluster does not work when the PCS version is 0.9.x

Workaround: N/A

Keywords: Defected Node, HA Cluster, pcs version

3336769

Description: UFM-HA: In case the back-to-back interface is disabled or disconnected, the HA cluster will enter a split-brain state, and the "ufm_ha_cluster status" command will stop functioning properly.

Workaround: To resolve the issue:

  1. Connect or enable the back-to-back interface

  2. Run:

    pcs cluster start --all

  3. Follow instructions in Split-Brain Recovery in HA Installation.

Keywords: HA, Back-to-back Interface

3361160

Description: Upgrading UFM Enterprise from versions 6.8.0, 6.9.0 and 6.10.0 results in cleanup of UFM historical telemetry database (due to schema change). This means that the new telemetry data will be stored based on the new schema.

Workaround: To preserve the historical telemetry database data while upgrading from UFM version 6.8.0, 6.9.0 and 6.10.0, perform the upgrade in two phases. First, upgrade to UFM v6.11.0, and then upgrade to the latest UFM version (UFM v6.12.0 or newer). It is important to note that the upgrade process may take longer depending on the size of the historical telemetry database.
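
An upgrade script could check for the affected versions before proceeding. The sketch below only encodes the three versions named in this entry; the function name needs_two_phase_upgrade is hypothetical, and how the installed version string is obtained is left to the caller.

```shell
#!/bin/sh
# Sketch: report whether an installed UFM version needs the two-phase upgrade.
# needs_two_phase_upgrade is a hypothetical helper name, not part of UFM.
needs_two_phase_upgrade() {
    case "$1" in
        6.8.0|6.9.0|6.10.0)
            # Affected by the schema change: upgrade to v6.11.0 first.
            return 0
            ;;
        *)
            # Not listed in this known issue.
            return 1
            ;;
    esac
}

# Example usage:
# if needs_two_phase_upgrade "6.9.0"; then
#     echo "Upgrade to UFM v6.11.0 first, then to the latest version."
# fi
```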

Keywords: UFM Historical Telemetry Database, Cleanup, Upgrade

3346321

Description: In some cases, when multiport SM is configured in UFM, a failover to the secondary node might be triggered instead of failover to the local available port

Workaround: N/A

Keywords: Multiport SM, Failover, Secondary port

3240664

Description: This software release does not support upgrading the UFM Enterprise version from the latest GA version (v6.11.0). UFM upgrade is supported in UFM Enterprise v6.9.0 and v6.10.0.

Workaround: N/A

Keywords: UFM Upgrade

3242332

Description: Upgrading MLNX_OFED uninstalls UFM

Workaround: Upgrade UFM to a newer version (v6.11.0 or newer), then upgrade MLNX_OFED

Keywords: MLNX_OFED, Uninstall, UFM

3237353

Description: Upgrading from UFM v6.10 removes crucial MLNX_OFED packages

Workaround: Reinstall MLNX_OFED/UFM

Keywords: MLNX_OFED, Upgrade, Packages

N/A

Description: Running UFM software with external UFM-SM is no longer supported

Workaround: N/A

Keywords: External UFM-SM

3144732

Description: By default, a managed Ubuntu 22 host will not be able to send system dump (sysdump) to a remote host as it does not include the sshpass utility.

Workaround: In order to allow the UFM to generate system dump from a managed Ubuntu 22 host, install the sshpass utility prior to system dump generation.
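
Whether the utility is present can be checked on the managed host before a system dump is requested. This is a minimal sketch: the function name require_tool is hypothetical, and the installation hint in the comment assumes the distribution's standard package manager.

```shell
#!/bin/sh
# Sketch: verify that a required utility is on PATH.
# require_tool is a hypothetical helper name, not part of UFM.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Required utility not found: $1" >&2
    return 1
}

# Example usage on the managed host:
# require_tool sshpass || echo "Install sshpass first (on Ubuntu, typically: apt-get install sshpass)"
```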

Keywords: Ubuntu 22, sysdump, sshpass

3129490

Description: HA uninstall procedure might get stuck on Ubuntu 20.04 due to multipath daemon running on the host.

Workaround: Stop the multipath daemon before running the HA uninstall script on Ubuntu 20.04.

Keywords: HA uninstall, multipath daemon, Ubuntu 20.04

3147196

Description: Running the upgrade procedure on bare metal Ubuntu 18.04 in HA mode might fail.

Workaround: For instructions on how to apply the upgrade for bare metal Ubuntu 18.04, refer to High Availability Upgrade for Ubuntu 18.04.

Keywords: Upgrade, Ubuntu 18.04, Docker Container, failure

3145058

Description: Running upgrade procedure on UFM Docker Container in HA mode might fail.

Workaround: For instructions on how to apply the upgrade for UFM Docker Container in HA, refer to Upgrade Container Procedure.

Keywords: Upgrade, Docker Container, failure

3061449

Description: Upon upgrade of UFM all telemetry configurations will be overridden with the new telemetry configuration of the new UFM version.

Workaround: If the telemetry configuration is set manually, the user should set up the configuration after upgrading the UFM for the changes to take effect.

Telemetry manual configuration should be set on the following telemetry configuration file right after UFM upgrade: /opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini.

Keywords: Telemetry, configuration, upgrade, override.

3053455

Description: UFM “Set Node Description” action for unmanaged switches is not supported for Ubuntu18 deployments

Workaround: N/A

Keywords: Set Node Description, Ubuntu18

3053455

Description: UFM Installations are not supported on RHEL8.X or CentOS8.X

Workaround: N/A

Keywords: Install, RHEL8, CentOS8

3052660

Description: UFM monitoring mode is not working

Workaround: To make UFM work in monitoring mode, edit the telemetry configuration file: /opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini

Search for arg_12 and set it to an empty value: arg_12=

Before starting UFM, make sure to set monitoring_mode = yes in gv.cfg. Restarting UFM then runs it in monitoring mode.
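
Both settings can be applied with a small helper. This is a minimal sketch under stated assumptions: the function name set_ini_value is hypothetical, each key is assumed to already exist exactly once in its file in "key = value" or "key=value" form, and the gv.cfg location is left as a placeholder because it is not given here. The value must not contain the characters | or &.

```shell
#!/bin/sh
# Sketch: set the value of an existing key in an INI-style file.
# set_ini_value is a hypothetical helper name, not part of UFM.
set_ini_value() {
    file="$1"
    key="$2"
    value="$3"
    # Keep a backup copy before editing.
    cp "$file" "$file.bak" || return 1
    # Keep the "key =" prefix (with its spacing) and replace the value.
    sed "s|^\([[:space:]]*$key[[:space:]]*=[[:space:]]*\).*\$|\1$value|" \
        "$file.bak" > "$file"
}

# Example usage (the gv.cfg path is a placeholder):
# set_ini_value /opt/ufm/conf/telemetry_defaults/launch_ibdiagnet_config.ini arg_12 ""
# set_ini_value /path/to/gv.cfg monitoring_mode yes
```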

Keywords: Monitoring, mode

3054340

Description: Setting a non-existent log directory causes UFM to fail to start

Workaround: Make sure to set a valid (existing) log directory when setting this parameter (gv.cfg → log_dir)
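
The directory can be validated before UFM is started. This is a minimal sketch: the function name check_log_dir is hypothetical, the gv.cfg location is left as a placeholder, and the parameter is assumed to appear as a single "log_dir = <path>" line.

```shell
#!/bin/sh
# Sketch: check that the log_dir configured in a gv.cfg-style file exists.
# check_log_dir is a hypothetical helper name, not part of UFM.
check_log_dir() {
    cfg_file="$1"
    # Extract the value of the first log_dir line, trimming surrounding spaces.
    log_dir=$(sed -n 's/^[[:space:]]*log_dir[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' \
        "$cfg_file" | head -n 1)
    if [ -z "$log_dir" ]; then
        echo "log_dir is not set in $cfg_file"
        return 0
    fi
    if [ -d "$log_dir" ]; then
        echo "log_dir exists: $log_dir"
        return 0
    fi
    echo "log_dir does not exist: $log_dir" >&2
    return 1
}

# Example usage (the gv.cfg path is a placeholder):
# check_log_dir /path/to/gv.cfg || echo "Create the directory or fix log_dir before starting UFM"
```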

Keywords: Log, Dir, fail, start

© Copyright 2024, NVIDIA. Last updated on Aug 27, 2024.