NVIDIA UFM-SDN Appliance User Manual v4.14.1
Known Issues History

Ref. #

Issue

N/A

Description: Running the UFM Fabric Health Report (via the UFM Web UI or REST API) triggers ibdiagnet to use the SLRG register, which might cause the firmware of some switches and HCAs to hang and leave the HCA ports in the "Init" state.

Keywords:

Discovered in Release: 6.14.0

3538640

Description: Fixed ALM plugin log rotate function

Keywords: ALM, Plugin, Log rotate

Discovered in Release: 4.12.1

3532191

Description: Fixed UFM hanging (database is locked) after corrective restart of UFM health.

Keywords: Hanging, Database, Locked

Discovered in Release: 4.12.1

3555583

Description: Resolved REST API links inability to return hostname for computer nodes

Keywords: REST API, Links, Hostname, Computer Nodes

Discovered in Release: 4.11.1

3547517

Description: Fixed UFM logs REST API returning empty result when SM logs exist on the disk

Keywords: Logs, SM logs, Empty

Discovered in Release: 4.10.0

3546178

Description: Fixed SHARP jobs failure when SHARP reservation feature is enabled

Keywords: SHARP, Jobs, Reservation

Discovered in Release: 4.12.1

3541477

Description: Fixed UFM module temperature alerting on wrong thresholds

Keywords: Module Temperature, Alert Threshold

Discovered in Release: 4.12.1

3191419

Description: Fixed UFM default session API returning port counter values as NULL

Keywords: Null, Port Counter, Value, API

Discovered in Release: 4.8.0

3560659

Description: Fixed [MngNetwork] mtu_limit in gv.cfg not being updated properly when restarting UFM.

Keywords: mtu_limit, gv.cfg, Update, UFM restart

Discovered in Release: 4.12.1

3496853

Description: Fixed daily report not being sent properly.

Keywords: Daily Report, Failure

Discovered in Release: 4.12.1

3469639

Description: Fixed REST RDMA server failure every couple of days, causing inability to retrieve ibdiagnet data.

Keywords: REST RDMA, ibdiagnet

Discovered in Release: 4.12.1

3455767

Description: Fixed incorrect combination of multiple devices in monitoring.

Keywords: Monitoring, Incorrect combination

Discovered in Release: 4.12.1

3511410

Description: Collecting a system dump for a DGX host does not work because the sshpass utility is missing.

Workaround: Install the sshpass utility on the DGX host.

Keywords: System Dump, DGX, sshpass utility

3432385

Description: UFM does not support an HDR switch configured in hybrid split mode, where some of the ports are split and some are not.

Workaround: UFM operates properly when either all or none of the HDR switch ports are configured as split.

Keywords: HDR Switch, Ports, Hybrid Split Mode

3461658

Description: After upgrading from UFM-SDN Appliance v4.12.0 GA to UFM-SDN Appliance v4.12.1 FUR, the network fast recovery path in opensm.conf is not updated automatically and remains null (fast_recovery_conf_file (null)).

Workaround: To enable the network fast recovery feature in UFM, set the path of the current fast recovery configuration file (/opt/ufm/files/conf/opensm/fast_recovery.conf) in the opensm.conf file located at conf/opensm before starting UFM.

Keywords: Network fast recovery, Missing, Configuration
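Whether the path is still unset can be checked before starting UFM. The following is a minimal sketch, assuming opensm.conf stores each option as a whitespace-separated "key value" line with "#" comments; the function name is hypothetical and not part of UFM.

```python
def fast_recovery_path(conf_text):
    """Return the configured fast recovery file path, or None if unset.

    Assumes "key value" lines, as suggested by the
    "fast_recovery_conf_file (null)" form quoted in the issue description.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if parts[0] == "fast_recovery_conf_file":
            value = parts[1].strip() if len(parts) > 1 else ""
            # "(null)" is the placeholder left behind after the upgrade
            return None if value in ("", "(null)") else value
    return None
```

A result of None means the path still has to be set by hand as described in the workaround.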

3361160

Description: Upgrading UFM-SDN Appliance from versions 4.7.0, 4.8.0 and 4.9.0 results in cleanup of UFM-SDN Appliance historical telemetry database (due to schema change). This means that the new telemetry data will be stored based on the new schema.

Workaround: To preserve the historical telemetry data when upgrading from UFM-SDN Appliance version 4.7.0, 4.8.0 or 4.9.0, perform the upgrade in two phases: first upgrade to UFM-SDN Appliance v4.10.0, and then upgrade to the latest version (UFM-SDN Appliance 4.11.0 or newer). The upgrade may take longer depending on the size of the historical telemetry database.

Keywords: UFM Historical Telemetry Database, Cleanup, Upgrade
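The two-phase rule above can be expressed as a small planning helper. This is an illustrative sketch only: the function and constant names are hypothetical, and versions are compared as plain strings exactly as they appear in this entry.

```python
# Versions whose telemetry database is wiped by a direct upgrade (from the entry above)
TWO_PHASE_SOURCES = {"4.7.0", "4.8.0", "4.9.0"}
INTERMEDIATE_VERSION = "4.10.0"


def upgrade_path(current, target):
    """Return the ordered list of versions to install to reach `target`."""
    if current in TWO_PHASE_SOURCES and target != INTERMEDIATE_VERSION:
        # Go through 4.10.0 first so the historical telemetry data is preserved
        return [INTERMEDIATE_VERSION, target]
    return [target]
```

For example, `upgrade_path("4.8.0", "4.14.1")` yields the intermediate step followed by the final version, while a 4.12.1 system upgrades directly.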

3346321

Description: In some cases, when multiport SM is configured in UFM, a failover to the secondary node might be triggered instead of a failover to the locally available port.

Workaround: N/A

Keywords: Multiport SM, Failover, Secondary port

N/A

Description: Enabling a port on a managed switch fails if that port was not disabled persistently (this may occur for ports that were disabled in a previous version of UFM, prior to UFM v4.11.0).

Workaround: Set "persistent_port_operation=false" in gv.cfg to use non-persistent (legacy) port disabling and enabling. A UFM restart is required.

Keywords: Disable, Enable, Port, Persistent

N/A

Description: Running UFM software with external UFM-SM is no longer supported

Workaround: N/A

Keywords: External UFM-SM

N/A

Description: If using SM mkey per port, several UFM operations might fail (get cable info, get system dump, switch FW upgrade).

Workaround: N/A

Keywords: SM, mkey per port

Discovered in release: 4.7.1

2796317

Description: SHARP jobs may hang when running in reservation mode (i.e., SHARP allocation is enabled) and the reservation is created with a limited PKEY.

Workaround: Use a "full" PKEY when creating the reservation, i.e., one whose most significant bit is set (e.g., 0x805c).

Keywords: SHARP AM, allocation, reservation, PKEY

Discovered in release: 4.6.0
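The "full" PKEY condition in the workaround above is a single bit test on a 16-bit value. A minimal sketch, with hypothetical helper names that are not part of UFM or SHARP:

```python
FULL_MEMBERSHIP_BIT = 0x8000  # most significant bit of a 16-bit PKEY


def _check_range(pkey):
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("PKEY must fit in 16 bits")


def is_full_pkey(pkey):
    """Return True if the PKEY has its most significant bit set."""
    _check_range(pkey)
    return bool(pkey & FULL_MEMBERSHIP_BIT)


def to_full_pkey(pkey):
    """Return the same PKEY with the most significant bit turned on."""
    _check_range(pkey)
    return pkey | FULL_MEMBERSHIP_BIT
```

With these helpers, 0x805c is reported as full, and the limited value 0x005c maps to 0x805c.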

-

Description: Auto-isolated high symbol BER ports that UFM reports as unhealthy are not automatically set back to healthy once the high symbol BER condition has cleared.

Workaround: Manually set auto-isolated ports as healthy.

Keywords: Unhealthy ports, auto-isolated high BER ports

Discovered in release: 4.6.0

2694977

Description: Adding a large number of devices (~1000) to a group or a logical server on a large-scale setup takes approximately 2 minutes.

Workaround: N/A

Keywords: Add device; group; logical server; large scale

Discovered in release: 4.6.0

2710613

Description: Periodic topology compare will not report removed nodes if the last topology change included only removed nodes.

Workaround: N/A

Keywords: Topology comparison

Discovered in release: 4.6.0

2698055

Description: UFM, configured to work with telemetry for collecting historical data, is limited to work only with the configured HCA port. If this port is part of a bond interface and a failure occurs on the port, collection of telemetry data via this port stops.

Workaround: Reconfigure telemetry with the new active port and restart it within UFM.

Keywords: Telemetry; history; bond; failure

Discovered in release: 4.6.0

2304264

Description: The option to collect system dump is only supported for hosts containing the CURL utility which supports the scp and sftp protocols.

Workaround: To check the protocols supported by CURL, run:

curl -V

If scp and sftp are not supported, install a CURL version that supports these protocols.

Keywords: System dump, host, CURL

Discovered in release: 4.5.0
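The check in the workaround above can be automated. This is a minimal sketch, assuming the output of curl -V contains a line that starts with "Protocols:" followed by a space-separated list; the function names are hypothetical.

```python
import subprocess


def parse_curl_protocols(version_output):
    """Extract the protocol names from the text printed by `curl -V`."""
    for line in version_output.splitlines():
        if line.lower().startswith("protocols:"):
            return set(line.split(":", 1)[1].split())
    return set()


def curl_supports_scp_sftp(version_output):
    """Return True if both scp and sftp appear in the protocol list."""
    return {"scp", "sftp"} <= parse_curl_protocols(version_output)


def read_curl_version():
    """Run `curl -V` on the local host and return its standard output."""
    result = subprocess.run(
        ["curl", "-V"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the host in question, `curl_supports_scp_sftp(read_curl_version())` returning False indicates that a different CURL build is needed.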

2480430

Description: Mellanox SHARP AM does not run with smx_sock_port value less than 1024 or greater than 49151.

Workaround: N/A

Keywords: Mellanox SHARP; aggregation manager

Discovered in release: 4.5.0
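A configured smx_sock_port value can be validated against the range stated above before starting SHARP AM. A minimal sketch with hypothetical names:

```python
SMX_SOCK_PORT_MIN = 1024   # values below this are rejected
SMX_SOCK_PORT_MAX = 49151  # values above this are rejected


def is_valid_smx_sock_port(port):
    """Return True if `port` lies within the range SHARP AM accepts."""
    return SMX_SOCK_PORT_MIN <= port <= SMX_SOCK_PORT_MAX
```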

2288038

Description: When a user tries to collect a system dump for the UFM Appliance host, the job completes with an error and the following summary: "Running as a none root user Please switch to root user (super user) and run again."

Workaround: N/A

Keywords: System dump, UFM Appliance host

Discovered in release: 4.4.0

2384211

Description: MLNX-OS version 3.9.2002 does not support SHARP allocation.

Workaround: Downgrade the switch to MLNX-OS version 3.9.1906.

Keywords: SHARP allocation, MLNX-OS

Discovered in release: 4.4.0

2366031

Description: When upgrading a switch with MLNX-OS version 3.9.1932 and later, you must make sure to comply with the new password requirements for admin and monitor users.

  • Password must contain 8-64 characters

  • Password must be different than username

  • Password must be different than 5 previous passwords

  • Password must contain at least one of each of the following: Lowercase, uppercase and digits

Workaround: N/A

Keywords: User password, switch OS

Discovered in release: 4.4.0
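The password rules listed above can be checked before the switch upgrade. This is an illustrative sketch, not MLNX-OS code; it assumes the caller can supply the previous passwords in order from oldest to newest, and the function name is hypothetical.

```python
def password_problems(password, username, previous_passwords=()):
    """Return a list of rule violations; an empty list means the password passes."""
    problems = []
    if not 8 <= len(password) <= 64:
        problems.append("must contain 8-64 characters")
    if password == username:
        problems.append("must be different from the username")
    # Only the 5 most recent passwords are considered (assumed ordering: oldest first)
    if password in list(previous_passwords)[-5:]:
        problems.append("must be different from the 5 previous passwords")
    if not any(c.islower() for c in password):
        problems.append("must contain a lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("must contain an uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("must contain a digit")
    return problems
```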

2100564

Description: For modular dual-management switch systems, switch information is not presented correctly if the primary management module fails and the secondary takes over.

Workaround: To avoid corrupted switch information, it is recommended to manually set the virtual IP address (box IP address) for the switch as the managed switch IP address (manual IP address) within UFM.

Keywords: Modular switch, dual-management, virtual IP, box IP

Discovered in release: 4.3.0

2135272

Description: UFM does not support hosts equipped with multiple HCAs of different types (e.g., a host with both ConnectX®-3 and ConnectX-4/5/6) if multi-NIC grouping is enabled (i.e., multinic_host_enabled = true).

Workaround: Each managed host must contain HCAs of a single type (either all ConnectX-3 HCAs or all ConnectX-4/5/6 HCAs).

Keywords: Multiple HCAs

Discovered in release: 4.3.0

2063266

Description: Firmware upgrade for managed hosts with multiple HCAs is not supported. That is, it is not possible to perform FW upgrade for a specific host HCA.

Workaround: Running software (MLNX_OFED) upgrade on that host will automatically upgrade all the HCAs on this host with the firmware bundled as part of this software package.

Keywords: FW upgrade, multiple HCAs

Discovered in release: 4.3.0

-

Description: When upgrading from software version 4.1.x or older to 4.2.x or later, there is an intermediate step in which the standby UFM appliance has been upgraded and the master UFM appliance has not. During this step, some CLI commands issued from the master are not operational, because the standby appliance is at a higher SSH security level at that point.

Workaround: After the master appliance is upgraded to the latest version, CLI commands resume normal operation.

Keywords: Upgrade, high availability, SSH

Discovered in release: 4.2.0

2130688

Description: Registering an external SM system with two different IP addresses is not supported.

Workaround: Before registering an external SM system with a new IP address, it is required to unregister the old IP address for that system.

Keywords: External SM

Discovered in release: 4.2.0

1895385

Description: Changes to QoS parameters (mtu, sl and rate_limit) do not take effect unless OpenSM is restarted.

Workaround: N/A

Keywords: QoS, PKey, OpenSM

Discovered in release: 4.2.0

-

Description: Management PKey configuration (e.g. MTU, SL) can be performed only using PKey management interface (via GUI or REST API).

Workaround: N/A

Keywords: PKey, Management PKey, REST API

Discovered in release: 4.2.0

-

Description: The hostname and/or IP address of the UFM HA server is used in the HA configuration.

Workaround: Do not change hostname or IP address of UFM HA server unless you wish to reconfigure the HA mechanism.

Keywords: High availability

Discovered in release: 4.2.0

-

Description: UFM-SDN Appliance supports a limited number of login sessions. When the limit is reached, no client application (GUI, Multisite Portal or SDK) can connect to UFM until the login session timeout expires (default timeout is 10 minutes).

Workaround: When using SDK, do not exceed 5 logins per minute.

Keywords: UFM Server
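An SDK client can enforce the 5-logins-per-minute guideline on its own side. The following is a minimal sliding-window sketch; the class is hypothetical and not part of the UFM SDK, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from collections import deque


class LoginRateLimiter:
    """Allow at most `max_logins` login attempts per `window_seconds`."""

    def __init__(self, max_logins=5, window_seconds=60.0, clock=time.monotonic):
        self.max_logins = max_logins
        self.window_seconds = window_seconds
        self._clock = clock
        self._stamps = deque()  # times of the logins inside the current window

    def try_acquire(self):
        """Record a login and return True, or return False if the limit is hit."""
        now = self._clock()
        # Forget logins that have aged out of the window
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_logins:
            return False
        self._stamps.append(now)
        return True
```

The client would call `try_acquire()` before each login and wait or reuse an existing session when it returns False.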

-

Description: Configuration from lossy to lossless requires device reset.

Workaround: Reboot all relevant devices after changing behavior from lossy to lossless.

Keywords: Lossy configuration

© Copyright 2023, NVIDIA. Last updated on Dec 12, 2023.