Known Issues
Internal Reference Number | Issues |
3230585 | Description: When operating in Dynamic trees mode, ibdiagnet may print warning messages about the existence of multiple distinct trees with the same tree ID. In Dynamic trees mode, this is a valid situation, and these warnings should be ignored. Warning example: -W- <> - In Node <> found root tree (parent qpn <>) which is already exists for treeID: <> |
Workaround: N/A | |
Keywords: Dynamic tree; ibdiagnet | |
Discovered in Version: 3.1.0 | |
3209930 | Description: Switch firmware miscalculates the timeout interval required for acks to be returned for data messages. This can lead to false alerts about bad connections, resulting in traps being sent and jobs stopping midway. The following sharp_am configuration setting corrects the timeout settings on the switches and is recommended: ib_sat_qpc_local_ack_timeout = 0x19 |
Workaround: N/A | |
Keywords: Switch firmware; ack; timeout | |
Discovered in Version: 3.1.0 | |
3225401 | Description: The Dynamic trees creation feature does not support the case in which all root switches go down and are restarted. If this happens, restart sharp_am once the root switches are up and running. |
Workaround: N/A | |
Keywords: Aggregation Manager; sharp_am; dynamic trees | |
Discovered in Version: 3.1.0 | |
3237831 | Description: SHARP does not support reassignment of LID values. |
Workaround: N/A | |
Keywords: Aggregation Manager; OpenSM | |
Discovered in Version: 3.1.0 | |
3226743 | Description: When the management host is not connected to a leaf switch, sharp_am might print warnings. This happens because the management host is treated as a potential compute host but, unlike the compute hosts connected to leaf switches, it cannot be reached by all SHARP trees. |
Workaround: N/A | |
Keywords: Aggregation Manager; sharp_am; leaf; GUID | |
Discovered in Version: 3.1.0 | |
3236363 | Description: A physical link failure between switches while a SHARP job is running and utilizing the link can cause one of the switches to become invalid for further SHARP jobs, resulting in either "No resource" response for new SHARP job requests, or in jobs getting stuck. |
Workaround: If a SHARP job request receives a "No resource" response, check whether the described scenario is the cause by searching the sharp_am log for a message such as: [error] AN Mellanox Technologies Aggregation Node GUID:<> (LID: <>) responded with status <Not zero> to ResourceCleanup(Clean Job tree) - job_id_sharp: <>, tree_id: <> If such an error appears in the log, restart sharp_am to clear this status and re-enable SHARP jobs on the invalid switch. | |
Keywords: Aggregation Manager; sharp_am; Link Failure | |
Discovered in Release: 3.1.0 | |
3048427 | Description: When a switch's split mode is changed (off/on), sharp_am does not pick up the new number of supported ports until it is restarted. |
Workaround: Restart sharp_am after changing a switch split mode definition. | |
Keywords: Aggregation Manager; split mode | |
Discovered in Release: 2.7.0 | |
3051699 | Description: Changing the configuration of SHARP switch ports using device_configuration_file does not take effect on disconnected split ports. If these ports are connected later, they will remain with their default configuration. |
Workaround: If the new configuration is desired for the split ports, make sure to restart the Aggregation Manager after connecting a split port to a host. | |
Keywords: Aggregation Manager; split port | |
Discovered in Release: 2.7.0 | |
3051924 | Description: Adding or replacing non-leaf switches is currently not supported by Aggregation Manager for tree topologies (Fat-Tree, Quasi-Fat-Tree) and Dragonfly+ topologies. |
Workaround: Restart Aggregation Manager after the Subnet Manager completes fabric reconfiguration following the fabric changes. | |
Keywords: Fabric extension; Aggregation Manager; AM | |
Discovered in Release: 2.7.0 | |
- | Description: In a multi-PKEY environment, UCX in SHARP can use only the default PKEY (the PKEY at index 0). |
Workaround: Use sockets for communication over non-default PKEY. | |
Keywords: Configuration, SMX, UCX, PKEY | |
Discovered in Release: 2.4.3 | |
1307124 | Description: Begin Job requests with virtual ports might be rejected until the fabric virtualization info file is parsed. |
Workaround: Wait for AM to discover virtual ports before sending Begin Job requests. | |
Keywords: Aggregation Manager, Socket Direct, Virtual Ports | |
Discovered in Release: 1.5.3 | |
1193629 | Description: Configuring sharpd/sharp_am as daemons is not possible when installing from RPM into a non-default location. |
Workaround: Configure the daemons manually. | |
Keywords: Configuration | |
Discovered in Release: 1.5.3 | |
1307108 | Description: Discovering a new Aggregation Node (AN) found on the shortest path between two ANs might invalidate the existing path. |
Workaround: Restart Aggregation Manager after the Subnet Manager completes fabric reconfiguration following the fabric changes. | |
Keywords: Aggregation Manager, Aggregation Node | |
Discovered in Release: 1.5.3 | |
- | Description: Adding or replacing switches is currently not supported by the Aggregation Manager for Hypercube and Dragonfly+ topologies. |
Workaround: Restart Aggregation Manager after the Subnet Manager completes fabric reconfiguration following the fabric changes. | |
Keywords: Fabric extension, Aggregation Manager | |
Discovered in Release: 1.5.3 | |
- | Description: Adding or replacing non-root switches is currently not supported by the Aggregation Manager for tree topologies (Fat-Tree, Quasi-Fat-Tree). |
Workaround: Restart Aggregation Manager after the Subnet Manager completes fabric reconfiguration following the fabric changes. | |
Keywords: Fabric extension, Aggregation Manager | |
- | Description: Aggregation Manager High Availability is currently not supported in HPCX/MLNX OFED packages. Therefore, only a single instance of Aggregation Manager can run in the IB fabric. |
Workaround: Use Aggregation Manager in UFM. | |
Keywords: Aggregation Manager | |
- | Description: Aggregation Manager should run on the same host where the Master Subnet Manager (SM) is running. |
Workaround: N/A | |
Keywords: Aggregation Manager | |
- | Description: With HPCX/MLNX OFED packages, upon Subnet Manager handover/failover, another instance of Aggregation Manager should be started on the host where the new Master SM is running. |
Workaround: Use Aggregation Manager in UFM. | |
Keywords: Aggregation Manager | |
- | Description: Aggregation Manager should be started after completion of fabric configuration by the Subnet Manager. |
Workaround: N/A | |
Keywords: Aggregation Manager | |
- | Description: Only Fat-Tree, Quasi-Fat-Tree, Hypercube and Dragonfly+ topologies are supported by the Aggregation Manager. |
Workaround: N/A | |
Keywords: Fabric Topology | |
- | Description: Only IB fabrics where all compute nodes are connected to Mellanox SHARP capable switches (Switch-IB 2) are supported by the Aggregation Manager. |
Workaround: Manually configure mapping between the compute port and the Aggregation Node. | |
Keywords: Fabric Topology | |
- | Description: After changing configuration file parameters beyond those in 3.3, Aggregation Manager should be restarted to deploy the new configuration. |
Workaround: N/A | |
Keywords: Configuration |
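
For issue 3209930, it can help to see what the value 0x19 corresponds to in time. The following is a minimal sketch, assuming that ib_sat_qpc_local_ack_timeout uses the standard InfiniBand Local ACK Timeout encoding, in which a nonzero value n means 4.096 microseconds multiplied by 2 to the power n. These notes do not confirm that encoding, and the helper name is illustrative only.

```python
# Assumption: the setting follows the InfiniBand Local ACK Timeout encoding,
# where a value n in 1..31 means 4.096 microseconds * 2**n and 0 disables
# the timer. This is not confirmed by the release notes.
BASE_INTERVAL_US = 4.096


def local_ack_timeout_seconds(encoded: int) -> float:
    """Convert an encoded 5-bit timeout value to seconds (hypothetical helper)."""
    if not 0 <= encoded <= 31:
        raise ValueError("encoded timeout must fit in 5 bits (0..31)")
    if encoded == 0:
        return float("inf")  # 0 means the timeout is disabled
    return BASE_INTERVAL_US * (2 ** encoded) / 1_000_000


if __name__ == "__main__":
    value = 0x19  # 25 decimal, the value given for sharp_am above
    print(f"0x{value:02x} -> {local_ack_timeout_seconds(value):.1f} s")
```

Under this assumption, 0x19 (decimal 25) corresponds to a timeout of roughly 137 seconds.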
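
The log check described in the workaround for issue 3236363 could be scripted. The sketch below scans log lines for the ResourceCleanup error and reports the affected job and tree IDs. The regular expression is derived only from the message template quoted in that workaround, so the exact formatting of the GUID, LID, and status fields in real logs is an assumption. The helper name is hypothetical, and the log path must be supplied by the user.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Pattern built from the message template in the workaround text. Field
# formats in real logs are assumptions, so each field is matched loosely.
CLEANUP_ERROR = re.compile(
    r"\[error\].*Aggregation Node GUID:\s*(?P<guid>\S+)\s+\(LID:\s*(?P<lid>[^)]+)\)"
    r"\s+responded with status\s+(?P<status>\S+)\s+to ResourceCleanup"
    r".*job_id_sharp:\s*(?P<job>[^,\s]+),\s*tree_id:\s*(?P<tree>\S+)"
)


def find_cleanup_failures(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (guid, status, job_id, tree_id) for each nonzero-status match."""
    hits = []
    for line in lines:
        m = CLEANUP_ERROR.search(line)
        # Treat "0" and "0x0" as success; anything else is reported.
        if m and m.group("status") not in ("0", "0x0"):
            hits.append(
                (m.group("guid"), m.group("status"), m.group("job"), m.group("tree"))
            )
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_cleanup.py <path-to-sharp_am-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for guid, status, job, tree in find_cleanup_failures(log):
            print(f"AN {guid}: status {status} for job {job}, tree {tree}; restart sharp_am")
```

If the script reports any match, the workaround applies: restart sharp_am to clear the status.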