NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.0.0

Known Issues

Internal Reference Number

Issues

3048427

Description: When a switch's split mode is changed (off/on), sharp_am does not pick up the new number of supported ports until it is restarted.

Workaround: Restart sharp_am after changing a switch split mode definition.

Keywords: Aggregation Manager; split mode

Discovered in Release: 2.7.0

3051699

Description: Changes to the SHARP switch port configuration made through device_configuration_file do not take effect on disconnected split ports. If these ports are connected later, they keep their default configuration.

Workaround: To apply the new configuration to split ports, restart the Aggregation Manager after connecting the split port to a host.

Keywords: Aggregation Manager; split port

Discovered in Release: 2.7.0

3051924

Description: Adding or replacing non-leaf switches is currently not supported by Aggregation Manager for tree topologies (Fat-Tree, Quasi-Fat-Tree) and Dragonfly+ topologies.

Workaround: After making the fabric changes, wait for the Subnet Manager to complete fabric reconfiguration, then restart Aggregation Manager.

Keywords: Fabric extension; Aggregation Manager; AM

Discovered in Release: 2.7.0
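The ordering in this workaround (fabric change, then Subnet Manager reconfiguration, then Aggregation Manager restart) can be sketched as a small polling helper. This is an illustrative sketch only, not part of SHARP: `restart_am_when_sm_settled`, `sm_settled`, and `restart_am` are hypothetical names, and how to detect that the Subnet Manager has finished reconfiguring is left to the caller.

```python
import time
from typing import Callable


def restart_am_when_sm_settled(
    sm_settled: Callable[[], bool],      # hypothetical check supplied by the caller
    restart_am: Callable[[], None],      # hypothetical restart action supplied by the caller
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until sm_settled() returns True, then call restart_am() once.

    Returns True if the restart was issued, or False if the timeout
    expired before the Subnet Manager was reported as settled.
    """
    deadline = clock() + timeout_s
    while True:
        if sm_settled():
            restart_am()
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

On a host where the daemon is registered with systemd under the name sharp_am (an assumption about the local setup), `restart_am` could wrap `subprocess.run(["systemctl", "restart", "sharp_am"], check=True)`.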

-

Description: In a multi-PKEY environment, UCX in SHARP can use only the default PKEY (the PKEY at index 0).

Workaround: Use sockets for communication over non-default PKEY.

Keywords: Configuration, SMX, UCX, PKEY

Discovered in Release: 2.4.3
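The rule in this entry can be expressed as a simple decision: if the PKEY a job needs is at index 0 of the port's PKEY table, UCX is usable; otherwise fall back to sockets. The sketch below is illustrative only. The function names and the returned labels "ucx" and "sockets" are hypothetical and are not SHARP configuration values. It assumes the standard InfiniBand convention that the top bit of a 16-bit PKEY is the membership bit.

```python
from typing import Sequence


def pkey_base(pkey: int) -> int:
    """Strip the membership bit (bit 15) so full and limited members compare equal."""
    return pkey & 0x7FFF


def choose_transport(job_pkey: int, port_pkey_table: Sequence[int]) -> str:
    """Return "ucx" if job_pkey is at index 0 of the port's PKEY table, else "sockets".

    The returned strings are illustrative labels, not SHARP option values.
    """
    if port_pkey_table and pkey_base(port_pkey_table[0]) == pkey_base(job_pkey):
        return "ucx"
    return "sockets"
```

For example, with a port table of `[0xFFFF, 0x8001]`, a job on `0xFFFF` can use UCX, while a job on `0x8001` needs sockets.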

1307124

Description: Begin Job requests with virtual ports might be rejected until the fabric virtualization info file is parsed.

Workaround: Wait for AM to discover virtual ports before sending Begin Job requests.

Keywords: Aggregation Manager, Socket Direct, Virtual Ports

Discovered in Release: 1.5.3

1193629

Description: Configuring sharpd/sharp_am as daemons is not possible when installing from RPM into a non-default location.

Workaround: Configure the daemons manually.

Keywords: Configuration

Discovered in Release: 1.5.3

1307108

Description: Discovering a new Aggregation Node (AN) found on the shortest path between two ANs might invalidate the existing path.

Workaround: After making the fabric changes, wait for the Subnet Manager to complete fabric reconfiguration, then restart Aggregation Manager.

Keywords: Aggregation Manager, Aggregation Node

Discovered in Release: 1.5.3

-

Description: Adding or replacing switches is currently not supported by the Aggregation Manager for Hypercube and Dragonfly+ topologies.

Workaround: After making the fabric changes, wait for the Subnet Manager to complete fabric reconfiguration, then restart Aggregation Manager.

Keywords: Fabric extension, Aggregation Manager

Discovered in Release: 1.5.3

-

Description: Adding or replacing non-root switches is currently not supported by the Aggregation Manager for tree topologies (Fat-Tree, Quasi-Fat-Tree).

Workaround: After making the fabric changes, wait for the Subnet Manager to complete fabric reconfiguration, then restart Aggregation Manager.

Keywords: Fabric extension, Aggregation Manager

-

Description: Aggregation Manager High Availability is currently not supported in HPCX/MLNX OFED packages. Therefore, only a single instance of Aggregation Manager can run in the IB fabric.

Workaround: Use Aggregation Manager in UFM.

Keywords: Aggregation Manager

-

Description: Aggregation Manager should run on the same host as the Master Subnet Manager (SM).

Workaround: N/A

Keywords: Aggregation Manager

-

Description: With HPCX/MLNX OFED packages, upon Subnet Manager handover/failover, another instance of Aggregation Manager should be started on the host where the new Master SM is running.

Workaround: Use Aggregation Manager in UFM.

Keywords: Aggregation Manager

-

Description: Aggregation Manager should be started after completion of fabric configuration by the Subnet Manager.

Workaround: N/A

Keywords: Aggregation Manager

-

Description: Only Fat-Tree, Quasi-Fat-Tree, Hypercube, and Dragonfly+ topologies are supported by the Aggregation Manager.

Workaround: N/A

Keywords: Fabric Topology
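A deployment script could guard against unsupported topologies before starting the Aggregation Manager. The sketch below is a minimal, hypothetical pre-flight check: the names `SUPPORTED_TOPOLOGIES` and `is_supported_topology` are illustrative, and the lowercase spellings are a normalization chosen for this example, not values read from any SHARP configuration file.

```python
# Topologies listed as supported in this entry, normalized to lowercase.
SUPPORTED_TOPOLOGIES = frozenset(
    {"fat-tree", "quasi-fat-tree", "hypercube", "dragonfly+"}
)


def is_supported_topology(name: str) -> bool:
    """Return True if name matches one of the supported topologies.

    Matching ignores case and surrounding whitespace.
    """
    return name.strip().lower() in SUPPORTED_TOPOLOGIES
```

A caller might refuse to start the Aggregation Manager when `is_supported_topology(topology_name)` returns False.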

-

Description: Only IB fabrics where all compute nodes are connected to Mellanox SHARP-capable switches (Switch-IB 2) are supported by the Aggregation Manager.

Workaround: Manually configure mapping between the compute port and the Aggregation Node.

Keywords: Fabric Topology

-

Description: When configuration file parameters other than those in 3.3 are changed, Aggregation Manager should be restarted to apply the new configuration.

Workaround: N/A

Keywords: Configuration

© Copyright 2023, NVIDIA. Last updated on Feb 15, 2024.