NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.13.0

Known Issues

Internal Reference Number

Issues

4259313

Description: On RedHat and SLES systems, the SHARP component is not upgraded when DOCA-Host is upgraded from a previous version to version 2.10.

Workaround: To force the SHARP component upgrade, include it explicitly in the update command:

  • For SLES: zypper up doca-ofed sharp

  • For RedHat: dnf update doca-ofed sharp

Keywords: DOCA-Host; upgrade

Discovered in Release: 3.10.3
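The workaround for issue 4259313 can be scripted so that the update command always names the sharp package explicitly. The following is a minimal Python sketch. It assumes the distribution is identified by the `ID` field of `/etc/os-release` (`sles` or `rhel`). The helper names (`read_os_id`, `upgrade_command`, `run_upgrade`) and the ID mapping are hypothetical and are not part of DOCA-Host tooling. The only commands it runs are the two given in the workaround.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from /etc/os-release ID values to the update commands
# given in the workaround. The ID values are assumptions; derivatives such as
# Rocky Linux report a different ID and would need their own entry.
UPDATE_COMMANDS = {
    "sles": ["zypper", "up", "doca-ofed", "sharp"],
    "rhel": ["dnf", "update", "doca-ofed", "sharp"],
}


def read_os_id(os_release_text: str) -> str:
    """Return the unquoted ID field from /etc/os-release contents."""
    for line in os_release_text.splitlines():
        # "ID=" does not match VERSION_ID= or ID_LIKE= lines.
        if line.startswith("ID="):
            return line.split("=", 1)[1].strip().strip("\"'")
    raise ValueError("ID field not found in os-release data")


def upgrade_command(os_id: str) -> list[str]:
    """Return an update command that names the sharp package explicitly."""
    if os_id not in UPDATE_COMMANDS:
        raise ValueError(f"no known update command for distribution {os_id!r}")
    return list(UPDATE_COMMANDS[os_id])


def run_upgrade(os_release_path: str = "/etc/os-release") -> None:
    """Detect the distribution and run the matching update command (needs root)."""
    os_id = read_os_id(Path(os_release_path).read_text())
    subprocess.run(upgrade_command(os_id), check=True)
```

Calling `run_upgrade()` as root on a SLES host would run `zypper up doca-ofed sharp`. Unknown distributions raise an error instead of guessing a package manager.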

-

Description: Using SHARP_am with switch firmware version 31.2014.3000 or later requires SHARP_am version 3.11.0 or newer.

Workaround: N/A

Keywords: SHARP_am; switch firmware

Discovered in Release: 3.9.0
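The version requirement above reduces to a comparison of dotted version numbers. The minimal Python sketch below encodes only the two thresholds stated in this issue. It assumes purely numeric dotted versions, so suffixes such as release-candidate tags are not handled. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Convert a dotted version such as '31.2014.3000' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Thresholds taken from the known issue: switch firmware 31.2014.3000 or
# later requires sharp_am 3.11.0 or newer.
MIN_SWITCH_FW = parse_version("31.2014.3000")
MIN_SHARP_AM = parse_version("3.11.0")


def sharp_am_compatible(switch_fw: str, sharp_am: str) -> bool:
    """Return False if the switch firmware needs a newer sharp_am than installed."""
    if parse_version(switch_fw) >= MIN_SWITCH_FW:
        return parse_version(sharp_am) >= MIN_SHARP_AM
    # Older firmware: this known issue imposes no minimum sharp_am version.
    return True
```

Tuples compare element by element, so `(3, 10, 3) < (3, 11, 0)` holds without any string-ordering pitfalls.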

3340353

Description: When reconfiguring a standby management host to operate as a compute host, it will not be able to run SHARP jobs unless sharp_am is restarted.

If a host runs the SM process, the master SM automatically detects it as a standby SM, and the host is reported as a standby management host.

A restart is not required if ignore_sm_guids is set to FALSE.

Workaround: N/A

Keywords: active; standby; compute host; ignore_sm_guids

Discovered in Release: 3.3.0

3371820

Description: Congestion Control cannot be configured on the same SLs used by sharp_am.

Workaround: N/A

Keywords: Congestion control; SL

Discovered in Release: 3.3.0
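Before enabling Congestion Control, a deployment script could verify that its service levels do not overlap with those used by sharp_am. The Python sketch below is minimal and assumes both SL sets are already known. Reading them from the sharp_am and Congestion Control configuration is outside its scope. The function name is hypothetical. InfiniBand SLs are 4-bit values (0 to 15).

```python
def conflicting_sls(sharp_am_sls: set[int], cc_sls: set[int]) -> set[int]:
    """Return SLs used by both sharp_am and Congestion Control (must be empty)."""
    for sl in sharp_am_sls | cc_sls:
        # The InfiniBand SL field is 4 bits wide, so valid values are 0..15.
        if not 0 <= sl <= 15:
            raise ValueError(f"invalid InfiniBand SL: {sl}")
    return sharp_am_sls & cc_sls
```

For example, `conflicting_sls({0}, {0, 1})` returns `{0}`, which flags SL 0 as a configuration to avoid.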

3305335

Description: When running mpirun with multiple groups, the following error message might be received:

[error] - AM QPAlloc confirm QP MAD response status 0x1c00

This message occurs because multiple unserialized MAD requests run in parallel.

Workaround: Set the SHARP_COLL_SERIALIZE_MADS environment variable to TRUE when running mpirun.

Keywords: mpirun; SHARP_COLL_SERIALIZE_MADS

Discovered in Release: 3.2.0
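The workaround for issue 3305335 requires the variable to reach every rank, not just the launching shell. This minimal Python sketch builds such a command line. It assumes an Open MPI based `mpirun` (as shipped in HPC-X), whose `-x NAME=VALUE` option sets and exports an environment variable to the launched processes. Other MPI launchers use different options. The helper name, rank count, and `./my_app` are placeholders.

```python
import shlex


def mpirun_with_serialized_mads(np: int, app: list[str]) -> list[str]:
    """Build an Open MPI mpirun command that sets SHARP_COLL_SERIALIZE_MADS=TRUE."""
    return [
        "mpirun",
        "-np", str(np),
        # -x exports the variable to all launched ranks (Open MPI option).
        "-x", "SHARP_COLL_SERIALIZE_MADS=TRUE",
        *app,
    ]


cmd = mpirun_with_serialized_mads(4, ["./my_app"])
print(shlex.join(cmd))  # mpirun -np 4 -x SHARP_COLL_SERIALIZE_MADS=TRUE ./my_app
```

Passing the command as a list (for example to `subprocess.run`) avoids shell quoting issues when application arguments contain spaces.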

3225401

Description: The dynamic tree creation feature does not support the case in which all root switches go down and are restarted. If this occurs, restart sharp_am once the root switches are up and running.

Workaround: N/A

Keywords: Aggregation Manager; sharp_am; dynamic trees

Discovered in Release: 3.1.0

3237831

Description: SHARP does not support reassignment of LID values.

If LID reassignment is required, stop all SHARP jobs, reassign LIDs via OpenSM, and restart sharp_am once the reassignment is done.

Workaround: N/A

Keywords: Aggregation Manager; OpenSM

Discovered in Release: 3.1.0

3048427

Description: When a switch's split mode is changed (off/on), sharp_am does not handle the new number of supported ports until it is restarted.

Workaround: Restart sharp_am after changing a switch split mode definition.

Keywords: Aggregation Manager; split mode

Discovered in Release: 2.7.0

3051699

Description: Changing the configuration of SHARP switch ports using device_configuration_file does not take effect on disconnected split ports. If these ports are connected later, they retain their default configuration.

Workaround: If the new configuration is desired for the split ports, make sure to restart the Aggregation Manager after connecting a split port to a host.

Keywords: Aggregation Manager; split port

Discovered in Release: 2.7.0

3051924

Description: Adding or replacing non-leaf switches is currently not supported by the Aggregation Manager for Dragonfly+ topologies.

Workaround: Restart the Aggregation Manager after the Subnet Manager completes the fabric reconfiguration that follows the fabric changes.

Keywords: Fabric extension; Aggregation Manager; AM

Discovered in Release: 2.7.0

-

Description: In multi-PKEY environments, UCX in SHARP can use only the default PKEY (the PKEY at index 0).

Workaround: Use sockets for communication over non-default PKEY.

Keywords: Configuration, SMX, UCX, PKEY

Discovered in Release: 2.4.3

1307124

Description: Begin Job requests with virtual ports might be rejected until the fabric virtualization info file is parsed.

Workaround: Wait for the AM to discover virtual ports before sending Begin Job requests.

Keywords: Aggregation Manager, Socket Direct, Virtual Ports

Discovered in Release: 1.5.3

1193629

Description: sharp_am cannot be configured as a daemon when it is installed from RPM into a non-default location.

Workaround: Configure the daemon manually.

Keywords: Configuration

Discovered in Release: 1.5.3

1307108

Description: Discovering a new Aggregation Node (AN) found on the shortest path between two ANs might invalidate the existing path.

Workaround: Restart the Aggregation Manager after the Subnet Manager completes the fabric reconfiguration that follows the fabric changes.

Keywords: Aggregation Manager, Aggregation Node

Discovered in Release: 1.5.3

-

Description: High Availability for the Aggregation Manager is not supported in HPC-X/DOCA-Host packages at this time. As a result, only one instance of the Aggregation Manager can operate within the InfiniBand fabric. When there is a handover or failover of the Subnet Manager, a new instance of the Aggregation Manager should be initiated on the host where the new Master Subnet Manager is active.

Workaround: Use the Aggregation Manager in UFM.

Keywords: Aggregation Manager

-

Description: The Aggregation Manager should run on the same host as the master Subnet Manager (SM).

Workaround: N/A

Keywords: Aggregation Manager

-

Description: Start the Aggregation Manager only after the Subnet Manager completes fabric configuration.

Workaround: N/A

Keywords: Aggregation Manager

-

Description: Only Fat-Tree, Quasi-Fat-Tree, Hypercube and Dragonfly+ topologies are supported by the Aggregation Manager.

Workaround: N/A

Keywords: Fabric Topology

-

Description: The Aggregation Manager supports only IB fabrics in which all compute nodes are connected to NVIDIA SHARP-capable switches.

Workaround: Manually configure mapping between the compute port and the Aggregation Node.

Keywords: Fabric Topology

-

Description: After changing configuration file parameters other than those in 3.3, restart the Aggregation Manager to apply the new configuration.

Workaround: N/A

Keywords: Configuration
© Copyright 2025, NVIDIA. Last updated on Nov 19, 2025