Bug Fixes History

NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.0.0

The following table provides a list of bugs fixed in this SHARP version.

Internal Ref.

Issue

2995739

Description: The sharp_am daemon is no longer removed during an rpm upgrade; it is overridden instead.

Keywords: Aggregation Manager, rpm

Discovered in Release: 2.6.1

Fixed in Release: 2.7.0

2972970

Description: Fixed an issue where completing the SHARP installation with the sharp_daemons_setup.sh script depended on Python being available.

Keywords: Aggregation Manager

Discovered in Release: 2.6.1

Fixed in Release: 2.7.0

2749073

Description: SHARP AM reports the rediscovery of aggregation nodes on every topology change.

Keywords: Aggregation Manager

Workaround: N/A

Discovered in Release: 2.5.0

2736102

Description: SHARP AM and SHARPD overwrite backlog files after restart when log rotation is enabled.

Keywords: Aggregation Manager, SHARPD, log file

Workaround: N/A

Discovered in Release: 2.5.0

2700530

Description: Terminating a job process during job initialization, before a job request is sent to the Aggregation Manager, might result in job resource leakage in the SHARP Aggregation Manager.

Workaround: N/A

Keywords: SHARPD, Aggregation Manager

Discovered in Release: 2.5.0

2726821

Description: Terminating SHARPD while the job process is still running results in job resource leakage in the SHARP Aggregation Manager.

Workaround: Terminate SHARPD after terminating the job processes.

Keywords: SHARPD, Aggregation Manager

2795902

Description: SHARPD might allocate handlers on GPU when running with UCX.

Keywords: SHARPD, SMX, UCX

Discovered in Release: 2.5.0

Workaround: Disable UCX

2770210

Description: Syslog verbosity depends on log file verbosity.

Keywords: SHARPD, Aggregation Manager

Discovered in Release: 2.5.0

Workaround: N/A

2825519

Description: Aggregation Manager continues to run after SM failover.

Keywords: Aggregation Manager

Discovered in Release: 2.5.0

Workaround: Stop AM daemon manually

2754175

Description: SHARP Aggregation Manager might allocate bad links for jobs after receiving timeouts from Aggregation Nodes.

Workaround: Restart the corresponding switch or restart the SHARP Aggregation Manager.

Keywords: Aggregation Manager

Discovered in Release: 2.5.0

2796317

Description: SHARP jobs may hang when running in reservations mode (i.e., SHARP allocation is enabled) if the reservation is created with a limited PKEY and configuring the reservation PKEY on the tree is enabled.

Workaround: The PKEY used for creating the reservation should be "full" (the most significant bit should be set, e.g. 0x805c instead of 0x5a).

Keywords: Aggregation Manager, Reservations, PKEY, UFM

Discovered in Release: 2.5.0
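
The "full" PKEY requirement in the workaround for 2796317 comes down to a single bit: in InfiniBand, the most significant bit of the 16-bit PKEY marks full membership, and a PKEY with that bit cleared is a limited member. The following is a minimal sketch of that check, assuming nothing about SHARP or UFM beyond the bit layout; the helper names `is_full_pkey` and `to_full_pkey` are hypothetical and are not part of any NVIDIA tool or API.

```python
# Sketch only: these helpers are hypothetical, not part of SHARP or UFM.
FULL_MEMBERSHIP_BIT = 0x8000  # most significant bit of the 16-bit PKEY


def _check_range(pkey: int) -> None:
    """Reject values that do not fit in 16 bits."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError(f"PKEY out of 16-bit range: {pkey:#x}")


def is_full_pkey(pkey: int) -> bool:
    """Return True if the PKEY has the full-membership bit set."""
    _check_range(pkey)
    return bool(pkey & FULL_MEMBERSHIP_BIT)


def to_full_pkey(pkey: int) -> int:
    """Return the same PKEY with the full-membership bit set."""
    _check_range(pkey)
    return pkey | FULL_MEMBERSHIP_BIT


if __name__ == "__main__":
    # The two values quoted in the workaround above.
    for value in (0x805C, 0x5A):
        print(f"{value:#06x}: full={is_full_pkey(value)}, "
              f"full form={to_full_pkey(value):#06x}")
```

A check like this could be run on the PKEY value before it is used to create a reservation; how the value is actually passed to the reservation depends on the deployment and is not shown here.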

© Copyright 2023, NVIDIA. Last updated on Feb 15, 2024.