Bug Fixes History
The following table provides a list of bugs fixed in this SHARP version.
Internal Ref.
Issue
4068969
Description: Fixed a rare issue where a failed SHARP job request to configure a required switch would result in job initialization failure without complete cleanup. This led to repeated log messages about unsuccessful cleanup attempts.
The fix ensures that the job is properly cleaned up, even if errors occur during initialization.
Keywords: sharp_am
Discovered in Release: 3.6.0
Fixed in Release: 3.9.0
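The fix for 4068969 describes cleanup that still runs when job initialization fails partway through. The following minimal Python sketch shows that configure-then-roll-back pattern under stated assumptions: `init_job`, `configure`, `release`, and `JobInitError` are hypothetical names, not part of sharp_am, and the real switch-configuration steps are not modeled.

```python
from typing import Callable, Iterable, List


class JobInitError(Exception):
    """Raised when a (hypothetical) switch-configuration step fails."""


def init_job(job_id: int,
             switches: Iterable[str],
             configure: Callable[[int, str], None],
             release: Callable[[int, str], None]) -> List[str]:
    """Configure every switch for a job. If any step fails, release the
    switches that were already configured, then re-raise the original error."""
    configured: List[str] = []
    try:
        for sw in switches:
            configure(job_id, sw)  # may raise JobInitError
            configured.append(sw)
        return configured
    except Exception:
        # Roll back in reverse order so no partial job state is left behind.
        for sw in reversed(configured):
            try:
                release(job_id, sw)
            except Exception:
                pass  # best effort: keep releasing the remaining switches
        raise  # re-raises the original configuration error
```

Releasing in reverse order and swallowing individual release errors means one failed cleanup step does not prevent the remaining resources from being freed.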
3844898
Description: Fixed the issue where sharp_am failed to allocate resources for new job requests due to scattered links and unmatched trees, despite a sufficient number of links being available.
Keywords: SHARP, Report No resource
Discovered in Release: 3.5.1
Fixed in Release: 3.8.0
3438393
Description: Fixed the issue where resource limitations were ignored and no limits were set for any application when dynamic trees allocation, Quasi Fat Tree (QFT)-oriented logic, and reservation_mode were all enabled.
Keywords: Dynamic trees allocation; QFT; resource limitation
Discovered in Release: 3.3.0
Fixed in Release: 3.8.0
3971970
Description: Fixed the issue where
Keywords: Syslog
Discovered in Release: 3.5.0
Fixed in Release: 3.8.0
3478803
Description: Fixed the issue where obtaining topology information (
Keywords: SHARP topology API
Discovered in Release: 3.5.0
Fixed in Release: 3.8.0
3844898
Description: Fixed the issue where sharp_am failed to allocate resources for new job requests due to scattered links and unmatched trees, despite a sufficient number of links being available.
Keywords: SHARP, Report No resource
Discovered in Release: 3.5.1
Fixed in Release: 3.7.0
3696666
Description: Fixed the issue where libsharp could not communicate with sharp_am on systems that exclusively used IPv6 addresses without IPv4 addresses. Now, both libsharp and sharp_am can utilize either IPv4 or IPv6, depending on the machine configuration.
Keywords: sharp_am, libsharp, tcp/ip, smx
Discovered in Release: 3.5.1
Fixed in Release: 3.7.0
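The fix for 3696666 lets both sides use either IPv4 or IPv6 depending on the machine configuration. The usual way to write an address-family-agnostic TCP client is to resolve with `AF_UNSPEC` and try each returned address in turn. The following Python sketch shows that pattern; `connect_any` is a hypothetical helper, and this is not how libsharp or sharp_am are actually implemented.

```python
import socket


def connect_any(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Connect to host:port over IPv4 or IPv6, whichever resolves and works.

    AF_UNSPEC asks getaddrinfo() for both address families, so a host that
    only has IPv6 addresses still yields a usable AF_INET6 candidate.
    """
    last_err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock
        except OSError as err:
            sock.close()
            last_err = err  # remember the failure and try the next address
    raise last_err or OSError(f"no addresses found for {host!r}")
```

Python's standard `socket.create_connection()` implements essentially the same loop; it is written out here only to make the address-family handling visible.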
3686321
Description: When upgrading UFM from previous versions to UFM 6.15.x,
This leads to failure in saving reservation and job information, so in case of a restart of
Keywords:
Discovered in Release: 3.5.0
Fixed in Release: 3.6.0
3724093
Description: Fixed the issue where libsharp, when communicating with sharp_am via UCX, automatically selected the first available IB adapter for the data path instead of the specified adapter.
Keywords:
Discovered in Release: 3.5.1
Fixed in Release: 3.6.0
3665349
Description: Fixed an issue where sharp_am failed to detect an abnormal termination of an application executing a SHARP job, which resulted in the failure to properly clean up its resources.
Keywords:
Discovered in Release: 3.6.0
Fixed in Release: 3.6.0
3646010
Description: Fixed an issue in sharp_am where it failed to support virtual ports when OpenSM topology policies were employed, and sharp_am was configured to utilize only one of the sub-topologies.
Keywords:
Discovered in Release: 3.6.0
Fixed in Release: 3.6.0
3609384
Description: Fixed issues concerning
Keywords:
Discovered in Release: 3.4.0
Fixed in Release: 3.5.0
3541153
Description: Fixed an issue where a client application was abnormally terminated before calling the sharp_coll_finalize method,
Keywords:
Discovered in Release: 3.4.0
Fixed in Release: 3.5.0
3400293
Description: Fixed an issue in libsharp where it failed to respond to messages from the SM while searching for Service Records, causing the SM to print timeout messages.
Keywords: sharp_am; openSM
Discovered in Release: 3.1.0
Fixed in Release: 3.4.0
3479721
Description: Fixed the issue where sharp_am did not handle hypercube topologies well, causing it to incorrectly treat different switches as duplicates.
Keywords: sharp_am; hypercube
Discovered in Release: 3.3.0
Fixed in Release: 3.4.0
3496440
Description: Fixed the issue in sharp_am where excessive log messages were printed for each disconnected or restarted compute host. The information is now consolidated into a single log message that contains either a summary of the disconnected hosts or a list of those hosts.
The complete list of hosts is still printed at the DEBUG level for more detailed troubleshooting.
Keywords: sharp_am
Discovered in Release: 3.3.0
Fixed in Release: 3.4.0
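The consolidated logging described in 3496440 (one summary message instead of one message per host, with the full list at DEBUG) can be sketched with Python's standard `logging` module. The logger name, `report_disconnected`, and the message format are assumptions for illustration only; the actual sharp_am log text is not reproduced here.

```python
import logging
from typing import Sequence

log = logging.getLogger("am_sketch")  # hypothetical logger name


def report_disconnected(hosts: Sequence[str], preview: int = 3) -> str:
    """Emit one INFO summary for a batch of disconnected hosts and the full
    list at DEBUG, instead of one INFO line per host. Returns the summary."""
    shown = ", ".join(hosts[:preview])
    more = len(hosts) - preview
    summary = f"{len(hosts)} hosts disconnected: {shown}"
    if more > 0:
        summary += f" (+{more} more)"
    log.info(summary)
    log.debug("disconnected hosts: %s", ", ".join(hosts))
    return summary
```

For example, `report_disconnected(["h1", "h2", "h3", "h4", "h5"])` logs a single INFO line, `5 hosts disconnected: h1, h2, h3 (+2 more)`, while the complete list only appears when DEBUG logging is enabled.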
3336788
Description: Fixed a firmware issue that could cause libsharp to receive MAD error responses.
Keywords: sharp_am; libsharp
Discovered in Release: 3.2.0
Fixed in Release: 3.3.0 (Quantum-2 Firmware 31.2010.6064)
3343503
Description: Fixed the issue where sharp_am installed from MLNX_OFED used an invalid range of job IDs, resulting in occasional errors when trying to establish new SHARP jobs.
Keywords: MLNX_OFED; sharp_am
Discovered in Release: 3.2.0
Fixed in Release: 3.3.0
3368381
Description: Fixed the issue where SHARP jobs failed before they even started because libsharp did not retry failed GroupJoin MADs a sufficient number of times.
Keywords: libsharp; MADs
Discovered in Release: 3.0.0
Fixed in Release: 3.3.0
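Issue 3368381 concerns resending failed GroupJoin MADs (InfiniBand Management Datagrams) enough times before giving up. A generic retry loop with exponential backoff illustrates the idea. This is a sketch under assumptions: `send_with_retries` and its parameters are hypothetical, and libsharp's actual retry count, timeouts, and backoff policy are not documented here.

```python
import time
from typing import Callable


def send_with_retries(send: Callable[[], bool],
                      max_retries: int = 5,
                      base_delay_s: float = 0.01) -> int:
    """Call send() until it returns True, retrying up to max_retries times
    with exponential backoff. Returns the number of attempts used; raises
    TimeoutError if every attempt fails."""
    for attempt in range(max_retries + 1):
        if send():
            return attempt + 1
        if attempt < max_retries:
            # Wait base, 2*base, 4*base, ... between attempts.
            time.sleep(base_delay_s * (2 ** attempt))
    raise TimeoutError(f"send failed after {max_retries + 1} attempts")
```

With too few retries, a transient loss of a single request fails the whole job at startup. A bounded number of retries with increasing delays tolerates short disruptions without retrying forever.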
3393902
Description: Fixed the issue where re-created virtual ports were not recognized by sharp_am, so the correct tree was not built for them. This caused SAT jobs to hit ibv_poll_cq failures in libsharp.
Keywords: Virtual port; sharp_am; libsharp; SAT; ibv_poll_cq
Discovered in Release: 3.2.0
Fixed in Release: 3.3.0
3404474
Description: Fixed an issue where, when allocating all hosts for an application via the /app/sharp/resources REST API failed, a successful job was returned instead of an error.
Keywords: REST API; allocation
Discovered in Release: 3.2.0
Fixed in Release: 3.3.0
3406186
Description: Fixed an issue where SHARP AM failed to handle reports from OpenSM if some switch ports were down or isolated.
Keywords: Aggregation Manager; Aggregation Node; OpenSM
Discovered in Release: 3.2.0
Fixed in Release: 3.3.0
3236363
Description: Fixed the way physical link failures between switches are handled. In the event of a link failure, a SHARP job using the link must be stopped; however, this has no effect on other current or future jobs.
Keywords: Aggregation Manager; sharp_am; Link Failure
Discovered in Release: 3.1.0
Fixed in Release: 3.2.0
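The behavior described in 3236363, where only jobs using the failed link are stopped, amounts to a lookup from the failed link to the jobs whose trees include it. The Python sketch below models a link as an unordered pair of switch names. The job-to-links mapping and the helper names are hypothetical, since sharp_am's internal data structures are not described in these notes.

```python
from typing import Dict, FrozenSet, Set

Link = FrozenSet[str]  # an undirected switch-to-switch link


def link(a: str, b: str) -> Link:
    """Build an order-independent link key, so link("s1", "s2") == link("s2", "s1")."""
    return frozenset((a, b))


def jobs_to_stop(job_links: Dict[int, Set[Link]], failed: Link) -> Set[int]:
    """Return only the jobs whose trees use the failed link; every other
    running job is left untouched."""
    return {job for job, links in job_links.items() if failed in links}
```

Using a `frozenset` as the link key makes the lookup independent of which end of the link reported the failure.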
3230585
Description: Fixed the issue where, when operating in Dynamic trees mode, ibdiagnet might print warning messages about the existence of multiple distinct trees with the same tree ID.
Keywords: Dynamic tree; ibdiagnet
Discovered in Release: 3.1.0
Fixed in Release: 3.2.0
3226743
Description: Fixed the issue where, when a management host was not connected to a leaf switch, sharp_am might print a number of warning messages about trees that could not reach all aggregation nodes.
As of SHARP v3.2.0, the active management host is automatically identified and is not treated as a potential compute host. However, this does not apply to standby management hosts, for which the warning message still appears. Standby management hosts can be listed as GUIDs to ignore via the ignore_host_guids_file parameter.
Keywords: Aggregation Manager; sharp_am; leaf; GUID
Discovered in Release: 3.0.1
Fixed in Release: 3.2.0
3274564
Description: Fixed an issue where the sharp_benchmark bash script did not work with all bash versions.
Keywords: sharp_benchmark
Discovered in Release: 3.1.1
Fixed in Release: 3.2.0
3262936
Description: Fixed the issue where a crash took place during sharp_am reboot while physical links were hanging between switches in the fabric.
Keywords: sharp_am; physical links; crash
Discovered in Release: 3.1.0
Fixed in Release: 3.1.1 LTS
3192770
Description: Fixed the issue where SHARP jobs failed when using virtual interfaces configured with SR-IOV.
Keywords: SR-IOV
Discovered in Release: 3.0.0
Fixed in Release: 3.1.0
3163697
Description: Fixed the issue where, when the client application used more than 1024 file descriptors (the limit defined by FD_SETSIZE), libsharp could not use any additional file descriptors. Using poll() instead of select() allows libsharp to use the full range of file descriptors permitted by Linux.
Keywords: File descriptor; libsharp; HCOLL; HPC-X
Discovered in Release: 3.0.0
Fixed in Release: 3.1.0
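The select()/poll() difference behind 3163697 is an operating-system property: select() tracks descriptors in a fixed-size fd_set bitmap bounded by FD_SETSIZE (1024 on Linux), while poll() takes an array of pollfd entries, so the descriptor value itself is not bounded. The following Python sketch (Unix only) demonstrates this with the standard `select` module. It is not libsharp code. `wait_readable`, `select_rejects_high_fd`, and the descriptor number 1500 are arbitrary choices for illustration.

```python
import os
import resource
import select
from typing import Optional


def wait_readable(fd: int, timeout_ms: int) -> bool:
    """Return True if fd becomes readable within timeout_ms, using poll().
    poll() registers descriptors individually, so fd >= FD_SETSIZE works."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return any(ev & select.POLLIN for _, ev in poller.poll(timeout_ms))


def select_rejects_high_fd(target_fd: int = 1500) -> Optional[bool]:
    """Duplicate a pipe's read end to target_fd and compare select() with poll().

    Returns True if select() refused the descriptor while poll() reported it
    readable, or None if this process may not open a descriptor that high.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and hard <= target_fd:
        return None
    if soft != resource.RLIM_INFINITY and soft <= target_fd:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target_fd + 1, hard))
    r, w = os.pipe()
    high = -1
    try:
        high = os.dup2(r, target_fd)  # same pipe, now above FD_SETSIZE
        os.write(w, b"x")
        try:
            select.select([high], [], [], 0)
            select_failed = False
        except ValueError:  # CPython: "filedescriptor out of range in select()"
            select_failed = True
        return select_failed and wait_readable(high, 0)
    finally:
        for fd in (r, w, high):
            if fd >= 0:
                os.close(fd)
```

CPython refuses to pass such a descriptor to select() at all, because writing it into an fd_set would be out of bounds. In C, the same misuse is undefined behavior rather than a clean error, which is why switching to poll() is the safer fix.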
3192770
Description: Fixed the issue where SHARP jobs failed when using virtual interfaces configured with SR-IOV.
Keywords: SR-IOV
Discovered in Release: 3.0.0
Fixed in Release: 3.0.1
3163697
Description: Fixed the issue where, when the client application used more than 1024 file descriptors (the limit defined by FD_SETSIZE), libsharp could not use any additional file descriptors. Using poll() instead of select() allows libsharp to use the full range of file descriptors permitted by Linux.
Keywords: File descriptor; libsharp; HCOLL
Discovered in Release: 3.0.0
Fixed in Release: 3.0.1
2995739
Description: The sharp_am daemon is no longer removed when performing an rpm upgrade; it is overwritten instead.
Keywords: Aggregation Manager; rpm
Discovered in Release: 2.6.1
Fixed in Release: 2.7.0
2972970
Description: Fixed the issue where completing SHARP installation with the sharp_daemons_setup.sh script depended on Python being available.
Keywords: Aggregation Manager
Discovered in Release: 2.6.1
Fixed in Release: 2.7.0
2749073
Description: SHARP AM reports the rediscovery of aggregation nodes on every topology change.
Keywords: Aggregation Manager
Workaround: N/A
Discovered in Release: 2.5.0
2736102
Description: SHARP AM and SHARPD override backlog files after restart when log rotation is enabled.
Keywords: Aggregation Manager, SHARPD, log file
Workaround: N/A
Discovered in Release: 2.5.0
2700530
Description: Terminating a job process during job initialization, before a job request is sent to the Aggregation Manager, might result in job resource leakage in the SHARP Aggregation Manager.
Workaround: N/A
Keywords: SHARPD, Aggregation Manager
Discovered in Release: 2.5.0
2726821
Description: Terminating SHARPD while the job process is still running will result in job resource leakage in SHARP Aggregation Manager.
Workaround: Terminate SHARPD after terminating the job processes.
Keywords: SHARPD, Aggregation Manager
2795902
Description: SHARPD might allocate handlers on GPU when running with UCX.
Keywords: SHARPD, SMX, UCX
Discovered in Release: 2.5.0
Workaround: Disable UCX
2770210
Description: Syslog verbosity depends on log file verbosity.
Keywords: SHARPD, Aggregation Manager
Discovered in Release: 2.5.0
Workaround: None
2825519
Description: Aggregation Manager continues to run after SM failover.
Keywords: Aggregation Manager
Discovered in Release: 2.5.0
Workaround: Stop AM daemon manually
2754175
Description: SHARP Aggregation Manager might allocate bad links for jobs after receiving timeouts from Aggregation Nodes.
Workaround: Restart corresponding switch or restart SHARP Aggregation Manager.
Keywords: Aggregation Manager
Discovered in Release: 2.5.0
2796317
Description: SHARP jobs may hang when running in reservation mode (i.e., SHARP allocation is enabled), the reservation is created with a limited-membership PKEY, and configuring the reservation PKEY on the tree is enabled.
Workaround: The PKEY used to create the reservation should be a "full" PKEY, that is, its most significant bit should be set (e.g., 0x805c instead of 0x5c).
Keywords: Aggregation Manager, Reservations, PKEY, UFM
Discovered in Release: 2.5.0
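In InfiniBand, the most significant bit of the 16-bit PKEY is the membership bit: 1 means full membership and 0 means limited membership. The workaround for 2796317 therefore comes down to OR-ing the base PKEY value with 0x8000. The following is a small Python sketch of that arithmetic. The helper names are hypothetical and not part of SHARP or UFM.

```python
FULL_MEMBER_BIT = 0x8000  # MSB of the 16-bit PKEY: 1 = full, 0 = limited member
PKEY_MASK = 0xFFFF


def is_full_member(pkey: int) -> bool:
    """True if the PKEY has the full-membership bit set."""
    return bool(pkey & FULL_MEMBER_BIT)


def to_full_member(pkey: int) -> int:
    """Return the full-membership form of a PKEY (base value | 0x8000)."""
    return (pkey & PKEY_MASK) | FULL_MEMBER_BIT


print(hex(to_full_member(0x5c)))  # 0x805c
```

The operation is idempotent: applying `to_full_member` to a PKEY that already has the bit set returns it unchanged.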