NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.12.0

Changes and New Features

Feature/Change

Description

Improved Handling of MAD Errors

Enhanced sharp_am's response to MAD errors. Instead of marking a switch as entirely unusable, sharp_am now deprioritizes the switch while keeping it eligible for job selection when alternatives are limited. Cleanup still occurs when possible, which reduces disruption and improves resilience.

Bug Fixes

See Bug Fixes.
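The change in MAD error handling amounts to replacing a hard exclusion with a soft preference. The following is a minimal conceptual sketch, not sharp_am's actual implementation. The names `Switch`, `record_mad_error`, and `select_switches` and the boolean `deprioritized` flag are hypothetical and used only for illustration. Switches that reported a MAD error are ordered last but stay in the candidate list, so they can still be selected when there are not enough healthy switches.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    # Hypothetical model of a switch as seen by a resource manager.
    name: str
    deprioritized: bool = False  # set after a MAD error (illustrative flag)


def record_mad_error(switch):
    """Deprioritize the switch instead of marking it as unusable."""
    switch.deprioritized = True


def select_switches(switches, needed):
    """Pick `needed` switches, preferring ones without MAD errors.

    sorted() is stable and False sorts before True, so healthy switches
    come first in their original order, followed by deprioritized ones.
    """
    ranked = sorted(switches, key=lambda s: s.deprioritized)
    if len(ranked) < needed:
        raise ValueError("not enough switches available")
    return ranked[:needed]


sw = [Switch("sw1"), Switch("sw2"), Switch("sw3")]
record_mad_error(sw[0])
print([s.name for s in select_switches(sw, 2)])  # healthy switches only
print([s.name for s in select_switches(sw, 3)])  # falls back to sw1
```

With two switches requested, the deprioritized `sw1` is skipped. With three requested, it is still used because no other alternative exists.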

Parameter

Component

Description

dynamic_tree_allocation

sharp_am

Description: A boolean parameter that determines whether trees are allocated dynamically for each SHARP job or allocated once during sharp_am initialization.

Change: This parameter is now obsolete; dynamic allocation is the only supported mode.
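To illustrate what dynamic allocation means, here is a conceptual sketch in which tree resources are reserved when a job starts and returned when it ends, rather than being fixed at manager initialization. The `TreePool` class, its methods, and the representation of trees as a simple integer count are hypothetical simplifications; they do not describe sharp_am's internal data structures or any SHARP API.

```python
class TreePool:
    """Hypothetical pool of aggregation-tree resources allocated per job."""

    def __init__(self, capacity):
        self.capacity = capacity  # total tree resources (abstract units)
        self.allocations = {}  # job_id -> number of trees held by the job

    def in_use(self):
        return sum(self.allocations.values())

    def start_job(self, job_id, trees_needed):
        # Trees are reserved only when a job starts, not at initialization.
        if job_id in self.allocations:
            raise ValueError(f"job {job_id} already has trees allocated")
        if self.in_use() + trees_needed > self.capacity:
            raise RuntimeError("insufficient tree resources for job")
        self.allocations[job_id] = trees_needed

    def end_job(self, job_id):
        # Releasing on job end returns capacity to the pool for later jobs.
        self.allocations.pop(job_id, None)


pool = TreePool(capacity=4)
pool.start_job("job-a", 3)
pool.end_job("job-a")
pool.start_job("job-b", 2)  # succeeds because job-a released its trees
```

Because capacity is tied to running jobs rather than reserved up front, resources freed by a finished job become available to the next one.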

© Copyright 2025, NVIDIA. Last updated on Aug 25, 2025.