Changes and New Features
Feature/Change | Description |
High Availability in sharp_am Network Interfaces | sharp_am leverages multiple network interfaces of the management host to provide high availability in case of a network interface failure. For further information, please see sharp_am Network Interfaces. |
Reliable Multicast | Added support for SHARP to use the reliable multicast option with NVIDIA Quantum-2. |
SM Data | Removed support for reading SM data by a client application. The API functions sharp_request_sm_data, sharp_get_sm_data_buf_len, and sharp_get_sm_data have been removed and can no longer be used. In addition, the configuration parameter ftree_ca_order_file is now ignored by sharp_am. |
Bug Fixes | See Bug Fixes section. |
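Because the three SM data functions named above are no longer available, client code that still references them needs to be updated. The following is a minimal sketch, not an official tool, of one way to locate such references in source text. The helper name find_removed_calls is hypothetical, and the scan is purely textual, so it also reports matches inside comments and string literals.

```python
import re

# Function names listed as removed in this release.
REMOVED_SM_DATA_API = (
    "sharp_request_sm_data",
    "sharp_get_sm_data_buf_len",
    "sharp_get_sm_data",
)

# Word boundaries keep sharp_get_sm_data from matching inside
# sharp_get_sm_data_buf_len or inside longer, unrelated identifiers.
_PATTERN = re.compile(r"\b(" + "|".join(REMOVED_SM_DATA_API) + r")\b")


def find_removed_calls(source_text):
    """Return (line_number, function_name) for each reference to a removed function."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Running the helper over the text of each client source file gives a list of line numbers to review before building against this release.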
Parameter | Component | Description |
ignore_host_guids_file | sharp_am | New parameter: File with a list of Host GUIDs to be ignored for SHARP trees. Default: Null. |
ignore_sm_guids | sharp_am | New parameter: Boolean that specifies whether SM GUIDs are ignored in SHARP trees parsed from the SMDB file. Default: True. |
ftree_ca_order_file | sharp_am | Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used. |
enable_sat | sharp_am | Deprecated parameter: This parameter controlled whether SHARP should allow SAT jobs. The parameter is now marked as deprecated. It is ignored and should not be used. SAT is always supported. |
SHARP_COLL_SERIALIZE_MADS | libsharp | New parameter: Serialize SHARP MADs in tree connect and group join operations. It is recommended to set this flag to True when running mpirun with multiple groups. Default: False. |
SHARP_COLL_JOB_REQUEST_RMC | libsharp | New parameter: If set to True, requires that all allocated SHARP trees support the Reliable Multicast feature. Default: False. |
SHARP_COLL_FORCE_BCAST_AS_ALLREDUCE | libsharp | New parameter: Force the Bcast (RMC) operation to run as an Allreduce operation. Default: False. |
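As an illustration of how the three new libsharp parameters might be set together for a job, the sketch below builds a process environment containing them. It rests on two assumptions that are not stated in the table above: that the SHARP_COLL_* parameters are read from environment variables, and that boolean values are written as "1" and "0". The helper name sharp_coll_env is hypothetical. Adjust the value spelling if your release expects a different one.

```python
import os


def sharp_coll_env(serialize_mads=False, request_rmc=False,
                   force_bcast_as_allreduce=False, base_env=None):
    """Return a copy of the environment with the three new libsharp flags set."""
    # Copy so that the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    flags = {
        "SHARP_COLL_SERIALIZE_MADS": serialize_mads,
        "SHARP_COLL_JOB_REQUEST_RMC": request_rmc,
        "SHARP_COLL_FORCE_BCAST_AS_ALLREDUCE": force_bcast_as_allreduce,
    }
    for name, enabled in flags.items():
        # Assumption: booleans are encoded as "1"/"0" (not confirmed by this document).
        env[name] = "1" if enabled else "0"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the job. The defaults mirror the table: all three flags are off unless explicitly enabled.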