Changes and New Features

Feature/Change

Description

High Availability in sharp_am Network Interfaces

sharp_am leverages multiple network interfaces of the management host to provide high availability in case of a network interface failure.

For further information, please see sharp_am Network Interfaces.

Reliable Multicast

Added support for SHARP to leverage the reliable multicast option with NVIDIA Quantum-2.

SM Data

Removed support for reading SM data by a client application. The API functions sharp_request_sm_data, sharp_get_sm_data_buf_len, and sharp_get_sm_data have been removed and can no longer be used.

In addition, the configuration parameter ftree_ca_order_file is ignored in sharp_am.

Bug Fixes

See the Bug Fixes section.
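Because the three SM data API functions listed above have been removed, client code that still references them will need to be updated. The following is a minimal sketch, not an NVIDIA tool, of how a source tree could be checked for leftover references. The helper name `find_removed_calls` is hypothetical; only the three function names come from this page.

```python
import re

# Function names taken from the "SM Data" entry above.
REMOVED_SM_DATA_FUNCTIONS = (
    "sharp_request_sm_data",
    "sharp_get_sm_data_buf_len",
    "sharp_get_sm_data",
)

# Match whole identifiers only, so that "sharp_get_sm_data" is not
# reported a second time inside "sharp_get_sm_data_buf_len".
_PATTERN = re.compile(r"\b(" + "|".join(REMOVED_SM_DATA_FUNCTIONS) + r")\b")


def find_removed_calls(source_text):
    """Return (line_number, function_name) for each removed function referenced.

    Hypothetical helper: it does a plain text search, so references inside
    comments or string literals are reported as well.
    """
    hits = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits
```

Applied to the text of each C or C++ file in a client application, an empty result suggests that the file does not depend on the removed functions.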

Parameter

Component

Description

ignore_host_guids_file

sharp_am

New parameter: File with a list of Host GUIDs to be ignored for SHARP trees.

Default: Null.

ignore_sm_guids

sharp_am

New parameter: A boolean parameter that specifies whether SM GUIDs are ignored in SHARP trees parsed from the SMDB file.

Default: True.

ftree_ca_order_file

sharp_am

Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

enable_sat

sharp_am

Deprecated parameter: This parameter controlled whether SHARP should allow SAT jobs.

The parameter is now marked as deprecated. It is ignored and should not be used.

SAT is always supported.

SHARP_COLL_SERIALIZE_MADS

libsharp

New parameter: Serializes SHARP MADs in tree connect and group join operations. It is recommended to set this flag to True when running mpirun with multiple groups.

Default: False.

SHARP_COLL_JOB_REQUEST_RMC

libsharp

New parameter: If set to True, requires that all allocated SHARP trees support the Reliable Multicast feature.

Default: False.

SHARP_COLL_FORCE_BCAST_AS_ALLREDUCE

libsharp

New parameter: Forces Bcast (RMC) to run as an Allreduce operation.

Default: False.
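The libsharp parameters above are named like environment variables, so one way to enable them is to set them in the environment of the launched job. The sketch below assumes that is how they are passed, and that `"1"` is an accepted spelling of True; neither is confirmed by this page, so check the libsharp documentation for the exact value format. The helper name `build_sharp_env` is hypothetical.

```python
import os

# Parameter names come from the table above.
LIBSHARP_NEW_PARAMETERS = (
    "SHARP_COLL_SERIALIZE_MADS",
    "SHARP_COLL_JOB_REQUEST_RMC",
    "SHARP_COLL_FORCE_BCAST_AS_ALLREDUCE",
)


def build_sharp_env(enabled, base_env=None, true_value="1"):
    """Return a copy of base_env with the selected libsharp flags switched on.

    Assumption: the flags are read from the process environment and
    true_value ("1" by default) is a spelling of True that libsharp accepts.
    """
    unknown = set(enabled) - set(LIBSHARP_NEW_PARAMETERS)
    if unknown:
        raise ValueError(f"not a parameter from the table above: {sorted(unknown)}")
    # Copy so the caller's environment mapping is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    for name in enabled:
        env[name] = true_value
    return env
```

For example, `build_sharp_env(["SHARP_COLL_SERIALIZE_MADS"])` returns an environment that could be passed as the `env` argument of `subprocess.run` when launching mpirun with multiple groups. Depending on the MPI launcher, the variables may also need to be forwarded explicitly to remote ranks.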

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.