NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.7.0

Release Notes Change History

Feature/Change

Description

Rev 3.6.0

Parameter Changes

smx_keepalive_refresh_interval

smx_keepalive_min_time_before_connection_refresh

smx_keepalive_min_percentage_of_connections_to_refresh_at_iteration

Bug Fixes

See Bug Fixes section.

Rev 3.5.1

New SHARP capability

Added configuration parameters that control the desired behavior with regard to reservation scale-in and override of one reservation by another.

Parameter Changes

load_reservation_files

reservation_force_guid_assignment

reservation_stop_jobs_upon_scale_in

Bug Fixes

See Bug Fixes section.

Rev 3.5.0

Parameter Changes

Added support for controlling the sharp_am log messages verbosity level per the desired log category.

Bug Fixes

See Bug Fixes section.

Rev 3.4.0

Parameter Changes

dynamic_tree_allocation

sharp_am

A boolean parameter that indicates whether trees are allocated dynamically for each SHARP job or allocated once during sharp_am initialization.

Update: Default value is now True

Bug Fixes

See Bug Fixes section.

Rev 3.3.0

Syslog Capabilities

Added support for a new syslog capability to libsharp.

Syslog verbosity level can now be controlled using the SHARP_SYSLOG_VERBOSITY environment variable.

Dynamic Trees Allocation Algorithms

Added support for selecting one of two algorithms that determine how trees should be created for each SHARP job. One algorithm is optimized for SuperPOD fabrics, while the other is optimized for Quasi Fat Trees (QFTs).

For further information, please see Dynamic Trees Allocation Algorithms section.

REST API Jobs Query

Added support for retrieving the status of the current active SHARP jobs along with the structure of the trees assigned to them.

Note that this information is retrieved via REST-API and requires the use of UFM.

Unhealthy Ports

Added support in OpenSM to inform SHARP of dangling or unhealthy links in order to avoid their use in SHARP jobs.

Bug Fixes

See Bug Fixes section.

Rev 3.2.0

High Availability in sharp_am Network Interfaces

sharp_am leverages multiple network interfaces of the management host to provide high availability in case of a network interface failure.

For further information, please see sharp_am Network Interfaces.

Reliable Multicast

Added support for SHARP to leverage the reliable multicast option with NVIDIA Quantum-2.

SM Data

Removed support for reading SM data by a client application. The API functions sharp_request_sm_data, sharp_get_sm_data_buf_len, and sharp_get_sm_data have been removed and can no longer be used.

In addition, the configuration parameter ftree_ca_order_file is ignored in sharp_am.

Bug Fixes

See Bug Fixes section.

Rev 3.1.1 LTS

SHARP Cleanup

Added the ability to clean up all SHARP-related definitions either to spare resources or to contribute to the recovery from an error.

General

Updated MLNX_OFED and firmware versions in General Information section.

Bug Fixes

See Bug Fixes section.

Rev 3.1.0

Aggregation Manager (AM)

Added support for dynamic creation of trees instead of static allocation when SHARP is initialized.

Rev 3.0.1

Bug Fixes

See Bug Fixes section.

Rev 3.0.0

General

Added support for executing multiple jobs that aggregate data through the same set of switches, while each job utilizes a different set of links.

SHARP logic is now application-aware with UFM capabilities. SHARP jobs can be assigned an App-ID, which can be used as a reference to the customer application performing these jobs.

For further information, please refer to UFM SLURM Integration Appendix in UFM UM.

Added the option to limit the SHARP resources that applications are allowed to consume.

For further information, please refer to UFM SLURM Integration Appendix in UFM UM.

AM

Modified the default resources provided to LLT and SAT jobs. This enables a larger number of SAT jobs to run in parallel with a few LLT jobs (please see the first three entries in the table below).

libsharp

SHARP jobs are now executed in exclusive lock mode by default (please see SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE in the table below).

Rev 2.7.0

Switches

Added support for NVIDIA Quantum-2 switches with NDR speed

Adapter Cards

Added support for NVIDIA ConnectX-7 adapter card with 400 Gb/s speed

SHARPD

sharpd daemon process has been removed. sharpd-related activity is now performed from the user application process

AM

Upon restart of AM, it no longer needs to wait for all concurrent jobs to finish before being able to accept new jobs

Added a mechanism that periodically checks for errors in Aggregation Trees and attempts to fix them

General

Added support for new data types BFLOAT16, INT8 and UINT8 for performing reduction operations

Rev 2.6.1

General

Added support for running libsharp_coll from SHARP 2.6.1 with SHARPD from SHARP 2.4.0 – 2.6.1

General

Added information about updatable configuration parameters in the configuration file and help menu

Network

Added support for keep-alive on connections to SHARPD

Network

Added support for asynchronous connections

Network

Disabled UCX listener as default in SHARP Aggregation Manager

AM

Added support for the non-default subnet prefix

AM

Added support for DF+ topologies with more than two-level islands

SHARPD

Added support for caching AM address

Rev 2.5.0

Resource Management

Added support for exclusive lock requests for streaming aggregation jobs.

Network

Enabled connection keep-alive between SHARPD and Aggregation Manager.

Rev 2.4.3

General

Added support for identifying Aggregation Nodes based on SMDB.

General

Improved minhop tables calculation.

General

Added a new API for querying events.

Rev 2.1.4

sharp_am/sharpd/libsharp_coll: Streaming Aggregation

Added support for Streaming Aggregation over ConnectX-6 adapter card and Quantum switch.

libsharp_coll: GPU Accelerator

Added support for NVIDIA GPU buffers.

sharp_am: OOB

Added support for identifying the topology type from the OpenSM SMDB file.

sharp_am: Reboot

Fixed an issue where recovery failed after reboot of all switches in the cluster.

Rev 2.0.0

sharp_am/sharpd/libsharp_coll

Added support for the following NVIDIA Quantum switch capabilities:

  • Performing data operations on new data types (unsigned short, short, and short floating point data types)

  • 1K OST payload

sharp_am/sharpd: Resource Management

Added support for enabling and disabling reproducibility on the job level.

sharp_am/sharpd: Subnet Management

Added support for controlling the SA key for SA operations.

libsharp_coll: GPUDirect

Added support for CUDA GPUDirect and GPUDirect RDMA.

Rev 1.8.1

Aggregation Manager (sharp_am): Resiliency

Added support for waiting for jobs to end prior to performing fabric reinitialization on AM startup.

Mellanox SHARP Daemon (sharpd): Out-of-Box Improvements

Socket-based activation is now enabled by default when installed from RPM/MLNX_OFED.

Parameter

Component

Description

Rev 3.5.0

log_categories_file

sharp_am

Added support for a new string parameter which enables indicating the log categories file path.

The value "(NULL)" indicates that the log categories file does not exist.

Default: In UFM, the default path is: /opt/ufm/files/conf/fabric_log_categories.cfg

Rev 3.3.0

dynamic_tree_algorithm

sharp_am

New parameter: Sets which algorithm should be used by the dynamic tree mechanism.

This parameter is ignored when dynamic_tree_allocation is false.

Possible values:

0 - SuperPOD oriented algorithm

1 - Quasi Fat Tree oriented algorithm

Default: 0 – SuperPOD oriented algorithm
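The interaction between dynamic_tree_algorithm and dynamic_tree_allocation can be sketched as follows. This is a minimal illustration, not sharp_am code: the function name and the returned strings are hypothetical, and only the values 0 and 1 and the "ignored when false" rule come from the entry above.

```python
# Values documented for dynamic_tree_algorithm.
TREE_ALGORITHMS = {0: "SuperPOD oriented", 1: "Quasi Fat Tree oriented"}

def effective_tree_algorithm(dynamic_tree_allocation, dynamic_tree_algorithm=0):
    """Return the algorithm in effect, or None when the setting is ignored."""
    if not dynamic_tree_allocation:
        # The parameter is ignored when dynamic_tree_allocation is false.
        return None
    if dynamic_tree_algorithm not in TREE_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {dynamic_tree_algorithm!r}")
    return TREE_ALGORITHMS[dynamic_tree_algorithm]
```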

app_resources_default_limit

sharp_am

Sets the default max number of trees allowed to be used in parallel by a single app.

Modified the possible range of values: a value of -1 means no resource limit, and a value of 0 means no resources by default.

Default: -1 – No resource limit
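A small sketch of how the special values of app_resources_default_limit might be interpreted. The function name and the trees_in_use counter are hypothetical; the real accounting is internal to sharp_am.

```python
def may_use_another_tree(limit, trees_in_use):
    """Decide whether an app may take one more tree under the given limit."""
    if limit < -1 or trees_in_use < 0:
        raise ValueError("limit must be >= -1 and trees_in_use must be >= 0")
    if limit == -1:
        return True  # -1: no resource limit
    # 0 never admits a tree; any other value caps trees used in parallel.
    return trees_in_use < limit
```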

max_quota

sharp_am

Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

default_quota

sharp_am

Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

SHARP_SYSLOG_VERBOSITY

libsharp

New parameter: Sets the libsharp syslog verbosity level. Possible values:

0 – Disable syslog

1 – Errors log level

2 – Warnings log level

3 – Info log level

Default: 1 – Errors log level
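As an illustration, a launcher script could validate the level before exporting it to the application's environment. The helper name below is hypothetical; only the variable name and the levels 0 to 3 are taken from the entry above.

```python
import os

# Levels documented for SHARP_SYSLOG_VERBOSITY.
SYSLOG_LEVELS = {0: "disabled", 1: "errors", 2: "warnings", 3: "info"}

def env_with_syslog_verbosity(level, base_env=None):
    """Return a copy of the environment with SHARP_SYSLOG_VERBOSITY set."""
    if level not in SYSLOG_LEVELS:
        raise ValueError(f"unsupported syslog verbosity: {level!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["SHARP_SYSLOG_VERBOSITY"] = str(level)
    return env

# Example: request warnings-level syslog output for a child process.
env = env_with_syslog_verbosity(2, base_env={})
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting the application that links against libsharp.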

SHARP_GROUP_JOIN_MAD_TIMEOUT

libsharp

Sets the timeout till a retry for GroupJoin MAD, in milliseconds.

Modified the default value.

Default: 3000 milliseconds

SHARP_GROUP_JOIN_MAD_RETRIES

libsharp

Sets the number of retries for GroupJoin MAD.

Modified the default value.

Default: 5 retries
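To get a feel for how the GroupJoin timeout and retry count combine, the sketch below estimates the longest time a join could wait before giving up. It assumes, and this is not stated in these notes, that the initial attempt and every retry each wait the full timeout; the function name is hypothetical.

```python
def worst_case_mad_wait_ms(timeout_ms=3000, retries=5):
    """Upper bound on waiting time: one initial attempt plus `retries` retries."""
    if timeout_ms < 0 or retries < 0:
        raise ValueError("timeout and retries must be non-negative")
    # Assumption: each attempt waits the full timeout before the next one.
    return timeout_ms * (retries + 1)
```

With the defaults listed above (3000 milliseconds, 5 retries), this bound is 18000 milliseconds.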

SHARP_QP_CONFIRM_MAD_TIMEOUT

libsharp

Sets the timeout till a retry for QP Allocation confirmation MAD, in milliseconds.

Modified the default value.

Default: 2000 milliseconds

Rev 3.2.0

ignore_host_guids_file

sharp_am

New parameter: File with a list of Host GUIDs to be ignored for SHARP trees.

Default: Null.

ignore_sm_guids

sharp_am

New parameter: A boolean parameter, telling whether SM GUIDs need to be ignored in SHARP trees parsed from SMDB file.

Default: True.

ftree_ca_order_file

sharp_am

Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

enable_sat

sharp_am

Deprecated parameter: This parameter controlled whether SHARP should allow SAT jobs.

The parameter is now marked as deprecated. It is ignored and should not be used.

SAT is always supported.

SHARP_COLL_SERIALIZE_MADS

libsharp

New parameter: Serializes SHARP MADs in tree connect and group join operations. It is recommended to set this flag to True when running mpirun with multiple groups.

Default: False.

SHARP_COLL_JOB_REQUEST_RMC

libsharp

New parameter: If set to True, require that any allocated SHARP trees will support the Reliable Multicast feature.

Default: False.

SHARP_COLL_FORCE_BCAST_AS_ALLREDUCE

libsharp

New parameter: Force Bcast(rmc) as Allreduce operation

Default: False.

Rev 3.1.1 LTS

clean_and_exit

sharp_am

New parameter: A boolean parameter. When set to TRUE, sharp_am does not operate normally, but instead cleans SHARP resources from all switches and exits.

Default: False - Operate normally.

Rev 3.1.0

dynamic_tree_allocation

sharp_am

New parameter: A boolean parameter that indicates whether trees are allocated dynamically for each SHARP job or allocated once during sharp_am initialization.

Default: False

max_trees_to_build

sharp_am

Update: When dynamic_tree_allocation is set to True, this parameter has no effect on the number of trees allocated; sharp_am determines that number based on how many trees the switches can hold. In dynamic trees mode, however, this parameter affects the number of skeleton trees that sharp_am uses. It is recommended that the minimal value equal the number of root switches in the fabric.

When dynamic_tree_allocation is set to False, this parameter keeps its original purpose of controlling the number of trees sharp_am builds.

Default:

SHARP_COLL_IB_TIMEOUT

libsharp

New parameter: Transport timeout on SHARP QP

Default: 18

SHARP_COLL_IB_RETRY_COUNT

libsharp

New parameter: Transport retries on SHARP QP

Default: 7

SHARP_COLL_IB_RNR_TIMER

libsharp

New parameter: RNR timeout on SHARP QP

Default: 12

SHARP_COLL_IB_RNR_RETRY

libsharp

New parameter: RNR retries on SHARP QP

Default: 7

SHARP_COLL_IB_SL

libsharp

New parameter: SL

Default: 0

SHARP_COLL_ENABLE_MCAST_TARGET

libsharp

Update: Modified the default value from True to False.

Default: False

Rev 3.0.0

per_prio_default_quota

sharp_am

Update: This parameter controls only the default percentage provided to LLT jobs. Its default value is modified from 3 to 20

per_prio_default_sat_quota

sharp_am

New parameter: Default percentage of quota (OSTs, Buffers and Groups) per aggregation node per tree, to be requested for a single SAT job by its priority.

If no explicit quota request is submitted, this parameter will set the quota percentage to be used.

Format: prio_0_quota, [prio_1_quota, ..., prio_9_quota]

Note that if only one value is set, it will be applied to all priorities.

Default: 3
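The format rule above (one value applied to all priorities, or one value per priority) can be sketched as a small parser. This is an illustration only: the function name is hypothetical, and rejecting lists of any other length is an assumption, since these notes do not say how such input is handled.

```python
NUM_PRIORITIES = 10  # prio_0 .. prio_9, per the documented format

def parse_per_prio_quota(value):
    """Expand a per_prio_default_sat_quota string into one quota per priority."""
    quotas = [int(item) for item in value.split(",") if item.strip()]
    if len(quotas) == 1:
        # A single value is applied to all priorities.
        return quotas * NUM_PRIORITIES
    if len(quotas) != NUM_PRIORITIES:
        # Assumption: anything other than 1 or 10 values is treated as an error.
        raise ValueError(f"expected 1 or {NUM_PRIORITIES} values, got {len(quotas)}")
    return quotas
```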

sat_jobs_default_absolute_osts

sharp_am

New parameter: Default number of OSTs to be allocated for SAT jobs per aggregation node per tree.

Zero value means that no absolute value should be used, and the default percentage value is used instead.

Note that the number of OSTs also affects the number of groups.

Default: 0
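The fallback between the absolute OST count and the percentage quota might look like the sketch below. The function name, the node_total_osts argument, the clamping to the node's capacity, and the rounding down are all assumptions made for illustration; only the "zero means use the percentage" rule comes from the entry above.

```python
def sat_osts_per_tree(node_total_osts, absolute_osts=0, quota_percent=3):
    """OSTs granted to one SAT job on one aggregation node, per tree."""
    if absolute_osts < 0 or not 0 <= quota_percent <= 100:
        raise ValueError("invalid quota settings")
    if absolute_osts > 0:
        # A non-zero absolute value takes precedence over the percentage.
        return min(absolute_osts, node_total_osts)  # clamping is an assumption
    # Zero: fall back to the default percentage (rounding down is an assumption).
    return node_total_osts * quota_percent // 100
```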

app_resources_default_limit

sharp_am

New parameter: A numerical parameter, applicable only when reservation_mode is set to true. Sets the default max number of trees allowed to be used in parallel by a single app. This default value can be overridden per app upon reservation request.

A value of 0 means no allowed resources, which means an app cannot execute any sharp job.

Default: 1

force_app_id_match

sharp_am

New parameter: A boolean parameter, applicable only when reservation_mode is set to true. When set to true, an application ID must be provided upon job request, and it must match the application ID provided upon reservation request. Otherwise, the job will be denied.

Default: False
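The matching rule for force_app_id_match can be summarized in a few lines. This sketch models only this single check, assuming reservation_mode is already true; the function and argument names are hypothetical.

```python
def job_passes_app_id_check(force_app_id_match, reserved_app_id, job_app_id=None):
    """Return True when the job request is not denied by the App-ID check."""
    if not force_app_id_match:
        return True  # no App-ID enforcement
    # An App-ID must be provided and must match the one given at reservation.
    return job_app_id is not None and job_app_id == reserved_app_id
```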

SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE

libsharp

Update: Changed default value from 0 (no exclusive lock) to 2 (force exclusive lock)

Rev 2.7.0

recovery_retry_interval

sharp_am

New parameter: A timeout in seconds for trees recovery retries. A value of 0 means do not try to recover trees.

Default: 300

enable_seamless_restart

sharp_am

New parameter: A boolean flag. If enabled, AM tries to recover state from last AM run and continue the operation of the current jobs.

Default: True

seamless_restart_trees_file

sharp_am

New parameter: Sets the SHARP trees file used in seamless restart. Specify only the file name; the full path is constructed using dump_dir.

Default: sharp_am_trees_structure.dump
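A minimal sketch of how the full path could be derived from dump_dir and the file name. The function name is hypothetical, and the /var/log default for dump_dir is taken from the Rev 2.6.1 entry in this table; joining with a single slash is an assumption about how sharp_am builds the path.

```python
import posixpath

def seamless_restart_trees_path(dump_dir="/var/log",
                                file_name="sharp_am_trees_structure.dump"):
    """Join dump_dir and the trees file name into a full path."""
    if posixpath.basename(file_name) != file_name:
        # Only a bare file name is expected, not a path.
        raise ValueError("seamless_restart_trees_file must be a file name only")
    return posixpath.join(dump_dir, file_name)
```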

seamless_restart_max_retries

sharp_am

New parameter: Sets the number of consecutive retries of seamless restart. If seamless restart fails more than this number of times in a row, it is disabled in the next run.

Default: 3

max_tree_radix

sharp_am

Update: Change default to 252

ib_sat_max_mtu

sharp_am

Update: Change default to 5, to support MAD value that represents 4K MTU.
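For reference, the standard InfiniBand MTU encoding maps the codes 1 through 5 to 256 through 4096 bytes, which is why a value of 5 represents a 4K MTU. The sketch below assumes ib_sat_max_mtu uses that standard encoding; the names are illustrative.

```python
# Standard InfiniBand MTU codes: 1 -> 256 bytes, doubling up to 5 -> 4096 bytes.
IB_MTU_BYTES = {code: 128 << code for code in range(1, 6)}

def ib_mtu_bytes(code):
    """Translate an InfiniBand MTU code into a size in bytes."""
    if code not in IB_MTU_BYTES:
        raise ValueError(f"unknown InfiniBand MTU code: {code!r}")
    return IB_MTU_BYTES[code]
```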

per_prio_default_quota

sharp_am

Update: Changed default to 3 instead of 20, enabling more SAT jobs to take place in parallel on each switch.

Rev 2.6.1

dump_dir

sharp_am

Update: Changed default to /var/log

smx_enabled_protocols

sharp_am

Update: Changed default from 7 to 6 (disable UCX by default)
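The change from 7 to 6 reads naturally as clearing one bit in a bitmask. The sketch below assumes that smx_enabled_protocols is a bitmask and that UCX is the lowest bit; both points are inferred from the documented default change and are not stated in these notes. The meaning of the remaining bits is not modeled.

```python
UCX_BIT = 0x1  # assumption: inferred from the default changing from 7 to 6

def disable_ucx(protocol_mask):
    """Clear the (assumed) UCX bit and leave the other protocol bits intact."""
    return protocol_mask & ~UCX_BIT

def ucx_enabled(protocol_mask):
    """Report whether the (assumed) UCX bit is set."""
    return bool(protocol_mask & UCX_BIT)
```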

ib_mad_timeout

sharp_am

Update: Changed default from 200 to 500

sr_mad_timeout

sharpd

New parameter: Control timeout for ServiceRecord queries

Default: 10000 milliseconds

sr_mad_retries

sharpd

New parameter: Control number of retries for ServiceRecord queries

Default: 3 retries

Rev 2.5.0

smx_keepalive_interval

sharp_am/sharpd

New parameter: Keep-alive interval in seconds. Set to 0 to disable keep-alive.

Default: 60 seconds

smx_incoming_conn_keepalive_interval

sharp_am

New parameter: Keep-alive interval for incoming connections, in seconds. Set to 0 to disable.

Default: 300 seconds

enable_exclusive_lock

sharp_am

New parameter: Enable/Disable exclusive lock feature.

Default: True

enable_topology_api

sharp_am

New parameter: Enable/Disable Topology API feature

Default: True

max_trees_to_build

sharp_am

New parameter: Control number of trees for AM to build

Default: 126

Rev 2.4.3

ib_max_mads_on_wire

sharp_am

Modified behavior: Changed default from 100 to 4096

ib_qpc_local_ack_timeout

sharp_am

Modified behavior: Changed default from 0x1F to 0x12

ib_sat_qpc_local_ack_timeout

sharp_am

Modified behavior: Changed default from 0x1F to 0x12
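To put the two local ACK timeout values in perspective: in the InfiniBand specification, a Local ACK Timeout value t corresponds to 4.096 microseconds multiplied by 2 to the power t, with 0 meaning the timer is disabled. The sketch below assumes the sharp_am parameters carry that standard encoding; the function name is illustrative.

```python
def local_ack_timeout_seconds(value):
    """Convert an InfiniBand Local ACK Timeout field (5 bits) to seconds."""
    if not 0 <= value <= 0x1F:
        raise ValueError("local ACK timeout must be in the range 0..31")
    if value == 0:
        return float("inf")  # 0 disables the timer
    return 4.096e-6 * (1 << value)
```

Under this encoding, 0x12 corresponds to roughly 1.07 seconds, while 0x1F corresponds to roughly 8796 seconds (about 2.4 hours).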

ib_qpc_timeout_retry_limit

sharp_am

Modified behavior: Changed default from 7 to 6

ib_sat_qpc_timeout_retry_limit

sharp_am

Modified behavior: Changed default from 7 to 6

Rev 2.0.0

control_path_version

sharp_am

New parameter

Default

max_compute_ports_per_agg_node

sharp_am

Modified behavior: When set to 0, AN radix is set to maximal radix value.

Default: 0

default_reproducibility

sharp_am

New parameter: Control default reproducibility mode for jobs.

Default: TRUE

ib_sa_key

sharp_am

New parameter: Control SA key for SA operations.

Default: 0x1

coll_job_quota_max_payload_per_ost

sharp_job_quota

Modified behavior: Change default value to 1024.

SHARP_COLL_MAX_PAYLOAD_SIZE

libsharp_coll

Removed

SHARP_COLL_NUM_SHARP_COLL_REQ

libsharp_coll

Removed

SHARP_COLL_ENABLE_REPRODUCIBLE_MODE

libsharp_coll

New parameter: Control job reproducibility mode:

0 – Use default.

1 – No reproducibility.

2 – Reproducibility.
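The three modes can be modeled as a small enumeration. In this sketch, "Use default" is assumed to defer to the sharp_am default_reproducibility setting listed earlier for this revision; that link, and all names below, are illustrative assumptions.

```python
from enum import IntEnum

class ReproducibleMode(IntEnum):
    """Values documented for SHARP_COLL_ENABLE_REPRODUCIBLE_MODE."""
    USE_DEFAULT = 0
    NO_REPRODUCIBILITY = 1
    REPRODUCIBILITY = 2

def job_is_reproducible(mode, default_reproducibility=True):
    """Resolve the job's reproducibility from its mode and the default."""
    mode = ReproducibleMode(mode)  # raises ValueError for unknown values
    if mode is ReproducibleMode.USE_DEFAULT:
        return default_reproducibility  # assumed to come from sharp_am
    return mode is ReproducibleMode.REPRODUCIBILITY
```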

SHARP_COLL_ENABLE_CUDA

libsharp_coll

New parameter: Enables CUDA GPU direct.

SHARP_COLL_ENABLE_GPU_DIRECT_RDMA

libsharp_coll

New parameter: Enables GPU direct RDMA.

Rev 1.8.1

pending_mode_timeout

sharp_am

New parameter: Defines AM waiting time for jobs to complete prior to fabric re-initialization upon startup.

job_info_polling_interval

sharp_am

New parameter: Defines job status polling interval when waiting for jobs to complete upon startup.

© Copyright 2024, NVIDIA. Last updated on May 6, 2024.