NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.1.1 LTS

Release Notes Change History

Feature/Change

Description

Rev 3.1.0

Aggregation Manager (AM)

Added support for dynamic creation of trees instead of static allocation when SHARP is initialized.

Rev 3.0.1

Bug Fixes

See Bug Fixes section.

Rev 3.0.0

General

Added support for executing multiple jobs that aggregate data through the same set of switches, while each job utilizes a different set of links.

SHARP logic is now application-aware with UFM capabilities. SHARP jobs can be assigned an App-ID, which can be used as a reference to the customer application performing these jobs.

For further information, please refer to the UFM SLURM Integration appendix in the UFM User Manual.

Added the option to limit the SHARP resources that applications are allowed to consume.

For further information, please refer to the UFM SLURM Integration appendix in the UFM User Manual.

AM

Modified the default resources provided to LLT and SAT jobs. This enables a larger number of SAT jobs to run in parallel with a few LLT jobs (please see the first three entries in the table below).

libsharp

SHARP jobs are now executed in exclusive lock mode by default (please see SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE in the table below).

Rev 2.7.0

Switches

Added support for NVIDIA Quantum-2 switches with NDR speed

Adapter Cards

Added support for NVIDIA ConnectX-7 adapter card with 400 Gb/s speed

SHARPD

The sharpd daemon process has been removed. sharpd-related activity is now performed from the user application process.

AM

Upon restart, AM no longer needs to wait for all concurrent jobs to finish before it can accept new jobs.

Added a mechanism that periodically checks for errors in Aggregation Trees and attempts to fix them

General

Added support for the new data types BFLOAT16, INT8, and UINT8 for performing reduction operations.

Rev 2.6.1

General

Added support for running libsharp_coll from SHARP 2.6.1 with SHARPD from SHARP 2.4.0 – 2.6.1

General

Added information about updatable configuration parameters in the configuration file and help menu

Network

Added support for keep-alive on connections to SHARPD

Network

Added support for asynchronous connections

Network

Disabled the UCX listener by default in the SHARP Aggregation Manager.

AM

Added support for the non-default subnet prefix

AM

Added support for DF+ topologies with more than two-level islands

SHARPD

Added support for caching AM address

Rev 2.5.0

Resource Management

Added support for exclusive lock requests for streaming aggregation jobs.

Network

Enabled connection keep-alive between SHARPD and Aggregation Manager.

Rev 2.4.3

General

Added support for identifying Aggregation Nodes based on SMDB.

General

Improved minhop tables calculation.

General

Added a new API for querying events.

Rev 2.1.4

sharp_am/sharpd/libsharp_coll: Streaming Aggregation

Added support for Streaming Aggregation over ConnectX-6 adapter card and Quantum switch.

libsharp_coll: GPU Accelerator

Added support for NVIDIA GPU buffers.

sharp_am: OOB

Added support for identifying the topology type from the OpenSM SMDB file.

sharp_am: Reboot

Fixed an issue where recovery failed after reboot of all switches in the cluster.

Rev 2.0.0

sharp_am/sharpd/libsharp_coll

Added support for the following NVIDIA Quantum switch capabilities:

  • Performing data operations on new data types (unsigned short, short, and short floating point data types)

  • 1K OST payload

sharp_am/sharpd: Resource Management

Added support for enabling and disabling reproducibility on the job level.

sharp_am/sharpd: Subnet Management

Added support for controlling the SA key for SA operations.

libsharp_coll: GPUDirect

Added support for CUDA GPUDirect and GPUDirect RDMA.

Rev 1.8.1

Aggregation Manager (sharp_am): Resiliency

Added support for waiting for jobs to end prior to performing fabric reinitialization on AM startup.

Mellanox SHARP Daemon (sharpd): Out-of-Box Improvements

Socket-based mode is now activated by default when installed from RPM/MLNX_OFED.

Parameter

Component

Description

Rev 3.1.0

dynamic_tree_allocation

sharp_am

New parameter: A boolean parameter that determines whether trees are allocated dynamically for each SHARP job or allocated once during sharp_am initialization.

Default: False

max_trees_to_build

sharp_am

Update: When dynamic_tree_allocation is set to True, this parameter has no effect on the number of trees allocated; sharp_am determines that value based on the number of trees the switches can support. In dynamic tree mode, however, this parameter does control the number of skeleton trees that sharp_am uses; it is recommended that its minimum value match the number of root switches in the fabric.

When dynamic_tree_allocation is set to False, this parameter retains its original purpose of limiting the number of trees built.

Default:
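A minimal sketch of how these two parameters might be combined in the sharp_am configuration. The plain "name value" line format, the boolean literal casing, and the root-switch count are assumptions for illustration; consult the SHARP deployment guide for the actual file location and syntax.

```python
# Illustrative only: compose sharp_am configuration lines for dynamic tree allocation.
# Assumptions: sharp_am accepts plain "name value" lines, and the fabric in this
# example has 4 root switches (the recommended lower bound for max_trees_to_build
# when dynamic_tree_allocation is enabled).

num_root_switches = 4  # assumed fabric property, not queried here

config_lines = [
    "dynamic_tree_allocation TRUE",             # allocate trees per job, not at init
    f"max_trees_to_build {num_root_switches}",  # in dynamic mode: number of skeleton trees
]

print("\n".join(config_lines))
```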

SHARP_COLL_IB_TIMEOUT

libsharp

New parameter: Transport timeout on SHARP QP

Default: 18

SHARP_COLL_IB_RETRY_COUNT

libsharp

New parameter: Transport retries on SHARP QP

Default: 7

SHARP_COLL_IB_RNR_TIMER

libsharp

New parameter: RNR timeout on SHARP QP

Default: 12

SHARP_COLL_IB_RNR_RETRY

libsharp

New parameter: RNR retries on SHARP QP

Default: 7

SHARP_COLL_IB_SL

libsharp

New parameter: Service level (SL) on SHARP QP

Default: 0

SHARP_COLL_ENABLE_MCAST_TARGET

libsharp

Update: Modified the default value from True to False.

Default: False
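The SHARP_COLL_* knobs are environment variables read by libsharp. The sketch below sets the new transport parameters in a process environment and decodes the QP timeout encoding, assuming the standard InfiniBand rule that an encoded value t corresponds to roughly 4.096 µs × 2^t; how these are propagated to a job launcher is left out.

```python
import os

# Set the new libsharp transport parameters for this process (children inherit them).
os.environ["SHARP_COLL_IB_TIMEOUT"] = "18"      # QP transport timeout (encoded)
os.environ["SHARP_COLL_IB_RETRY_COUNT"] = "7"   # transport retries
os.environ["SHARP_COLL_IB_RNR_TIMER"] = "12"    # RNR NAK timer (encoded)
os.environ["SHARP_COLL_IB_RNR_RETRY"] = "7"     # RNR retries
os.environ["SHARP_COLL_IB_SL"] = "0"            # service level

# Decode the QP transport timeout: InfiniBand encodes it as ~4.096 us * 2**value.
encoded = int(os.environ["SHARP_COLL_IB_TIMEOUT"])
timeout_sec = 4.096e-6 * (2 ** encoded)
print(f"encoded timeout {encoded} -> ~{timeout_sec:.2f} s per transport retry")
```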

Rev 3.0.0

per_prio_default_quota

sharp_am

Update: This parameter now controls only the default quota percentage provided to LLT jobs. Its default value has been changed from 3 to 20.

per_prio_default_sat_quota

sharp_am

New parameter: Default percentage of quota (OSTs, Buffers and Groups) per aggregation node per tree, to be requested for a single SAT job by its priority.

If no explicit quota request is submitted, this parameter will set the quota percentage to be used.

Format: prio_0_quota, [prio_1_quota, ..., prio_9_quota]

Note that if only one value is set, it will be applied to all priorities.

Default: 3
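To make the prio_0_quota, [prio_1_quota, ...] format concrete, here is a small sketch of how such a value could be expanded across the ten priorities, including the documented rule that a single value applies to all of them. The parsing itself is illustrative; sharp_am's real parser is not shown in these notes.

```python
def expand_sat_quota(value: str, num_priorities: int = 10) -> list[int]:
    """Expand a per_prio_default_sat_quota string into one quota percentage per priority."""
    quotas = [int(part) for part in value.split(",") if part.strip()]
    if len(quotas) == 1:
        # A single value applies to all priorities.
        return quotas * num_priorities
    return quotas

# Example: the default ("3") gives every priority a 3% quota;
# an explicit list assigns each priority its own percentage.
print(expand_sat_quota("3"))
print(expand_sat_quota("20, 10, 5, 3, 3, 3, 3, 3, 3, 3"))
```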

sat_jobs_default_absolute_osts

sharp_am

New parameter: Default number of OSTs to be allocated for SAT jobs per aggregation node per tree.

Zero value means that no absolute value should be used, and the default percentage value is used instead.

Note that the number of OSTs also affects the number of groups.

Default: 0

app_resources_default_limit

sharp_am

New parameter: A numerical parameter, applicable only when reservation_mode is set to true. Sets the default maximum number of trees allowed to be used in parallel by a single application. This default value can be overridden per application upon reservation request.

A value of 0 means no resources are allowed, so the application cannot execute any SHARP job.

Default: 1

force_app_id_match

sharp_am

New parameter: A boolean parameter, applicable only when reservation_mode is set to true. When set to true, an application ID must be provided upon job request, and it must match the application ID provided upon reservation request. Otherwise, the job will be denied.

Default: False

SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE

libsharp

Update: Changed default value from 0 (no exclusive lock) to 2 (force exclusive lock)
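A minimal sketch of restoring the pre-3.0.0 behavior for a job that should not take an exclusive lock. Only the two values documented here (0 and 2) are used; any other modes are not assumed.

```python
import os

# SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE: 0 = no exclusive lock, 2 = force exclusive lock.
# As of Rev 3.0.0 the default is 2; set 0 explicitly to opt out for this job.
os.environ["SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE"] = "0"
print(os.environ["SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE"])
```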

Rev 2.7.0

recovery_retry_interval

sharp_am

New parameter: An interval in seconds between tree recovery retries. A value of 0 means trees are not recovered.

Default: 300

enable_seamless_restart

sharp_am

New parameter: A boolean flag. If enabled, AM tries to recover its state from the previous AM run and continue operating the current jobs.

Default: True

seamless_restart_trees_file

sharp_am

New parameter: Sets the SHARP trees file used in seamless restart. Only the file name needs to be specified; the full path is constructed using ‘dump_dir’.

Default: sharp_am_trees_structure.dump

seamless_restart_max_retries

sharp_am

New parameter: Sets the number of consecutive seamless restart retries. If seamless restart fails more than this number of times in a row, it is disabled in the next run.

Default: 3
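To clarify how seamless_restart_max_retries interacts with enable_seamless_restart, here is a rough behavioral sketch of the "disable after too many consecutive failures" rule described above. It is an illustration under the stated semantics, not sharp_am's actual implementation; how the failure counter is persisted between runs is an assumption.

```python
# Behavioral sketch (not sharp_am code): track consecutive seamless-restart failures
# and disable the feature once the configured limit is exceeded.

SEAMLESS_RESTART_MAX_RETRIES = 3  # default from the table above

def next_run_state(consecutive_failures: int, last_attempt_ok: bool) -> tuple[int, bool]:
    """Return (updated failure count, whether seamless restart stays enabled)."""
    failures = 0 if last_attempt_ok else consecutive_failures + 1
    enabled = failures <= SEAMLESS_RESTART_MAX_RETRIES
    return failures, enabled

failures, enabled = 0, True
for outcome in (False, False, False, False):  # four failed restarts in a row
    failures, enabled = next_run_state(failures, outcome)
    print(f"failures={failures} seamless_restart_enabled={enabled}")
```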

max_tree_radix

sharp_am

Update: Changed default to 252

ib_sat_max_mtu

sharp_am

Update: Changed default to 5, the MAD value that represents 4K MTU.

per_prio_default_quota

sharp_am

Update: Changed default from 20 to 3, enabling more SAT jobs to run in parallel on each switch.

Rev 2.6.1

dump_dir

sharp_am

Update: Changed default to /var/log

smx_enabled_protocols

sharp_am

Update: Changed default from 7 to 6 (disable UCX by default)

ib_mad_timeout

sharp_am

Update: Changed default from 200 to 500

sr_mad_timeout

sharpd

New parameter: Control timeout for ServiceRecord queries

Default: 10000 milliseconds

sr_mad_retries

sharpd

New parameter: Control number of retries for ServiceRecord queries

Default: 3 retries

Rev 2.5.0

smx_keepalive_interval

sharp_am/sharpd

New parameter: Keep-alive interval in seconds; 0 disables keep-alive.

Default: 60 seconds

smx_incoming_conn_keepalive_interval

sharp_am

New parameter: Keep-alive interval in seconds for incoming connections; 0 disables keep-alive.

Default: 300 seconds

enable_exclusive_lock

sharp_am

New parameter: Enable/Disable exclusive lock feature.

Default: True

enable_topology_api

sharp_am

New parameter: Enable/Disable the Topology API feature

Default: True

max_trees_to_build

sharp_am

New parameter: Control number of trees for AM to build

Default: 126
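For the Rev 2.5.0 sharp_am additions listed above, a combined configuration sketch follows, using the documented defaults made explicit. As before, the plain "name value" line format and boolean literal casing are assumptions.

```python
# Illustrative sharp_am configuration fragment for the Rev 2.5.0 parameters,
# populated with the documented defaults. The "name value" line format is assumed.
rev_2_5_0_defaults = {
    "smx_keepalive_interval": 60,                 # seconds; 0 disables keep-alive
    "smx_incoming_conn_keepalive_interval": 300,  # seconds; 0 disables keep-alive
    "enable_exclusive_lock": "TRUE",
    "enable_topology_api": "TRUE",
    "max_trees_to_build": 126,
}

for name, value in rev_2_5_0_defaults.items():
    print(f"{name} {value}")
```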

Rev 2.4.3

ib_max_mads_on_wire

sharp_am

Modified behavior: Changed default from 100 to 4096

ib_qpc_local_ack_timeout

sharp_am

Modified behavior: Changed default from 0x1F to 0x12

ib_sat_qpc_local_ack_timeout

sharp_am

Modified behavior: Changed default from 0x1F to 0x12

ib_qpc_timeout_retry_limit

sharp_am

Modified behavior: Changed default from 7 to 6

ib_sat_qpc_timeout_retry_limit

sharp_am

Modified behavior: Changed default from 7 to 6

Rev 2.0.0

control_path_version

sharp_am

New parameter

Default:

max_compute_ports_per_agg_node

sharp_am

Modified behavior: When set to 0, AN radix is set to maximal radix value.

Default: 0

default_reproducibility

sharp_am

New parameter: Control default reproducibility mode for jobs.

Default: TRUE

ib_sa_key

sharp_am

New parameter: Control SA key for SA operations.

Default: 0x1

coll_job_quota_max_payload_per_ost

sharp_job_quota

Modified behavior: Changed default value to 1024.

SHARP_COLL_MAX_PAYLOAD_SIZE

libsharp_coll

Removed

SHARP_COLL_NUM_SHARP_COLL_REQ

libsharp_coll

Removed

SHARP_COLL_ENABLE_REPRODUCIBLE_MODE

libsharp_coll

New parameter: Control job reproducibility mode:

0 – Use default.

1 – No reproducibility.

2 – Reproducibility.

SHARP_COLL_ENABLE_CUDA

libsharp_coll

New parameter: Enables CUDA GPUDirect.

SHARP_COLL_ENABLE_GPU_DIRECT_RDMA

libsharp_coll

New parameter: Enables GPUDirect RDMA.
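For the three new libsharp_coll variables above, the sketch below sets them for a GPU job that wants reproducible reductions and CUDA buffers. The reproducibility values come straight from the list above; the 0/1 toggle values for the CUDA variables are assumed, and enabling GPUDirect RDMA presumes the adapters, switches, and CUDA stack actually support it.

```python
import os

# Reproducibility: 0 = use default, 1 = no reproducibility, 2 = reproducibility.
os.environ["SHARP_COLL_ENABLE_REPRODUCIBLE_MODE"] = "2"

# GPU support: request CUDA GPUDirect and GPUDirect RDMA for this job.
# (0/1 boolean values are assumed here; hardware/driver support is required.)
os.environ["SHARP_COLL_ENABLE_CUDA"] = "1"
os.environ["SHARP_COLL_ENABLE_GPU_DIRECT_RDMA"] = "1"

for name in ("SHARP_COLL_ENABLE_REPRODUCIBLE_MODE",
             "SHARP_COLL_ENABLE_CUDA",
             "SHARP_COLL_ENABLE_GPU_DIRECT_RDMA"):
    print(name, "=", os.environ[name])
```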

Rev 1.8.1

pending_mode_timeout

sharp_am

New parameter: Defines AM waiting time for jobs to complete prior to fabric re-initialization upon startup.

job_info_polling_interval

sharp_am

New parameter: Defines job status polling interval when waiting for jobs to complete upon startup.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.