Release Notes Change History
Feature/Change | Description |
Rev 2.6.1 | |
General | Added support for running libsharp_coll from SHARP 2.6.1 with SHARPD from SHARP 2.4.0 – 2.6.1 |
General | Added information about updatable configuration parameters in the configuration file and help menu |
Network | Added support for keep-alive on connections to SHARPD |
Network | Added support for asynchronous connections |
Network | Disabled UCX listener as default in SHARP Aggregation Manager |
AM | Added support for the non-default subnet prefix |
AM | Added support for DF+ topologies with more than two-level islands |
SHARPD | Added support for caching AM address |
Rev 2.5.0 | |
Resource Management | Added support for exclusive lock requests for streaming aggregation jobs. |
Network | Enabled connection keep-alive between SHARPD and Aggregation Manager. |
Rev 2.4.3 | |
General | Added support for identifying Aggregation Nodes based on SMDB. |
General | Improved minhop tables calculation. |
General | Added a new API for querying events. |
Rev 2.1.4 | |
sharp_am/sharpd/libsharp_coll: Streaming Aggregation | Added support for Streaming Aggregation over ConnectX-6 adapter card and Quantum switch. |
libsharp_coll: GPU Accelerator | Added support for NVIDIA GPU buffers. |
sharp_am: OOB | Added support for identifying the topology type from the OpenSM SMDB file. |
sharp_am: Reboot | Fixed an issue where recovery failed after reboot of all switches in the cluster. |
Rev 2.0.0 | |
sharp_am/sharpd/libsharp_coll | Added support for the following NVIDIA Quantum switch capabilities:
|
sharp_am/sharpd: Resource Management | Added support for enabling and disabling reproducibility on the job level. |
sharp_am/sharpd: Subnet Management | Added support for controlling the SA key for SA operations. |
libsharp_coll: GPUDirect | Added support for CUDA GPUDirect and GPUDirect RDMA. |
Rev 1.8.1 | |
Aggregation Manager (sharp_am): Resiliency | Added support for waiting for jobs to end prior to performing fabric reinitialization on AM startup. |
Mellanox SHARP Daemon (sharpd): Out-of-Box Improvements | Socket-based activation is now enabled by default when installed from RPM/MLNX_OFED. |
Parameter | Component | Description |
Rev 2.6.1 | ||
dump_dir | sharp_am | Update: Changed default to /var/log |
smx_enabled_protocols | sharp_am | Update: Changed default from 7 to 6 (disables UCX by default) |
ib_mad_timeout | sharp_am | Update: Changed default from 200 to 500 |
sr_mad_timeout | sharpd | New parameter: Controls the timeout for ServiceRecord queries. Default: 10000 milliseconds |
sr_mad_retries | sharpd | New parameter: Controls the number of retries for ServiceRecord queries. Default: 3 retries |
Rev 2.5.0 | ||
smx_keepalive_interval | sharp_am/sharpd | New parameter: Keep-alive interval in seconds; set to 0 to disable keep-alive. Default: 60 seconds |
smx_incoming_conn_keepalive_interval | sharp_am | New parameter: Keep-alive interval for incoming connections; set to 0 to disable. Default: 300 seconds |
enable_exclusive_lock | sharp_am | New parameter: Enable/Disable exclusive lock feature. Default: True |
enable_topology_api | sharp_am | New parameter: Enable/Disable Topology API feature. Default: True |
max_trees_to_build | sharp_am | New parameter: Controls the number of trees for AM to build. Default: 126 |
Rev 2.4.3 | ||
ib_max_mads_on_wire | sharp_am | Modified behavior: Changed default from 100 to 4096 |
ib_qpc_local_ack_timeout | sharp_am | Modified behavior: Changed default from 0x1F to 0x12 |
ib_sat_qpc_local_ack_timeout | sharp_am | Modified behavior: Changed default from 0x1F to 0x12 |
ib_qpc_timeout_retry_limit | sharp_am | Modified behavior: Changed default from 7 to 6 |
ib_sat_qpc_timeout_retry_limit | sharp_am | Modified behavior: Changed default from 7 to 6 |
Rev 2.0.0 | ||
control_path_version | sharp_am | New parameter |
max_compute_ports_per_agg_node | sharp_am | Modified behavior: When set to 0, AN radix is set to maximal radix value. Default: 0 |
default_reproducibility | sharp_am | New parameter: Controls the default reproducibility mode for jobs. Default: TRUE |
ib_sa_key | sharp_am | New parameter: Control SA key for SA operations. Default: 0x1 |
coll_job_quota_max_payload_per_ost | sharp_job_quota | Modified behavior: Changed default value to 1024. |
SHARP_COLL_MAX_PAYLOAD_SIZE | libsharp_coll | Removed |
SHARP_COLL_NUM_SHARP_COLL_REQ | libsharp_coll | Removed |
SHARP_COLL_ENABLE_REPRODUCIBLE_MODE | libsharp_coll | New parameter: Controls job reproducibility mode: 0 – Use default. 1 – No reproducibility. 2 – Reproducibility. |
SHARP_COLL_ENABLE_CUDA | libsharp_coll | New parameter: Enables CUDA GPUDirect. |
SHARP_COLL_ENABLE_GPU_DIRECT_RDMA | libsharp_coll | New parameter: Enables GPUDirect RDMA. |
Rev 1.8.1 | ||
pending_mode_timeout | sharp_am | New parameter: Defines AM waiting time for jobs to complete prior to fabric re-initialization upon startup. |
job_info_polling_interval | sharp_am | New parameter: Defines job status polling interval when waiting for jobs to complete upon startup. |
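
As an illustration of the three SHARP_COLL_ENABLE_REPRODUCIBLE_MODE values listed under Rev 2.0.0, the following minimal Python sketch maps the setting to its documented meaning. It is not part of SHARP: the helper name describe_reproducible_mode is hypothetical, and treating an unset variable as 0 is an assumption, since these notes do not state the parameter's default.

```python
import os

# Meanings of SHARP_COLL_ENABLE_REPRODUCIBLE_MODE as listed in the Rev 2.0.0 table.
REPRODUCIBLE_MODES = {
    0: "use default",
    1: "no reproducibility",
    2: "reproducibility",
}


def describe_reproducible_mode(env=None):
    """Return the documented meaning of the configured reproducibility mode.

    Hypothetical helper for illustration only.
    Assumption: an unset variable is treated as 0 (use default).
    """
    env = os.environ if env is None else env
    raw = env.get("SHARP_COLL_ENABLE_REPRODUCIBLE_MODE", "0")
    try:
        mode = int(raw)
    except ValueError:
        raise ValueError(f"mode must be an integer, got {raw!r}") from None
    if mode not in REPRODUCIBLE_MODES:
        raise ValueError(f"mode must be 0, 1, or 2, got {mode}")
    return REPRODUCIBLE_MODES[mode]
```

Passing a plain dictionary instead of the process environment, for example describe_reproducible_mode({"SHARP_COLL_ENABLE_REPRODUCIBLE_MODE": "2"}), makes it possible to check a job's intended setting before launch.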