Date: 2025-07-03

Rev 3.9.0

log_verbosity sharp_am Sets the sharp_am log verbosity. The default value is modified from Warning level to Info level.

ib_sa_key sharp_am Parameter is no longer supported. It was used to control the SA Key that sharp_am used. This information is now provided automatically by the SM.

Rev 3.8.0

reservation_max_jobs_per_hca sharp_am New parameter: A numeric parameter. Tells the maximum number of allowed jobs that can use the same HCA. A value of 0 means no limit. Valid range: 0-511. Applies only while operating in reservation mode. Default: 1 job per HCA.
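
The limit semantics described above (0 means no limit, otherwise a cap in the range 1-511) can be illustrated with a minimal sketch. The function name can_admit_job and its arguments are hypothetical and are not part of sharp_am; the sketch only shows how such a limit would typically be evaluated.

```python
MAX_JOBS_RANGE = range(0, 512)  # valid values 0-511, per the entry above


def can_admit_job(active_jobs_on_hca: int,
                  reservation_max_jobs_per_hca: int = 1) -> bool:
    """Return True if one more job may use the HCA under the given limit.

    Hypothetical helper for illustration; not sharp_am code.
    """
    if reservation_max_jobs_per_hca not in MAX_JOBS_RANGE:
        raise ValueError("reservation_max_jobs_per_hca must be in 0-511")
    if reservation_max_jobs_per_hca == 0:  # 0 means no limit
        return True
    return active_jobs_on_hca < reservation_max_jobs_per_hca
```

With the default of 1, a second job on an HCA that already serves one job would be refused in this sketch.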

dynamic_tree_algorithm sharp_am Sets which algorithm should be used by the dynamic tree mechanism. Modified value 1 to include support for DragonFly topologies. Current values: 0 - Regular FatTree oriented algorithm; 1 - Quasi Fat Tree or DragonFly oriented algorithm.

Rev 3.7.0

rdma_sr_enable sharp_am New parameter: A boolean parameter. Tells whether sharp_am should provide its own service record via rdmacm service, enabling libsharp to find sharp_am even when a security QKey is enabled. Default: True.

telemetry_interval sharp_am New parameter: A decimal parameter. Tells the interval in seconds between sharp_am telemetry updates. A value of 0 means no telemetry reports. Valid range of values: 0, 10-3600 Default: 60 seconds.

telemetry_file_path sharp_am New parameter: A string parameter. Tells the full path of the sharp_am telemetry file output. An empty path or (null) means no telemetry reports. Default in UFM: /opt/ufm/log/sharp_am_telemetry.dump Default in non UFM systems: (null)
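
Taken together, the two telemetry parameters above imply that reports are produced only when the interval is non-zero and a file path is set. The following is a minimal sketch of that rule, assuming both conditions must hold; the function telemetry_enabled is a hypothetical illustration, not sharp_am code.

```python
def telemetry_enabled(telemetry_interval: int, telemetry_file_path: str) -> bool:
    """Return True if telemetry reports would be written.

    Assumes the documented rules: interval 0 disables telemetry, the
    valid non-zero range is 10-3600 seconds, and an empty path or the
    literal "(null)" means no telemetry reports.
    """
    if telemetry_interval != 0 and not 10 <= telemetry_interval <= 3600:
        raise ValueError("telemetry_interval must be 0 or within 10-3600")
    path_is_set = telemetry_file_path not in ("", "(null)")
    return telemetry_interval != 0 and path_is_set
```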

smx_sock_addr_family sharp_am Determines which address family will be used by SMX's sockets. A new option is added; the current possible options are: auto, ipv4, ipv6. The new "auto" option means that both IPv4 and IPv6 can be used if applicable, and if only one of them is configured on the management host, then the configured address will be used. Default: auto.

SHARP_SMX_SOCK_ADDR_FAMILY libsharp Parameter Removed. This environment variable controlled the socket address family that libsharp used (IPv4/IPv6). The parameter is removed, since now the selection is automatic, according to the sharp_am supported address family.

SHARP_USE_USER_QKEY libsharp New parameter: A boolean parameter. Tells whether libsharp should use a user QKey for MAD QPs. If a compute node is configured with a security QKey enabled, SHARP should use a user QKey, and this environment variable should be set to true. Default: False

SHARP_SR_QUERY_SOURCE libsharp New parameter: Defines the source that should be used to fetch the sharp_am service record. Possible values: 0 - Fetch only from the SA (OpenSM); this was the only supported option before SHARP version 3.7.0. 1 - Fetch only from sharp_am itself (requires that sharp_am is configured with rdma_sr_enable = true). 2 - Try both options: first from the SA (OpenSM) and, if not successful, from sharp_am. Default: 2 - Try both options.
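
The three query-source values describe a simple fallback order. The sketch below illustrates that order only; fetch_service_record and the two callables passed to it are hypothetical stand-ins, and the real libsharp lookup is not shown here.

```python
from typing import Callable, Optional

SA_ONLY, SHARP_AM_ONLY, TRY_BOTH = 0, 1, 2  # values of SHARP_SR_QUERY_SOURCE

Fetcher = Callable[[], Optional[str]]  # returns a record, or None on failure


def fetch_service_record(source: int, from_sa: Fetcher,
                         from_sharp_am: Fetcher) -> Optional[str]:
    """Pick the service-record source according to the documented values."""
    if source == SA_ONLY:
        return from_sa()
    if source == SHARP_AM_ONLY:
        return from_sharp_am()
    if source == TRY_BOTH:
        record = from_sa()  # the SA (OpenSM) is tried first
        return record if record is not None else from_sharp_am()
    raise ValueError("SHARP_SR_QUERY_SOURCE must be 0, 1, or 2")
```

With the default value of 2, a failed SA query (for example, on a node where a security QKey blocks it) falls through to the record published by sharp_am.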

Rev 3.5.0

log_categories_file sharp_am Added support for a new string parameter which enables indicating the log categories file path. The value "(NULL)" indicates that the log categories file does not exist. Default: In UFM, the default path is: /opt/ufm/files/conf/fabric_log_categories.cfg

Rev 3.3.0

dynamic_tree_algorithm sharp_am New parameter: Sets which algorithm should be used by the dynamic tree mechanism. This parameter is ignored when dynamic_tree_allocation is false. Possible values: 0 - SuperPOD oriented algorithm 1 - Quasi Fat Tree oriented algorithm Default: 0 – SuperPOD oriented algorithm

app_resources_default_limit sharp_am Sets the default max number of trees allowed to be used in parallel by a single app. Modified the possible range of values where the value of -1 means no resource limit, and 0 means no resources by default. Default: -1 – No resource limit

max_quota sharp_am Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

default_quota sharp_am Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

SHARP_SYSLOG_VERBOSITY libsharp New parameter: Sets the libsharp syslog verbosity level. Possible values: 0 – Disable syslog 1 – Errors log level 2 – Warnings log level 3 – Info log level Default: 1 – Errors log level

SHARP_GROUP_JOIN_MAD_TIMEOUT libsharp Sets the timeout till a retry for GroupJoin MAD, in milliseconds. Modified the default value. Default: 3000 milliseconds

SHARP_GROUP_JOIN_MAD_RETRIES libsharp Sets the number of retries for GroupJoin MAD. Modified the default value. Default: 5 retries

SHARP_QP_CONFIRM_MAD_TIMEOUT libsharp Sets the timeout till a retry for QP Allocation confirmation MAD, in milliseconds. Modified the default value. Default: 2000 milliseconds

Rev 3.2.0

ignore_host_guids_file sharp_am New parameter: File with a list of Host GUIDs to be ignored for SHARP trees. Default: Null.

ignore_sm_guids sharp_am New parameter: A boolean parameter, telling whether SM GUIDs need to be ignored in SHARP trees parsed from SMDB file. Default: True.

ftree_ca_order_file sharp_am Deprecated parameter: This parameter is now marked as deprecated. It is ignored and should not be used.

enable_sat sharp_am Deprecated parameter: This parameter controlled whether SHARP should allow SAT jobs. The parameter is now marked as deprecated. It is ignored and should not be used. SAT is always supported.

SHARP_COLL_SERIALIZE_MADS libsharp New parameter: Serialize SHARP MADs in tree connect and group join operations. It is recommended to set this flag to true when running mpirun with multiple groups. Default: False.

SHARP_COLL_JOB_REQUEST_RMC libsharp New parameter: If set to True, require that any allocated SHARP trees will support the Reliable Multicast feature. Default: False.

SHARP_COLL_FORCE_BCAST_AS_ALLREDUCE libsharp New parameter: Force Bcast (RMC) to be performed as an Allreduce operation. Default: False.

Rev 3.1.1 LTS

clean_and_exit sharp_am New parameter: A boolean parameter. When set to TRUE, sharp_am does not operate normally, but instead cleans SHARP resources from all switches and exits. Default: False - Operate normally.

Rev 3.1.0

dynamic_tree_allocation sharp_am New parameter: A boolean parameter, tells whether trees should be allocated dynamically for each SHARP job or have trees allocated during sharp_am initialization. Default: False

max_trees_to_build sharp_am Update: In case dynamic_tree_allocation is set to True, this parameter will have no effect on the number of trees allocated; sharp_am would determine that value based on the amount of possible trees the switches can have. However, in the dynamic trees mode, this parameter affects the number of skeleton trees that sharp_am will use. It is recommended that the minimal value be the same as the number of root switches in the fabric. In case dynamic_tree_allocation is set to False, this parameter can be used to fulfil its purpose. Default:

SHARP_COLL_IB_TIMEOUT libsharp New parameter: Transport timeout on SHARP QP Default: 18

SHARP_COLL_IB_RETRY_COUNT libsharp New parameter: Transport retries on SHARP QP Default: 7

SHARP_COLL_IB_RNR_TIMER libsharp New parameter: RNR timeout on SHARP QP Default: 12

SHARP_COLL_IB_RNR_RETRY libsharp New parameter: RNR retries on SHARP QP Default: 7

SHARP_COLL_IB_SL libsharp New parameter: SL Default: 0

SHARP_COLL_ENABLE_MCAST_TARGET libsharp Update: Modified the default value from True to False. Default: False

Rev 3.0.0

per_prio_default_quota sharp_am Update: This parameter controls only the default percentage provided to LLT jobs. Its default value is modified from 3 to 20

per_prio_default_sat_quota sharp_am New parameter: Default percentage of quota (OSTs, Buffers and Groups) per aggregation node per tree, to be requested for a single SAT job by its priority. If no explicit quota request is submitted, this parameter will set the quota percentage to be used. Format: prio_0_quota, [prio_1_quota, ..., prio_9_quota] Note that if only one value is set, it will be applied to all priorities. Default: 3
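
The quota format above (one value for all priorities, or one value per priority 0-9) can be expanded as in the following minimal sketch. The function parse_per_prio_quota is hypothetical, and rejecting lists of 2 to 9 values or percentages outside 0-100 is an assumption of this sketch; the source does not state how sharp_am treats such input.

```python
NUM_PRIORITIES = 10  # prio_0 .. prio_9


def parse_per_prio_quota(value: str) -> list[int]:
    """Expand a 'prio_0_quota[, prio_1_quota, ..., prio_9_quota]' string
    into a list of ten percentages, one per priority."""
    parts = [int(p) for p in value.split(",") if p.strip()]
    if len(parts) == 1:
        parts = parts * NUM_PRIORITIES  # a single value applies to all priorities
    if len(parts) != NUM_PRIORITIES:
        # Assumption: partial lists are treated as errors in this sketch.
        raise ValueError("expected 1 or 10 comma-separated values")
    if any(not 0 <= p <= 100 for p in parts):
        # Assumption: quotas are percentages and must lie in 0-100.
        raise ValueError("quota percentages must be within 0-100")
    return parts
```

For example, the default value "3" expands to a quota of 3 percent for each of the ten priorities.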

sat_jobs_default_absolute_osts sharp_am New parameter: Default number of OSTs to be allocated for SAT jobs per aggregation node per tree. Zero value means that no absolute value should be used, and the default percentage value is used instead. Note that the number of OSTs also affects the number of groups. Default: 0

app_resources_default_limit sharp_am New parameter: A numerical parameter, applicable only when reservation_mode is set to true. Sets the default max number of trees allowed to be used in parallel by a single app. This default value can be overridden per app upon reservation request. A value of 0 means no allowed resources, which means an app cannot execute any sharp job. Default: 1

force_app_id_match sharp_am New parameter: A boolean parameter, applicable only when reservation_mode is set to true. When set to true, an application ID must be provided upon job request, and it must match the application ID provided upon reservation request. Otherwise, the job will be denied. Default: False

SHARP_COLL_JOB_REQ_EXCLUSIVE_LOCK_MODE libsharp Update: Changed default value from 0 (no exclusive lock) to 2 (force exclusive lock)

Rev 2.7.0

recovery_retry_interval sharp_am New parameter: A timeout in seconds for trees recovery retries. A value of 0 means do not try to recover trees. Default: 300

enable_seamless_restart sharp_am New parameter: A boolean flag. If enabled, AM tries to recover state from last AM run and continue the operation of the current jobs. Default: True

seamless_restart_trees_file sharp_am New parameter: Set the SHARP trees file used in seamless restart. Specify only the file name; the full path is constructed using ‘dump_dir’. Default: sharp_am_trees_structure.dump

seamless_restart_max_retries sharp_am New parameter: Set the number of consecutive retries of seamless restart. If seamless restart fails more times in a row, it will be disabled in the next run. Default: 3

max_tree_radix sharp_am Update: Change default to 252

ib_sat_max_mtu sharp_am Update: Change default to 5, to support MAD value that represents 4K MTU.

per_prio_default_quota sharp_am Update: Changed default to 3 instead of 20, enabling more SAT jobs to take place in parallel on each switch.

Rev 2.6.1

dump_dir sharp_am Update: Changed default to /var/log

smx_enabled_protocols sharp_am Update: Changed default from 7 to 6 (disable UCX by default)

ib_mad_timeout sharp_am Update: Change default from 200 to 500

sr_mad_timeout sharpd New parameter: Control timeout for ServiceRecord queries Default: 10000 milliseconds

sr_mad_retries sharpd New parameter: Control number of retries for ServiceRecord queries Default: 3 retries

Rev 2.5.0

smx_keepalive_interval sharp_am/sharpd New parameter: Keep alive interval in seconds. Set to 0 to disable keep alive. Default: 60 seconds

smx_incoming_conn_keepalive_interval sharp_am New parameter: Keep alive interval for incoming connections. Set to 0 to disable. Default: 300 seconds

enable_exclusive_lock sharp_am New parameter: Enable/Disable exclusive lock feature. Default: True

enable_topology_api sharp_am New parameter: Enable/Disable Topology API feature. Default: True

max_trees_to_build sharp_am New parameter: Control number of trees for AM to build Default: 126

Rev 2.4.3

ib_max_mads_on_wire sharp_am Modified behavior: Changed default from 100 to 4096

ib_qpc_local_ack_timeout sharp_am Modified behavior: Changed default from 0x1F to 0x12

ib_sat_qpc_local_ack_timeout sharp_am Modified behavior: Changed default from 0x1F to 0x12
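
To give a sense of what the change from 0x1F to 0x12 means in practice, the sketch below decodes the value, assuming these parameters hold the standard InfiniBand Local ACK Timeout encoding (4.096 microseconds multiplied by 2 to the power of the value, with 0 meaning no timeout). That assumption is not confirmed by this document, and the helper name is hypothetical.

```python
def local_ack_timeout_seconds(encoded: int) -> float:
    """Decode a 5-bit InfiniBand Local ACK Timeout value into seconds.

    Assumes the standard encoding: 4.096 us * 2**encoded, where an
    encoded value of 0 means the timeout is disabled (infinite).
    """
    if not 0 <= encoded <= 31:
        raise ValueError("encoded timeout must fit in 5 bits (0-31)")
    if encoded == 0:
        return float("inf")
    return 4.096e-6 * 2 ** encoded


old_default = local_ack_timeout_seconds(0x1F)  # roughly 8796 seconds
new_default = local_ack_timeout_seconds(0x12)  # roughly 1.07 seconds
```

Under this assumption, the new default shortens the per-attempt timeout from over two hours to about one second.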

ib_qpc_timeout_retry_limit sharp_am Modified behavior: Changed default from 7 to 6

ib_sat_qpc_timeout_retry_limit sharp_am Modified behavior: Changed default from 7 to 6

Rev 2.0.0

control_path_version sharp_am New parameter Default

max_compute_ports_per_agg_node sharp_am Modified behavior: When set to 0, AN radix is set to maximal radix value. Default: 0

default_reproducibility sharp_am New parameter: Control default reproducibility mode for jobs. Default: TRUE

ib_sa_key sharp_am New parameter: Control SA key for SA operations. Default: 0x1

coll_job_quota_max_payload_per_ost sharp_job_quota Modified behavior: Change default value to 1024.

SHARP_COLL_MAX_PAYLOAD_SIZE Libsharp_coll Removed

SHARP_COLL_NUM_SHARP_COLL_REQ Libsharp_coll Removed

SHARP_COLL_ENABLE_REPRODUCIBLE_MODE Libsharp_coll New parameter: Control job reproducibility mode: 0 – Use default. 1 – No reproducibility. 2 – Reproducibility.
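
The three mode values can be read as a per-job override on top of a default. The sketch below assumes that "Use default" refers to the sharp_am default_reproducibility setting; this link is an assumption rather than something stated here, and effective_reproducibility is a hypothetical helper.

```python
def effective_reproducibility(job_mode: int,
                              default_reproducibility: bool = True) -> bool:
    """Resolve SHARP_COLL_ENABLE_REPRODUCIBLE_MODE to an on/off decision.

    0 - use the default, 1 - no reproducibility, 2 - reproducibility.
    Assumption: the default comes from sharp_am's default_reproducibility.
    """
    if job_mode == 0:
        return default_reproducibility
    if job_mode == 1:
        return False
    if job_mode == 2:
        return True
    raise ValueError("mode must be 0, 1, or 2")
```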

SHARP_COLL_ENABLE_CUDA Libsharp_coll New parameter: Enables CUDA GPU direct.

SHARP_COLL_ENABLE_GPU_DIRECT_RDMA Libsharp_coll New parameter: Enables GPU direct RDMA.

Rev 1.8.1

pending_mode_timeout sharp_am New parameter: Defines AM waiting time for jobs to complete prior to fabric re-initialization upon startup.