NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.10.3

Changes and New Features

Feature/Change

Description

XDR Support

SHARP now supports topologies utilizing ConnectX-8 network interfaces and BlackMamba Quantum-3 switches.

Switch In-Service Software Upgrade (ISSU) Support

SHARP now supports the In-Service Software Upgrade (ISSU) process, allowing switch firmware upgrades without requiring a restart. This ensures that upgrades can be performed without impacting active SHARP jobs.

Expanded REST API for SHARP Allocation

The SHARP Allocation REST API now includes the following two new commands, enabling the addition or removal of host GUIDs without needing to specify the full list of existing host GUIDs:

  • add_guids

  • remove_guids
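
The difference between sending a full host GUID list and sending only a delta can be illustrated with a small in-memory model. This is a sketch of the semantics only: it does not call the SHARP REST API, the function names merely mirror the command names above, and the GUID values are made up for illustration.

```python
# Hypothetical in-memory model of an allocation's host GUID list.
# It does NOT call the SHARP Allocation REST API or reproduce its payload format.

def add_guids(current, to_add):
    """Return the GUID set after adding only the given GUIDs (a delta update)."""
    return set(current) | set(to_add)


def remove_guids(current, to_remove):
    """Return the GUID set after removing only the given GUIDs (a delta update)."""
    return set(current) - set(to_remove)


# Made-up GUIDs, for illustration only.
allocation = {"0x0002c903000a1b2c", "0x0002c903000a1b2d"}

# The caller passes only the change, not the full existing list.
allocation = add_guids(allocation, ["0x0002c903000a1b2e"])
allocation = remove_guids(allocation, ["0x0002c903000a1b2c"])
print(sorted(allocation))
```

With a full-list update, the caller would first have to read the existing GUIDs and resend all of them; with a delta command, only the changed GUIDs are supplied.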

Non-Root Operation in UFM

The sharp_am service can now run with ufmapp privileges instead of requiring superuser access when operating within UFM.

Enhanced Logging for SHARP Jobs

Libsharp can now print statistics on aggregated data at the end of a SHARP job.

Parameter

Component

Description

smx_enabled_protocols

sharp_am

Parameter removed.

This parameter defined the protocols enabled by SMX (Sockets, UCX, and Unix Domain Socket).

Unix Domain Socket is now always enabled, and Sockets and UCX no longer need to be enabled simultaneously; only one of them is required.

The choice between Sockets and UCX is now controlled by the existing smx_protocol configuration parameter.

SHARP_SMX_SOCK_ADDR_FAMILY

libsharp

New parameter: A string that defines whether libsharp communicates over IPv4 or IPv6, or chooses between them automatically.

This environment variable is relevant only when sockets are used for SMX communication.

Valid values: ipv4, ipv6, auto

Default: auto.
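
As a minimal sketch, the variable could be set from a Python launcher script before starting a job. The variable name and its valid values come from the description above; the helper name `env_with_addr_family` is hypothetical and not part of SHARP.

```python
import os

# Valid values as listed in the parameter description above.
VALID_ADDR_FAMILIES = ("ipv4", "ipv6", "auto")


def env_with_addr_family(family="auto", base_env=None):
    """Return a copy of the environment with SHARP_SMX_SOCK_ADDR_FAMILY set.

    Hypothetical helper: it validates the value locally and leaves the
    calling process's own environment unchanged.
    """
    if family not in VALID_ADDR_FAMILIES:
        raise ValueError(
            f"unsupported address family {family!r}; "
            f"expected one of {VALID_ADDR_FAMILIES}"
        )
    env = dict(os.environ if base_env is None else base_env)
    env["SHARP_SMX_SOCK_ADDR_FAMILY"] = family
    return env


# Start from an empty base environment to keep the printed result small.
print(env_with_addr_family("ipv6", base_env={}))
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the job. As noted above, the setting matters only when sockets are used for SMX communication.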

SHARP_COLL_STATS_DUMP_MODE

libsharp

New parameter: An enum value that defines whether libsharp prints statistics at the end of a job.

Valid values:

0 - Do not print stats.

1 - Print stats of rank 0.

2 - Print stats of all the processes.

Note: In mode 2, the stats are printed to the same destination, but include information about all the participating processes.

Default: 0 - Do not print stats.
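
The three modes can be mirrored in a launcher script with a small enum, which rejects values outside the documented range before the job starts. This is a sketch: the numeric values come from the list above, while the member names and the helper `dump_mode_env` are hypothetical.

```python
from enum import IntEnum


class StatsDumpMode(IntEnum):
    """Numeric values from the list above; member names are illustrative only."""
    NONE = 0   # do not print stats
    RANK0 = 1  # print stats of rank 0
    ALL = 2    # print stats of all the processes


def dump_mode_env(mode):
    """Return the environment entry for the given mode.

    StatsDumpMode(mode) raises ValueError for any value other than 0, 1 or 2.
    """
    mode = StatsDumpMode(mode)
    return {"SHARP_COLL_STATS_DUMP_MODE": str(int(mode))}


print(dump_mode_env(StatsDumpMode.RANK0))
```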

SHARP_COLL_STATS_FILE

libsharp

New parameter: A string that specifies the destination of the printed stats.

Valid values:

  • stdout - Print to standard output.

  • stderr - Print to standard error.

  • file:<filename> - Save to the file named <filename>. The following escape sequences can be used in the file name: %h: host, %p: pid, %t: time, %u: user, %e: exe.

Default: stdout - Print to standard output.
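
To show how a `file:` value with escape sequences might be composed, the sketch below builds the setting and previews a possible expansion. The expansion is performed by libsharp itself, so the preview is an approximation under stated assumptions: the exact format of each substituted field (for example, how `%t` renders time) is not specified here, and both helper names are hypothetical.

```python
import getpass
import os
import re
import socket
import sys
import time


def stats_file_value(template):
    """Build the SHARP_COLL_STATS_FILE value for a file destination."""
    return "file:" + template


def preview_expansion(template, host=None, pid=None, now=None, user=None, exe=None):
    """Approximate how %h, %p, %t, %u and %e could expand.

    Assumption: the time is shown as integer seconds since the epoch and the
    executable as a base name; libsharp's real formatting may differ.
    """
    values = {
        "%h": host if host is not None else socket.gethostname(),
        "%p": str(pid if pid is not None else os.getpid()),
        "%t": str(int(now if now is not None else time.time())),
        "%u": user if user is not None else getpass.getuser(),
        "%e": exe if exe is not None else os.path.basename(sys.argv[0]),
    }
    # Single pass, so substituted text is never expanded a second time.
    return re.sub(r"%[hptue]", lambda m: values[m.group(0)], template)


template = "sharp_stats_%h_%p.log"
print(stats_file_value(template))
# Explicit sample values keep the preview independent of the local machine.
print(preview_expansion(template, host="node01", pid=4242, now=0, user="alice", exe="app"))
```

Including `%h` and `%p` in the file name is one way to keep processes on different hosts from writing to the same path, which is relevant when stats are printed for all processes.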

© Copyright 2025, NVIDIA. Last updated on Mar 16, 2025.