SHARP Application Awareness

NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.4.0

Different entities, such as tenants and jobs, are all considered "applications" and can be bound together through SHARP. In other words, SHARP can be application-aware, providing isolation and a set of attributes to each application.

The most common example of an application in SHARP is a SLURM job created to perform a certain task.

For SHARP to be application-aware, the following conditions must be met:

  • sharp_am must run with the configuration parameter reservation_mode set to TRUE

  • sharp_am must run from within UFM, because this mode can only be operated through the UFM REST-API

Once sharp_am operates in reservation_mode, no compute host is allowed to request a SHARP job unless a reservation that includes the host was explicitly made through the UFM REST-API.
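
The admission rule above can be illustrated with a small Python model. This is only a sketch of the behavior described here, not sharp_am's implementation: the class and method names are hypothetical, and it assumes that all hosts of a job must belong to a single reservation.

```python
class ReservationTable:
    """Toy model of reservation-mode admission (hypothetical names)."""

    def __init__(self, reservation_mode=True):
        self.reservation_mode = reservation_mode
        self._apps = {}  # app_id -> set of reserved host names

    def reserve(self, app_id, hosts):
        """Record a reservation, as an admin would through the REST-API."""
        self._apps[app_id] = set(hosts)

    def may_request_job(self, hosts):
        """Return True if these hosts would be allowed to ask for a SHARP job."""
        requested = set(hosts)
        if not requested:
            return False
        if not self.reservation_mode:
            # Outside reservation mode, no reservation is needed.
            return True
        # Assumption: every requesting host must sit in the same reservation.
        return any(requested <= reserved for reserved in self._apps.values())


if __name__ == "__main__":
    table = ReservationTable()
    table.reserve("slurm-4242", ["node01", "node02"])
    print(table.may_request_job(["node01", "node02"]))
    print(table.may_request_job(["node03"]))
```

In this model, a host that was never named in a reservation is refused while reservation mode is on, which mirrors the isolation the mode is meant to provide.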

The REST-API makes it possible to define a set of hosts that function as a single application, optionally along with a pkey that they share and a limit on the resources that the application can use.

With this method, the fabric administrator can control which compute hosts are allowed to leverage SHARP, and can even limit the number of trees allocated per application. By default, once an application is declared, there is no limit on the number of trees it can allocate. If a limit is required, it is advised that it be no lower than the number of rails in the system.
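
As a rough illustration, the following Python sketch assembles the data that such a reservation carries and checks the tree limit against the rail count. The field names (app_id, hosts, pkey, max_trees) and the function name are assumptions made for this example, not the actual UFM REST-API schema, and the sketch deliberately treats the rails advice as a hard check. Consult the UFM REST-API document for the real endpoint and request format.

```python
import json


def build_reservation(app_id, hosts, pkey=None, max_trees=None, rails=1):
    """Build a reservation description (hypothetical field names).

    max_trees=None means "no limit", which matches the default described above.
    """
    if not hosts:
        raise ValueError("a reservation needs at least one host")
    if max_trees is not None and max_trees < rails:
        # The text advises a limit no lower than the number of rails.
        raise ValueError(
            f"max_trees={max_trees} is lower than the number of rails ({rails})"
        )

    body = {"app_id": str(app_id), "hosts": sorted(set(hosts))}
    if pkey is not None:
        body["pkey"] = pkey
    if max_trees is not None:
        body["max_trees"] = max_trees
    return body


if __name__ == "__main__":
    reservation = build_reservation(
        "slurm-4242", ["node01", "node02"], pkey="0x8001", max_trees=2, rails=2
    )
    print(json.dumps(reservation, indent=2))
```

Leaving max_trees unset keeps the application unlimited, while a value below the rail count is rejected before anything would be sent to UFM.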

Full details of the REST-API can be found in the UFM REST-API document.

© Copyright 2023, NVIDIA. Last updated on Nov 7, 2023.