NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.8.0

SHARP Reservation

Different entities, such as tenants, applications, jobs, or any other desired group of nodes, can be bound together through a SHARP reservation. In other words, a SHARP reservation provides isolation and a set of attributes for each desired group of nodes.

In a public cloud system, a reservation can be used to define a tenant. With a pkey configured (see Operating NVIDIA SHARP with PKeys), each tenant can run SHARP jobs for its applications. Applications of different tenants cannot impact one another, and the cloud admin controls which tenants can use SHARP. For simplicity, this document addresses the tenant use case and refers to customer applications as running inside a tenant, although the reservation mechanism can serve other types of groups.

Note

The SHARP reservation feature is also called "SHARP allocation" in some APIs and configuration parameters.

To use the reservation capabilities, the following conditions must be met:

  • sharp_am must run from within UFM, because this mode can only be operated through the UFM REST-API.

  • In the UFM configuration file gv.cfg, the parameter enable_sharp_allocation must be set to True.
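
As a quick sanity check of the second condition, the sketch below parses gv.cfg-style text with Python's configparser and reports whether enable_sharp_allocation is set to a true value. The [Sharp] section name in the sample text and the helper name sharp_allocation_enabled are assumptions made for illustration; the function scans every section rather than relying on a particular one.

```python
import configparser


def sharp_allocation_enabled(cfg_text: str) -> bool:
    """Return True if any section sets enable_sharp_allocation to a true value."""
    # interpolation=None keeps literal '%' characters in unrelated values
    # from triggering interpolation errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(cfg_text)
    for section in parser.sections():
        if parser.has_option(section, "enable_sharp_allocation"):
            # getboolean accepts True/true/yes/on/1 and their negatives.
            return parser.getboolean(section, "enable_sharp_allocation")
    return False


# Hypothetical excerpt: the section name is assumed, not taken from the UFM docs.
SAMPLE_GV_CFG = """
[Sharp]
enable_sharp_allocation = True
"""

if __name__ == "__main__":
    print(sharp_allocation_enabled(SAMPLE_GV_CFG))
```

To check a real deployment, read the contents of gv.cfg from its installed location and pass the text to the same function.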

Once sharp_am is in allocation mode, compute hosts cannot run a SHARP job unless a reservation that includes them was first created through the UFM REST-API.

The REST-API allows for the creation, updating, and deletion of tenants (reservations), with the option to define a related pkey and set a limit on the SHARP resources available to the tenant.

With this method, the fabric admin can control which compute hosts are allowed to leverage SHARP, and can even limit the number of trees allocated per tenant.
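
The reservation concept can be pictured with a small in-memory model. This is only a conceptual sketch and not the UFM REST-API: the class names, field names (hosts, pkey, max_trees), and the rule that all hosts of a job must belong to the same tenant are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Reservation:
    """Hypothetical stand-in for a tenant (reservation)."""
    hosts: Set[str]
    pkey: Optional[str] = None       # related pkey, if one is defined
    max_trees: Optional[int] = None  # None means no explicit per-tenant limit


@dataclass
class ReservationTable:
    tenants: Dict[str, Reservation] = field(default_factory=dict)

    def create(self, name: str, hosts: Iterable[str],
               pkey: Optional[str] = None,
               max_trees: Optional[int] = None) -> None:
        if name in self.tenants:
            raise ValueError(f"tenant {name!r} already exists")
        self.tenants[name] = Reservation(set(hosts), pkey, max_trees)

    def update_hosts(self, name: str, hosts: Iterable[str]) -> None:
        self.tenants[name].hosts = set(hosts)

    def delete(self, name: str) -> None:
        del self.tenants[name]

    def tenant_of(self, host: str) -> Optional[str]:
        """Return the tenant that reserved this host, or None."""
        for name, reservation in self.tenants.items():
            if host in reservation.hosts:
                return name
        return None

    def may_run_job(self, job_hosts: Iterable[str]) -> bool:
        """Admit a job only if every host belongs to one and the same tenant."""
        owners = {self.tenant_of(h) for h in job_hosts}
        return len(owners) == 1 and None not in owners


if __name__ == "__main__":
    table = ReservationTable()
    table.create("tenant-a", ["host1", "host2"], pkey="0x12", max_trees=1)
    print(table.may_run_job(["host1", "host2"]))  # both hosts are reserved
    print(table.may_run_job(["host1", "host3"]))  # host3 is not reserved
```

In this model, hosts outside every reservation are never admitted, which is the control the fabric admin gains in allocation mode.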

Full details of the REST-API can be found in the NVIDIA UFM Enterprise REST API guide.

Limiting the amount of SHARP resources per tenant is crucial in a multi-tenant system, because it prevents a single tenant from consuming excessive resources and impacting other tenants.

To provide fair use of the SHARP resources, the default configuration defines the following limits:

  1. A tenant can run multiple SHARP jobs in parallel.

  2. Two SHARP jobs cannot share the same HCA.

  3. There is no explicit limit on the total number of SHARP jobs a tenant can run. However, since there is a limit per HCA and each SHARP job requires at least 2 HCAs, the effective limit on the total number of jobs is half the number of HCAs available to the tenant.

These limits ensure fair use of the resources and guarantee that, in a non-blocking topology, no tenant can allocate resources in a way that impacts another tenant's available resources.
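
The default rules can be expressed as a simple check over a proposed set of jobs. The sketch below is illustrative only: the function name violations and the representation of a job as a set of HCA names are assumptions, and it encodes just the two default rules stated above (at least 2 HCAs per job, no HCA shared between jobs).

```python
from typing import Dict, List, Set


def violations(jobs: Dict[str, Set[str]]) -> List[str]:
    """List every way the given job-to-HCAs mapping breaks the default rules."""
    problems: List[str] = []
    first_user: Dict[str, str] = {}  # HCA name -> job that claimed it first
    for job, hcas in jobs.items():
        if len(hcas) < 2:
            problems.append(f"{job}: needs at least 2 HCAs, has {len(hcas)}")
        for hca in sorted(hcas):
            if hca in first_user:
                problems.append(f"{job}: shares {hca} with {first_user[hca]}")
            else:
                first_user[hca] = job
    return problems


if __name__ == "__main__":
    # Four HCAs fit two jobs; a third job would have to reuse an HCA.
    print(violations({"job1": {"hca0", "hca1"}, "job2": {"hca2", "hca3"}}))
    print(violations({"job1": {"hca0", "hca1"}, "job2": {"hca1", "hca2"}}))
```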

You can set an explicit limit on the total number of jobs using the app_resources_default_limit configuration parameter (which applies to all tenants) or through the UFM REST-API (which allows different values for individual tenants).

Additionally, you can adjust the limit of jobs per HCA using the reservation_max_jobs_per_hca parameter, which affects all tenants.
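
The interaction between these limits can be worked through with a little arithmetic. The sketch below is an assumption-laden illustration, not the sharp_am implementation: it treats each HCA as offering max_jobs_per_hca slots (standing in for reservation_max_jobs_per_hca), assumes every job occupies one slot on at least 2 HCAs, and applies explicit_limit (standing in for app_resources_default_limit or a per-tenant REST-API value) as a simple cap. Any special values the real parameters may accept are not modeled.

```python
from typing import Optional


def effective_job_limit(num_hcas: int,
                        max_jobs_per_hca: int = 1,
                        explicit_limit: Optional[int] = None,
                        min_hcas_per_job: int = 2) -> int:
    """Upper bound on concurrent SHARP jobs for one tenant under the stated assumptions."""
    if num_hcas < 0 or max_jobs_per_hca < 0:
        raise ValueError("counts must be non-negative")
    if num_hcas < min_hcas_per_job:
        return 0  # not enough HCAs to form even one job
    # Total slots divided by the minimum number of slots a job consumes.
    per_hca_bound = (num_hcas * max_jobs_per_hca) // min_hcas_per_job
    if explicit_limit is None:
        return per_hca_bound
    return min(per_hca_bound, explicit_limit)


if __name__ == "__main__":
    print(effective_job_limit(8))                       # default: half the HCAs
    print(effective_job_limit(8, max_jobs_per_hca=2))   # two jobs allowed per HCA
    print(effective_job_limit(8, 2, explicit_limit=5))  # explicit cap applies
```

With the defaults (one job per HCA, no explicit limit), the result matches the rule given earlier: a tenant with 8 HCAs can run at most 4 jobs.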

© Copyright 2024, NVIDIA. Last updated on Aug 13, 2024.