Running Mellanox SHARPD Daemon in Managed Mode
When running in managed mode, the daemon expects communication from the prolog/epilog
scripts of the Job Scheduler (JS). The prolog/epilog scripts should invoke the “sharp_job_quota” executable
to communicate with Mellanox SHARP.
To run SHARPD in managed mode, use the “mgmt_mode” option (default: 0, which runs in “unmanaged” mode).
The JS can set or unset an upper limit on the Mellanox SHARP resources (e.g., OSTs, groups, etc.) allowed for a particular user/job via sharp_job_quota, using the “set” and “remove” commands.
Usage
sharp_job_quota [OPTIONS]
sharp_job_quota options
| Option | Required/Optional | Arguments | Description |
|---|---|---|---|
| --operation | Required | set / remove | Sets or removes quota |
| --allocation_id | Required | Unique numeric 64-bit ID | Scheduler ID for the job. No two jobs in the system may have the same ID at the same time |
| --uid | Optional | Numeric | UID of the user allowed to run the job |
| --user_name | Optional | String | Name of the user allowed to run the job |
| --coll_job_quota_max_groups | Optional | Numeric value: 0..256 | Maximum number of Mellanox SHARP groups (communicators) allowed. Default value: 0. |
| | Optional | Numeric value: 0..256 | Maximum QPs/port allowed. |
| | Optional | Numeric value: 0..1024 | Maximum payload per OST allowed. |
| | Optional | Numeric value: 0..512 | Maximum number of OSTs allowed for the job per collective operation. |
| | Optional | Numeric value: 0..4 | Maximum number of trees allowed for the job. |
| | Optional | Numeric value: 0..9 | Priority of the job. |
| | Optional | Numeric value: 0..100 | Percentage of resources to request for the job. |
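The numeric ranges listed above can be checked in the prolog script before sharp_job_quota is invoked. The following is a minimal sketch, assuming a hypothetical helper named in_range that is not part of Mellanox SHARP; it only verifies that a value is a non-negative integer inside a given range.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Mellanox SHARP).
# Usage: in_range VALUE MIN MAX
# Succeeds when VALUE is a non-negative integer with MIN <= VALUE <= MAX.
in_range() {
    case "$1" in
        # Reject empty strings and anything containing a non-digit.
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

For example, a prolog script could run in_range "$max_groups" 0 256 before passing the value to --coll_job_quota_max_groups, where max_groups is a site-defined variable assumed here for illustration.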
Important Notes
- The executable must run as the same user as the SHARP daemon (SD), that is, root
- When using the “set” operation, either the uid or the user_name must be provided
- Regardless of the job quota set in the prolog, the Aggregation Manager (AM) can allocate fewer resources than requested or decline the request
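The first two notes can be enforced with a short pre-flight check in the prolog script. This is a sketch under stated assumptions: the helper names is_root and check_set_args are hypothetical and are not part of Mellanox SHARP.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the effective user is root,
# matching the note that the executable must run as the same user as SD.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Hypothetical helper: check_set_args UID USER_NAME
# Succeeds when at least one of the two identifiers is non-empty,
# as required for the "set" operation.
check_set_args() {
    if [ -z "$1" ] && [ -z "$2" ]; then
        echo "set operation requires a uid or a user_name" >&2
        return 1
    fi
    return 0
}
```

A prolog script might call is_root and check_set_args "$SLURM_JOB_UID" "" and skip the quota request if either check fails.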
Examples
# sharp_job_quota --operation set --user_name jobrunner --allocation_id 2017 --coll_job_quota_max_groups 10
# sharp_job_quota --operation remove --allocation_id 2017
SLURM Examples
# sharp_job_quota --operation set --uid $SLURM_JOB_UID --allocation_id $SLURM_JOB_ID
# sharp_job_quota --operation remove --allocation_id $SLURM_JOB_ID
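The two SLURM commands can be wrapped in functions shared by the prolog and epilog scripts. The sketch below is illustrative only: the function names and the SHARP_JOB_QUOTA override variable are assumptions introduced here, not part of Mellanox SHARP or SLURM. The override allows a dry run (for example, SHARP_JOB_QUOTA=echo) on a host where the executable is not installed.

```shell
#!/bin/sh
# Hypothetical override: path of the executable to call.
# Defaults to the real sharp_job_quota found in PATH.
SHARP_JOB_QUOTA="${SHARP_JOB_QUOTA:-sharp_job_quota}"

# Prolog side: request a quota for the job.
# $1 = scheduler job ID, $2 = UID of the job owner
sharp_quota_set() {
    "$SHARP_JOB_QUOTA" --operation set --uid "$2" --allocation_id "$1"
}

# Epilog side: release the quota for the job.
# $1 = scheduler job ID
sharp_quota_remove() {
    "$SHARP_JOB_QUOTA" --operation remove --allocation_id "$1"
}
```

In a SLURM prolog this would be called as sharp_quota_set "$SLURM_JOB_ID" "$SLURM_JOB_UID", and in the epilog as sharp_quota_remove "$SLURM_JOB_ID".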