dpsctl gpu-policy
dpsctl gpu-policy Usage Guide
Set GPU power policies on an active resource group.
Note: Currently, when GPU power policies are applied, their values are summed into a single node-level GPU policy. As a result, the limit actually set on each GPU may differ slightly from the user’s specified input values.
Usage
dpsctl [global options] gpu-policy [options]
Flags
Includes global dpsctl options.
--node value (can be multiple) nodeName=gpu0,gpu1,gpu2,... (one power limit in watts per GPU, in GPU index order)
--help, -h show help
Examples
Basic Usage
The sum of all GPU policy values plus the COMP_CPU and COMP_MEMORY components of the resource group's default or entity policy
must be no less than the capability minimum of the default or entity node policy.
In this example, we have previously created and activated a resource group, example1, with a default Node-High policy, and
three nodes: node001, node002, and node003. We want to set the GPU policy of node001 to the following values:
- GPU0: 500W
- GPU1: 550W
- GPU2: 600W
- GPU3: 700W
- GPU4: 650W
- GPU5: 700W
- GPU6: 550W
- GPU7: 600W
By looking at the Node-High policy (enforced on node001), we can see that the minimum node power budget is set to 5,600W, with a maximum of 10,200W.
Additionally, our limits for COMP_CPU and COMP_MEMORY are 1,530W and 1,020W, respectively.
By summing our limits together, we can see that:
500W + 550W + 600W + 700W + 650W + 700W + 550W + 600W + 1,530W + 1,020W = 7,400W
which is within our node budget and is therefore a valid power configuration for node001.
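The same check can be scripted before calling dpsctl. The following is a minimal Python sketch that uses only the numbers from this example; the function names and constants are illustrative and are not part of dpsctl.

```python
# Values taken from the example in this section (all in watts).
# The names below are illustrative; they are not part of dpsctl.
GPU_LIMITS_W = [500, 550, 600, 700, 650, 700, 550, 600]
COMP_CPU_W = 1530
COMP_MEMORY_W = 1020
NODE_MIN_W = 5600
NODE_MAX_W = 10200


def total_node_power(gpu_limits, cpu_limit, memory_limit):
    """Return the summed power request for one node, in watts."""
    return sum(gpu_limits) + cpu_limit + memory_limit


def fits_node_budget(total, node_min, node_max):
    """Check that the total lies within the node policy's min/max budget."""
    return node_min <= total <= node_max


total = total_node_power(GPU_LIMITS_W, COMP_CPU_W, COMP_MEMORY_W)
print(total)  # 7400
print(fits_node_budget(total, NODE_MIN_W, NODE_MAX_W))  # True

# Build the value passed to --node in the form nodeName=gpu0,gpu1,...
node_arg = "node001=" + ",".join(str(w) for w in GPU_LIMITS_W)
print(node_arg)  # node001=500,550,600,700,650,700,550,600
```

Running the check locally catches a configuration that falls outside the node budget before any policy is applied.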
Now we can apply our updated GPU policy with dpsctl:
$ dpsctl gpu-policy --node node001=500,550,600,700,650,700,550,600
{
  "results": [
    {
      "resource_name": "node001",
      "gpu_id": 0,
      "ok": true,
      "set_limit": 500.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 1,
      "ok": true,
      "set_limit": 550.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 2,
      "ok": true,
      "set_limit": 600.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 3,
      "ok": true,
      "set_limit": 700.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 4,
      "ok": true,
      "set_limit": 650.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 5,
      "ok": true,
      "set_limit": 700.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 6,
      "ok": true,
      "set_limit": 550.0,
      "diag_msg": "Success"
    },
    {
      "resource_name": "node001",
      "gpu_id": 7,
      "ok": true,
      "set_limit": 600.0,
      "diag_msg": "Success"
    }
  ],
  "status": {
    "ok": true,
    "diag_msg": "Success"
  }
}
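Because the applied limit per GPU can differ from the requested value, it may be useful to compare each set_limit in the response against the input. The following is a minimal Python sketch that assumes the JSON shape shown above; the helper name find_mismatches and the tolerance parameter are hypothetical, and the sample is abbreviated to two GPUs.

```python
import json


def find_mismatches(output_text, requested, tolerance=0.0):
    """Return (gpu_id, requested, applied) for GPUs that failed or whose
    applied limit differs from the request by more than `tolerance` watts."""
    response = json.loads(output_text)
    mismatches = []
    for entry in response["results"]:
        want = requested[entry["gpu_id"]]
        applied = entry["set_limit"]
        if not entry["ok"] or abs(applied - want) > tolerance:
            mismatches.append((entry["gpu_id"], want, applied))
    return mismatches


# Abbreviated sample in the same shape as the dpsctl output above.
sample = """
{
  "results": [
    {"resource_name": "node001", "gpu_id": 0, "ok": true,
     "set_limit": 500.0, "diag_msg": "Success"},
    {"resource_name": "node001", "gpu_id": 1, "ok": true,
     "set_limit": 550.0, "diag_msg": "Success"}
  ],
  "status": {"ok": true, "diag_msg": "Success"}
}
"""

requested = [500, 550]  # watts, indexed by gpu_id
print(find_mismatches(sample, requested))  # []
```

An empty list means every GPU reported ok and received the requested limit; a nonzero tolerance can be passed to ignore small differences.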