Managing Power Capping

The GPU has three sources of power limits:

  • VBIOS: defines the maximum possible TGP (Total Graphics Power) value.

  • The nvidia-smi tool: lets users set the power limit of the GPU from the host.

  • SMBPBI: sets the power limit of the GPU via an out-of-band channel.

The GPU Performance Monitoring Unit (PMU) selects the most conservative policy to cap a system’s power consumption.

Querying the Current GPU Power Limit

Use the following curl command to query the current GPU power limit:

curl -k -u <username>:<password> https://<bmc>/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_<id>/EnvironmentMetrics

Where

  • <bmc> is the BMC IP address.

  • <id> is the GPU instance number, from 1 to 8.

As shown in the following example output, the Reading field indicates the current power usage, and the SetPoint field indicates the current GPU power limit.

...

"PowerLimitWatts": {
        "AllowableMax": 1000,
        "AllowableMin": 200,
        "ControlMode": "Automatic",
        "DefaultSetPoint": 700,
        "Reading": 64.388,
        "SetPoint": 700
}
...
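As a rough sketch of how these two fields might be pulled out of a saved response, the following shell snippet works on the sample fragment shown above. The get_field helper is hypothetical and relies on simple line-based matching, which assumes one field per line as in the sample; a real JSON parser would be more robust for complete responses.

```shell
# Sample fragment copied from the example output above.
response='"PowerLimitWatts": {
        "AllowableMax": 1000,
        "AllowableMin": 200,
        "ControlMode": "Automatic",
        "DefaultSetPoint": 700,
        "Reading": 64.388,
        "SetPoint": 700
}'

# Hypothetical helper: print the numeric value of the named field.
# Reads the JSON text on stdin and assumes one field per line.
get_field() {
    sed -n "s/.*\"$1\": *\([0-9.][0-9.]*\).*/\1/p" | head -n 1
}

reading=$(printf '%s\n' "$response" | get_field Reading)
set_point=$(printf '%s\n' "$response" | get_field SetPoint)

echo "Power usage: ${reading} W, power limit: ${set_point} W"
```

Because the pattern includes the opening quotation mark of the field name, SetPoint does not match the DefaultSetPoint line.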

Managing N+N Configuration (IPMI)

By default, a system will boot with three power supplies. To operate safely in an N+N configuration, enable the power capping feature to limit the power that the system consumes.

Get the System Power Limit

ipmitool raw 0x3c 0x80 0x05

The response consists of two bytes, least significant byte first, for example c8 32. To convert this value to watts:

(0xc8 + (0x32 << 8)) = 0x32c8 = 13000

If the feature is disabled, a value greater than 12,000 is returned.
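The conversion above can be sketched as a small shell helper. The function name decode_power_limit is hypothetical, and the sketch assumes that the two response bytes arrive least significant byte first, as in the c8 32 example.

```shell
# Hypothetical helper: convert the two response bytes (LSB first) to watts.
decode_power_limit() {
    lsb=$1
    msb=$2
    echo $(( 0x$lsb + (0x$msb << 8) ))
}

# Example with the response bytes shown above.
limit=$(decode_power_limit c8 32)
echo "System power limit: ${limit} W"

# A value greater than 12,000 indicates that the feature is disabled.
if [ "$limit" -gt 12000 ]; then
    echo "Power capping appears to be disabled"
fi
```

In practice, the two arguments would be taken from the output of the ipmitool command rather than typed by hand.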

Enable PSU Redundancy Support

To enable the PSU redundancy feature, set the power budget limit outside the actual system budget. The following example sets the power budget to 12 kW.

ipmitool raw 0x3c 0x81 0x05 0xE0 0x2E  # Set 12 kW (0x2EE0)

Enable Power Capping Support

To operate the system below the maximum power budget that the PSUs can support, set a limit lower than that budget. The value is passed least significant byte first:

ipmitool raw 0x3c 0x81 0x05 <LSB> <MSB>

The following example sets a limit of 6 kW (6000 W = 0x1770):

ipmitool raw 0x3c 0x81 0x05 0x70 0x17
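The two data bytes in the command above can be derived from a limit in watts. The following is a minimal sketch; the function name encode_power_limit is hypothetical, and the byte order (least significant byte first) is inferred from the 12 kW and 6 kW examples in this section. The sketch only prints the resulting command so that it can be reviewed before it is run.

```shell
# Hypothetical helper: split a limit in watts into two hex bytes, LSB first.
encode_power_limit() {
    watts=$1
    printf '0x%02x 0x%02x\n' $(( watts & 0xff )) $(( (watts >> 8) & 0xff ))
}

# Build the command for a 6 kW limit and print it instead of running it.
bytes=$(encode_power_limit 6000)
echo "ipmitool raw 0x3c 0x81 0x05 $bytes"
```

For 6000 W, this prints the same byte pair as the example above (0x70 0x17).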

With this limit in place, querying the GPU power limit using the Redfish API shows a SetPoint of 562 W:

...

"PowerLimitWatts": {
        "AllowableMax": 1000,
        "AllowableMin": 200,
        "ControlMode": "Automatic",
        "DefaultSetPoint": 700,
        "Reading": 64.335,
        "SetPoint": 562
}
...

Managing Power Capping Using Redfish API

To manage a system’s maximum power consumption through power capping using the Redfish API, refer to Querying GPU Power Limit and Power Capping.