2.8.1. Setup and Management


Describes the APIs for setting up policies and registering callbacks that receive a notification when a specific policy condition has been violated.


dcgmReturn_t dcgmPolicyGet ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, int count, dcgmPolicy_t* policy, dcgmStatus_t statusHandle )
dcgmReturn_t dcgmPolicyRegister ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicyCondition_t condition, fpRecvUpdates beginCallback, fpRecvUpdates finishCallback )
dcgmReturn_t dcgmPolicySet ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicy_t* policy, dcgmStatus_t statusHandle )
dcgmReturn_t dcgmPolicyUnregister ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicyCondition_t condition )


dcgmReturn_t dcgmPolicyGet ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, int count, dcgmPolicy_t* policy, dcgmStatus_t statusHandle )
pDcgmHandle - IN: DCGM Handle
groupId - IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
count - IN: The size of the policy array. This is the maximum number of policies that will be retrieved; it should match the number of GPUs in the group.
policy - OUT: A reference to the dcgmPolicy_t storage that will be used to hold the current policies applied to each GPU in the group.
statusHandle - IN/OUT: Resulting status for the operation. Pass NULL if detailed error information for the operation is not needed. See dcgmStatusCreate for details on creating a status handle.


Get the current violation policy from the policy manager. Given a groupId, up to count policy structures are retrieved, one for each GPU in the group.
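The sizing rule for the policy array can be illustrated with a minimal sketch. The typedefs, status codes, and the body of dcgmPolicyGet below are local stand-ins (assumptions) so that the fragment compiles without the DCGM headers; real code would include the DCGM headers and link against the library, and the real dcgmPolicy_t has more fields than the single one shown here. fetch_group_policies is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins (assumptions) for declarations normally provided by the DCGM headers. */
typedef int dcgmReturn_t;
typedef void *dcgmHandle_t;
typedef void *dcgmGpuGrp_t;
typedef void *dcgmStatus_t;
typedef struct { unsigned int condition; } dcgmPolicy_t; /* real struct has more fields */
#define DCGM_ST_OK 0
#define DCGM_ST_BADPARAM (-1)

/* Stub with the signature documented above; it only clears each slot. */
dcgmReturn_t dcgmPolicyGet(dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId,
                           int count, dcgmPolicy_t *policy,
                           dcgmStatus_t statusHandle)
{
    (void)pDcgmHandle;
    (void)groupId;
    (void)statusHandle;
    if (policy == NULL || count <= 0)
        return DCGM_ST_BADPARAM;
    for (int i = 0; i < count; i++)
        policy[i].condition = 0;
    return DCGM_ST_OK;
}

/* Hypothetical helper: allocates one dcgmPolicy_t per GPU in the group and
 * fetches the policies. Returns the array (caller frees) or NULL on failure. */
dcgmPolicy_t *fetch_group_policies(dcgmHandle_t handle, dcgmGpuGrp_t group,
                                   int gpuCount)
{
    if (gpuCount <= 0)
        return NULL;
    dcgmPolicy_t *policies = calloc((size_t)gpuCount, sizeof(*policies));
    if (policies == NULL)
        return NULL;
    /* count is the array capacity; a NULL status handle skips detailed errors. */
    if (dcgmPolicyGet(handle, group, gpuCount, policies, NULL) != DCGM_ST_OK) {
        free(policies);
        return NULL;
    }
    return policies;
}
```

Passing the number of GPUs in the group as count keeps the array large enough to hold one policy per GPU, which is what the count parameter description asks for.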

dcgmReturn_t dcgmPolicyRegister ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicyCondition_t condition, fpRecvUpdates beginCallback, fpRecvUpdates finishCallback )
pDcgmHandle - IN: DCGM Handle
groupId - IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
condition - IN: The set of conditions, specified as an OR'd list (see dcgmPolicyCondition_t), for which to register a callback function.
beginCallback - IN: A reference to a function that should be called if a violation occurs. This function is called before any actions specified by the policy are taken.
finishCallback - IN: A reference to a function that should be called if a violation occurs. This function is called after any actions specified by the policy have completed.


Register a function to be called when a specific policy condition (see dcgmPolicyCondition_t) has been violated. The callback(s) are called automatically in DCGM_OPERATION_MODE_AUTO mode, and only after dcgmPolicyTrigger is called in DCGM_OPERATION_MODE_MANUAL mode. All callbacks are made from a separate thread.

This API is only supported on Tesla GPUs and will return DCGM_ST_NOT_SUPPORTED if any non-Tesla GPUs are part of the GPU group specified in groupId.
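The registration flow described above (an OR'd condition mask plus a begin and a finish callback) can be sketched as follows. Everything prefixed with dcgm or DCGM_ here is a local stand-in (assumption) so the fragment compiles without the DCGM headers: the flag values are placeholders, the fpRecvUpdates typedef is assumed to take a single void* argument, and the body of dcgmPolicyRegister is a stub. simulate_violation, on_begin, on_finish, and register_ecc_and_thermal are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Stand-ins (assumptions) for declarations normally provided by the DCGM
 * headers; the flag values are placeholders chosen for this sketch. */
typedef int dcgmReturn_t;
typedef void *dcgmHandle_t;
typedef void *dcgmGpuGrp_t;
typedef unsigned int dcgmPolicyCondition_t;
typedef int (*fpRecvUpdates)(void *userData);
#define DCGM_ST_OK 0
#define DCGM_ST_BADPARAM (-1)
#define DCGM_POLICY_COND_DBE 0x1u
#define DCGM_POLICY_COND_PCI 0x2u
#define DCGM_POLICY_COND_THERMAL 0x8u

/* State kept by the stub in place of the real policy manager. */
dcgmPolicyCondition_t g_registered;
fpRecvUpdates g_begin, g_finish;

/* Stub with the signature documented above; it only records its arguments. */
dcgmReturn_t dcgmPolicyRegister(dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId,
                                dcgmPolicyCondition_t condition,
                                fpRecvUpdates beginCallback,
                                fpRecvUpdates finishCallback)
{
    (void)pDcgmHandle;
    (void)groupId;
    if (condition == 0)
        return DCGM_ST_BADPARAM;
    g_registered |= condition;
    g_begin = beginCallback;
    g_finish = finishCallback;
    return DCGM_ST_OK;
}

/* Hypothetical helper that plays the role of the policy manager detecting
 * a violation: begin callback, then policy actions, then finish callback. */
void simulate_violation(dcgmPolicyCondition_t condition)
{
    if ((g_registered & condition) == 0)
        return; /* no callback registered for this condition */
    if (g_begin != NULL)
        g_begin(NULL);
    /* ... actions specified by the policy would run here ... */
    if (g_finish != NULL)
        g_finish(NULL);
}

/* Event log written by the callbacks. Real callbacks run on a separate
 * thread, so shared state like this would need synchronization there. */
enum { EVENT_BEGIN = 1, EVENT_FINISH = 2, MAX_EVENTS = 8 };
int g_events[MAX_EVENTS];
int g_eventCount;

int on_begin(void *userData)
{
    (void)userData;
    if (g_eventCount < MAX_EVENTS)
        g_events[g_eventCount++] = EVENT_BEGIN;
    return 0;
}

int on_finish(void *userData)
{
    (void)userData;
    if (g_eventCount < MAX_EVENTS)
        g_events[g_eventCount++] = EVENT_FINISH;
    return 0;
}

/* Registers both callbacks for two conditions with a single OR'd mask. */
dcgmReturn_t register_ecc_and_thermal(dcgmHandle_t handle, dcgmGpuGrp_t group)
{
    dcgmPolicyCondition_t mask = DCGM_POLICY_COND_DBE | DCGM_POLICY_COND_THERMAL;
    return dcgmPolicyRegister(handle, group, mask, on_begin, on_finish);
}
```

In this sketch a violation of either registered condition runs on_begin before and on_finish after the policy actions, while a condition outside the mask triggers neither callback.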

dcgmReturn_t dcgmPolicySet ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicy_t* policy, dcgmStatus_t statusHandle )
pDcgmHandle - IN: DCGM Handle
groupId - IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
policy - IN: A reference to the dcgmPolicy_t that will be applied to all GPUs in the group.
statusHandle - IN/OUT: Resulting status for the operation. Pass NULL if detailed error information is not needed. See dcgmStatusCreate for details on creating a status handle.


Set the current violation policy inside the policy manager. If a condition specified in the dcgmPolicy_t structure is violated, subsequent action(s) may be performed to either report or contain the failure.

This API is only supported on Tesla GPUs and will return DCGM_ST_NOT_SUPPORTED if any non-Tesla GPUs are part of the GPU group specified in groupId.

dcgmReturn_t dcgmPolicyUnregister ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicyCondition_t condition )
pDcgmHandle - IN: DCGM Handle
groupId - IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
condition - IN: The set of conditions, specified as an OR'd list (see dcgmPolicyCondition_t), for which to unregister a callback function.


Unregister the callback function(s) registered for a specific policy condition (see dcgmPolicyCondition_t). This function unregisters all callbacks for the given condition and handle.