1.8.2. Manual Invocation


Describes APIs that can be used to perform direct actions (e.g., GPU reset, health diagnostics) on a group of GPUs.

Functions

dcgmReturn_t dcgmActionValidate ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicyValidation_t validate, dcgmDiagResponse_t* response )
dcgmReturn_t dcgmActionValidate_v2 ( dcgmHandle_t pDcgmHandle, dcgmRunDiag_v7* drd, dcgmDiagResponse_t* response )
dcgmReturn_t dcgmRunDiagnostic ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmDiagnosticLevel_t diagLevel, dcgmDiagResponse_t* diagResponse )

Functions

dcgmReturn_t dcgmActionValidate ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPolicyValidation_t validate, dcgmDiagResponse_t* response )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating the group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
validate
IN: The validation to perform after the action.
response
OUT: Result of the validation process. Refer to dcgmDiagResponse_t for details.
Returns

Description

Inform the action manager to perform a manual validation of a group of GPUs on the system.

This function is deprecated.

dcgmReturn_t dcgmActionValidate_v2 ( dcgmHandle_t pDcgmHandle, dcgmRunDiag_v7* drd, dcgmDiagResponse_t* response )
Parameters
pDcgmHandle
IN: DCGM Handle
drd
IN: Contains the group ID, test names, test parameters, struct version, and the validation to perform. See dcgmGroupCreate for details on creating the group. Alternatively, set the group ID to DCGM_GROUP_ALL_GPUS to perform the operation on all GPUs.
response
OUT: Result of the validation process. Refer to dcgmDiagResponse_t for details.
Returns

Description

Inform the action manager to perform a manual validation of a group of GPUs on the system.

dcgmReturn_t dcgmRunDiagnostic ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmDiagnosticLevel_t diagLevel, dcgmDiagResponse_t* diagResponse )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating the group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
diagLevel
IN: Diagnostic level to run (see dcgmDiagnosticLevel_t).
diagResponse
IN/OUT: Result of running the DCGM diagnostic. Set diagResponse->version to dcgmDiagResponse_version before calling this function.
Returns

Description

Run a diagnostic at the specified level on a group of GPUs.