Process Statistics

group DCGMAPI_PROCESS_STATS

Describes APIs to investigate statistics such as accounting, performance and errors during the lifetime of a GPU process.

Functions

dcgmReturn_t dcgmWatchPidFields(dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, long long updateFreq, double maxKeepAge, int maxKeepSamples)

Request that DCGM start recording stats for fields that can be queried with dcgmGetPidInfo().

Note that the first update of the fields will not occur until the next field update cycle. To force a field update cycle, call dcgmUpdateAllFields() with waitForUpdate set to 1.

Parameters
  • pDcgmHandle – IN: DCGM Handle

  • groupId – IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to operate on all GPUs.

  • updateFreq – IN: How often to update the watched fields, in microseconds (usec)

  • maxKeepAge – IN: How long to keep data for the watched fields, in seconds

  • maxKeepSamples – IN: Maximum number of samples to keep. 0 means no limit

Returns

  • DCGM_ST_OK if the call was successful

  • DCGM_ST_BADPARAM if a parameter is invalid

  • DCGM_ST_REQUIRES_ROOT if the host engine is being run as non-root, and accounting mode could not be enabled (requires root). Run “nvidia-smi -am 1” as root on the node before starting DCGM to fix this.
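Because updateFreq is given in microseconds while maxKeepAge is given in seconds, it is easy to pass mismatched values. The following is a minimal sketch of the arithmetic, not part of DCGM: seconds_to_usec() and expected_samples() are hypothetical helper names, and the estimate assumes one sample is stored per update cycle and that maxKeepAge and maxKeepSamples both cap retention.

```c
#include <assert.h>

/* Hypothetical helper: convert an interval in seconds to the
 * microsecond unit expected by the updateFreq parameter. */
static long long seconds_to_usec(double seconds)
{
    return (long long)(seconds * 1000000.0);
}

/* Hypothetical helper: rough estimate of how many samples per field
 * would be retained, ASSUMING one sample per update cycle and that
 * both maxKeepAge and maxKeepSamples limit retention.
 * maxKeepSamples == 0 is treated as "no limit", as documented above. */
static long long expected_samples(long long updateFreqUsec,
                                  double maxKeepAgeSec,
                                  int maxKeepSamples)
{
    long long byAge;

    assert(updateFreqUsec > 0);

    /* Number of update cycles that fit into the retention window. */
    byAge = (long long)(maxKeepAgeSec * 1000000.0 / (double)updateFreqUsec);

    if (maxKeepSamples > 0 && (long long)maxKeepSamples < byAge)
        return maxKeepSamples;
    return byAge;
}
```

For example, a one-second update interval is seconds_to_usec(1.0), and with maxKeepAge set to 3600.0 and maxKeepSamples set to 0 the estimate works out to about 3600 samples per field.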

dcgmReturn_t dcgmGetPidInfo(dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPidInfo_t *pidInfo)

Get information about all GPUs while the provided pid was running.

For this request to work, you must first call dcgmWatchPidFields() so that DCGM is watching the field IDs that populate pidInfo.

Parameters
  • pDcgmHandle – IN: DCGM Handle

  • groupId – IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to operate on all GPUs.

  • pidInfo – IN/OUT: Structure in which information about the pid is returned. pidInfo->pid must be set to the pid in question, and pidInfo->version should be set to dcgmPidInfo_version.
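The call order described above (watch first, then query with pid and version filled in) might be sketched as follows. This is an illustration under stated assumptions, not DCGM's reference usage: start_pid_watch() and fetch_pid_info() are hypothetical helper names, the one-second interval and one-hour retention are arbitrary example values, and the two-argument form of dcgmUpdateAllFields() is assumed from the DCGM headers. So that the sketch compiles without DCGM installed, it declares simplified stand-ins unless USE_REAL_DCGM is defined; the stand-ins are not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef USE_REAL_DCGM
#include <dcgm_agent.h>
#include <dcgm_structs.h>
#else
/* Simplified stand-ins (NOT the real DCGM definitions) so that the
 * sketch compiles and can be exercised without DCGM installed. */
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
typedef unsigned long dcgmHandle_t;
typedef unsigned long dcgmGpuGrp_t;
typedef struct { unsigned int version; unsigned int pid; } dcgmPidInfo_t;
#define dcgmPidInfo_version 1u
#define DCGM_GROUP_ALL_GPUS 0x7fffffff

static dcgmReturn_t dcgmWatchPidFields(dcgmHandle_t h, dcgmGpuGrp_t g,
                                       long long updateFreq,
                                       double maxKeepAge,
                                       int maxKeepSamples)
{
    (void)h; (void)g;
    return (updateFreq > 0 && maxKeepAge > 0.0 && maxKeepSamples >= 0)
               ? DCGM_ST_OK : DCGM_ST_BADPARAM;
}

static dcgmReturn_t dcgmUpdateAllFields(dcgmHandle_t h, int waitForUpdate)
{
    (void)h; (void)waitForUpdate;
    return DCGM_ST_OK;
}

static dcgmReturn_t dcgmGetPidInfo(dcgmHandle_t h, dcgmGpuGrp_t g,
                                   dcgmPidInfo_t *p)
{
    (void)h; (void)g;
    return (p != NULL && p->pid != 0 && p->version == dcgmPidInfo_version)
               ? DCGM_ST_OK : DCGM_ST_BADPARAM;
}
#endif

/* Hypothetical helper: start recording per-process stats on all GPUs.
 * Example values: update every 1 s (1,000,000 usec), keep 1 hour of
 * data, no cap on the number of samples. Call this BEFORE the process
 * of interest runs. */
static dcgmReturn_t start_pid_watch(dcgmHandle_t handle)
{
    dcgmReturn_t rc = dcgmWatchPidFields(handle, DCGM_GROUP_ALL_GPUS,
                                         1000000LL, 3600.0, 0);
    if (rc != DCGM_ST_OK)
        return rc;

    /* Force an update cycle so the first samples exist right away. */
    return dcgmUpdateAllFields(handle, 1);
}

/* Hypothetical helper: query stats for one pid after it has run.
 * The structure is zeroed, then version and pid are set as required. */
static dcgmReturn_t fetch_pid_info(dcgmHandle_t handle, unsigned int pid,
                                   dcgmPidInfo_t *out)
{
    assert(out != NULL);

    memset(out, 0, sizeof(*out));
    out->version = dcgmPidInfo_version;
    out->pid = pid;

    return dcgmGetPidInfo(handle, DCGM_GROUP_ALL_GPUS, out);
}
```

In a real program the handle would come from DCGM's initialization and connection calls, start_pid_watch() would run before the workload starts, and fetch_pid_info() would be called once the process has run.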

Returns