1.5. Process Statistics

Describes APIs for investigating statistics, such as accounting, performance, and errors, collected during the lifetime of a GPU process.

Functions

dcgmReturn_t dcgmGetPidInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPidInfo_t* pidInfo )
dcgmReturn_t dcgmWatchPidFields ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, long long updateFreq, double maxKeepAge, int maxKeepSamples )

Functions

dcgmReturn_t dcgmGetPidInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPidInfo_t* pidInfo )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
pidInfo
IN/OUT: Structure in which information about the pid is returned. pidInfo->pid must be set to the pid in question. pidInfo->version should be set to dcgmPidInfo_version.
Returns

Description

Gets information about all GPUs for the period during which the provided pid was running.

For this request to work, you must first call dcgmWatchPidFields() so that DCGM is watching the field IDs that are used to populate pidInfo.
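The call order described above (watch, update, then query) can be sketched roughly as follows. This is an illustrative sketch only, not DCGM sample code. So that it builds without the DCGM SDK, it declares simplified stand-ins for the DCGM types, constants, and calls. Their values and bodies are assumptions, the two-argument form of dcgmUpdateAllFields() (handle plus wait flag) is also an assumption, and query_pid_stats() is a hypothetical helper name. With the real SDK, the stand-in part would be dropped and the DCGM headers included instead.

```c
#include <stddef.h>
#include <string.h>

/* ---- Stand-ins (assumptions) so this sketch builds without the DCGM SDK. ----
 * The real types have more members, the real constants may differ, and the
 * real calls talk to the host engine. Replace this part with the DCGM headers. */
typedef int dcgmReturn_t;
typedef unsigned long dcgmHandle_t;
typedef unsigned long dcgmGpuGrp_t;
#define DCGM_ST_OK 0            /* placeholder value */
#define DCGM_ST_BADPARAM (-1)   /* placeholder value */
#define dcgmPidInfo_version 1u  /* placeholder value */

typedef struct {
    unsigned int version; /* input: should be dcgmPidInfo_version */
    unsigned int pid;     /* input: pid to look up */
} dcgmPidInfo_t;

static dcgmReturn_t dcgmWatchPidFields(dcgmHandle_t h, dcgmGpuGrp_t g,
                                       long long updateFreq, double maxKeepAge,
                                       int maxKeepSamples)
{
    (void)h; (void)g; (void)updateFreq; (void)maxKeepAge; (void)maxKeepSamples;
    return DCGM_ST_OK; /* stub: always succeeds */
}

static dcgmReturn_t dcgmUpdateAllFields(dcgmHandle_t h, int waitForUpdate)
{
    (void)h; (void)waitForUpdate;
    return DCGM_ST_OK; /* stub: always succeeds */
}

static dcgmReturn_t dcgmGetPidInfo(dcgmHandle_t h, dcgmGpuGrp_t g,
                                   dcgmPidInfo_t *pidInfo)
{
    (void)h; (void)g;
    if (pidInfo == NULL || pidInfo->version != dcgmPidInfo_version)
        return DCGM_ST_BADPARAM;
    return DCGM_ST_OK; /* stub: a real call would fill in the statistics */
}

/* ---- The sketch itself: watch first, force an update, then query. ---- */
static dcgmReturn_t query_pid_stats(dcgmHandle_t handle, dcgmGpuGrp_t group,
                                    unsigned int pid, dcgmPidInfo_t *out)
{
    dcgmReturn_t rc;

    if (out == NULL)
        return DCGM_ST_BADPARAM;

    /* 1. Ask DCGM to record the fields that dcgmGetPidInfo() reads:
     *    update every 1000000 usec, keep 3600 s of data, no sample limit. */
    rc = dcgmWatchPidFields(handle, group, 1000000LL, 3600.0, 0);
    if (rc != DCGM_ST_OK)
        return rc;

    /* 2. Force a field update cycle instead of waiting for the next one. */
    rc = dcgmUpdateAllFields(handle, 1);
    if (rc != DCGM_ST_OK)
        return rc;

    /* 3. Set the two input members, then query. */
    memset(out, 0, sizeof(*out));
    out->version = dcgmPidInfo_version;
    out->pid = pid;
    return dcgmGetPidInfo(handle, group, out);
}
```

A caller would pass a connected handle, a group ID (for example DCGM_GROUP_ALL_GPUS), and the pid of interest. In a real deployment the watch would presumably need to be in place before the process of interest runs, so the watch call would normally happen at startup rather than immediately before the query.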

dcgmReturn_t dcgmWatchPidFields ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, long long updateFreq, double maxKeepAge, int maxKeepSamples )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID representing a collection of one or more GPUs. See dcgmGroupCreate for details on creating a group. Alternatively, pass DCGM_GROUP_ALL_GPUS as the group ID to perform the operation on all GPUs.
updateFreq
IN: How often to update the watched fields, in microseconds (usec)
maxKeepAge
IN: How long to keep data for the watched fields, in seconds
maxKeepSamples
IN: Maximum number of samples to keep. 0 means no limit
Returns

  • DCGM_ST_OK if the call was successful
  • DCGM_ST_BADPARAM if a parameter is invalid
  • DCGM_ST_REQUIRES_ROOT if the host engine is being run as non-root, and accounting mode could not be enabled (requires root). Run "nvidia-smi -am 1" as root on the node before starting DCGM to fix this.

Description

Request that DCGM start recording stats for fields that can be queried with dcgmGetPidInfo().

Note that the first update of the field will not occur until the next field update cycle. To force a field update cycle, call dcgmUpdateAllFields(1).
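Because updateFreq is given in microseconds while maxKeepAge is given in seconds, it is easy to mix the units up. The following sketch shows one way to keep them straight. Both helpers are hypothetical and are not part of DCGM. The retention estimate assumes that the age limit and the sample limit apply together, with the stricter one winning, which this section does not state explicitly.

```c
#define USEC_PER_SEC 1000000LL

/* Convert a sampling interval in seconds to the microsecond value that
 * updateFreq expects, rounding to the nearest microsecond. */
static long long seconds_to_usec(double seconds)
{
    return (long long)(seconds * (double)USEC_PER_SEC + 0.5);
}

/* Rough upper bound on the number of samples retained per field:
 * (age limit / update interval), capped by maxKeepSamples when it is nonzero.
 * Returns -1 if an argument is out of range. */
static long long estimate_retained_samples(long long updateFreqUsec,
                                           double maxKeepAgeSec,
                                           int maxKeepSamples)
{
    long long byAge;

    if (updateFreqUsec <= 0 || maxKeepAgeSec < 0.0 || maxKeepSamples < 0)
        return -1;

    byAge = (long long)(maxKeepAgeSec * (double)USEC_PER_SEC
                        / (double)updateFreqUsec);

    /* maxKeepSamples == 0 means "no limit", so only the age limit applies. */
    if (maxKeepSamples > 0 && byAge > (long long)maxKeepSamples)
        return (long long)maxKeepSamples;
    return byAge;
}
```

For example, a one-second update interval corresponds to an updateFreq of seconds_to_usec(1.0). Combined with a maxKeepAge of 3600.0 and a maxKeepSamples of 0, the estimate is on the order of one sample per second for an hour.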