2.2.2. Grouping

[System]

The following APIs are used for group management. A user can create a group of GPUs and then perform an operation on the whole group. If grouping is not needed and commands should run on all GPUs seen by DCGM, DCGM_GROUP_ALL_GPUS can be passed wherever a group ID is expected.

Functions

dcgmReturn_t dcgmGroupAddDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
dcgmReturn_t dcgmGroupCreate ( dcgmHandle_t pDcgmHandle, dcgmGroupType_t type, char* groupName, dcgmGpuGrp_t* pDcgmGrpId )
dcgmReturn_t dcgmGroupDestroy ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId )
dcgmReturn_t dcgmGroupGetAllIds ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupIdList[], unsigned int* count )
dcgmReturn_t dcgmGroupGetInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmGroupInfo_t* pDcgmGroupInfo )
dcgmReturn_t dcgmGroupRemoveDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )

Functions

dcgmReturn_t dcgmGroupAddDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
Parameters
pDcgmHandle
IN : DCGM Handle
groupId
IN : Group Id to which the device should be added
gpuId
IN : DCGM GPU Id
Returns

Description

Used to add the specified GPU Id to the group represented by groupId.

dcgmReturn_t dcgmGroupCreate ( dcgmHandle_t pDcgmHandle, dcgmGroupType_t type, char* groupName, dcgmGpuGrp_t* pDcgmGrpId )
Parameters
pDcgmHandle
IN : DCGM Handle
type
IN : Type of GPU Group to be formed
groupName
IN : Desired name of the GPU group, specified as a NULL-terminated C string
pDcgmGrpId
OUT : Reference to group ID
Returns

Description

Used to create a GPU group, which can hold one or more GPU Ids and is identified by the opaque handle returned in pDcgmGrpId. Instead of executing an operation separately for each GPU, a DCGM group lets the user execute the same operation on all the GPUs in the group with a single API call.

To create a group containing all the GPUs present on the system, specify the type field as DCGM_GROUP_DEFAULT. To create an empty group, specify the type field as DCGM_GROUP_EMPTY. An empty group can then be updated with the desired set of GPUs using dcgmGroupAddDevice and dcgmGroupRemoveDevice.
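The create-then-populate flow described above can be sketched as follows. This is an illustration, not DCGM's own sample code: so that it compiles without DCGM installed, it declares stand-in types, enum values, and stub implementations that only mirror the signatures listed in this section. All stand-in definitions, the STUB_NUM_GPUS limit, and the helper name create_group_with_gpus are assumptions; with the real library, the DCGM headers would be included instead and the stand-in part removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ---- Stand-ins (assumptions) so the sketch builds without DCGM. ----
 * With the real library, include the DCGM headers and delete this part. */
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
typedef enum { DCGM_GROUP_DEFAULT = 0, DCGM_GROUP_EMPTY = 1 } dcgmGroupType_t;
typedef void *dcgmHandle_t;
typedef void *dcgmGpuGrp_t;

#define STUB_NUM_GPUS 4 /* the stub pretends the system has GPUs 0..3 */
static struct { int live; unsigned int gpus[STUB_NUM_GPUS]; unsigned int n; } stubGroup;

dcgmReturn_t dcgmGroupCreate(dcgmHandle_t h, dcgmGroupType_t type,
                             char *groupName, dcgmGpuGrp_t *pDcgmGrpId)
{
    (void)h; (void)type;
    if (groupName == NULL || pDcgmGrpId == NULL) return DCGM_ST_BADPARAM;
    stubGroup.live = 1;
    stubGroup.n = 0;
    *pDcgmGrpId = &stubGroup;
    return DCGM_ST_OK;
}

dcgmReturn_t dcgmGroupAddDevice(dcgmHandle_t h, dcgmGpuGrp_t groupId, unsigned int gpuId)
{
    (void)h;
    if (groupId != &stubGroup || !stubGroup.live) return DCGM_ST_BADPARAM;
    if (gpuId >= STUB_NUM_GPUS || stubGroup.n == STUB_NUM_GPUS) return DCGM_ST_BADPARAM;
    stubGroup.gpus[stubGroup.n++] = gpuId;
    return DCGM_ST_OK;
}

dcgmReturn_t dcgmGroupDestroy(dcgmHandle_t h, dcgmGpuGrp_t groupId)
{
    (void)h;
    if (groupId != &stubGroup || !stubGroup.live) return DCGM_ST_BADPARAM;
    stubGroup.live = 0;
    return DCGM_ST_OK;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: create an empty group, then add each requested GPU.
 * If any add fails, the partially built group is destroyed and the error
 * code of the failing call is returned. */
dcgmReturn_t create_group_with_gpus(dcgmHandle_t handle, char *name,
                                    const unsigned int *gpuIds, size_t numGpus,
                                    dcgmGpuGrp_t *outGroup)
{
    assert(outGroup != NULL);

    dcgmReturn_t rc = dcgmGroupCreate(handle, DCGM_GROUP_EMPTY, name, outGroup);
    if (rc != DCGM_ST_OK) return rc;

    for (size_t i = 0; i < numGpus; ++i) {
        rc = dcgmGroupAddDevice(handle, *outGroup, gpuIds[i]);
        if (rc != DCGM_ST_OK) {
            fprintf(stderr, "adding GPU %u failed (rc=%d)\n", gpuIds[i], (int)rc);
            dcgmGroupDestroy(handle, *outGroup); /* do not leak a half-built group */
            return rc;
        }
    }
    return DCGM_ST_OK;
}
```

A caller would pass the handle obtained when connecting to DCGM, an array of GPU Ids, and the address of a dcgmGpuGrp_t variable that receives the new group. Destroying the group on a failed add is a design choice of this sketch; a caller that prefers a partially populated group could skip that step.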

dcgmReturn_t dcgmGroupDestroy ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId )
Parameters
pDcgmHandle
IN : DCGM Handle
groupId
IN : Group ID
Returns

Description

Used to destroy the group represented by groupId. Since a DCGM group is only a logical grouping of GPUs, the properties applied on the group stay intact for the individual GPUs even after the group is destroyed.

dcgmReturn_t dcgmGroupGetAllIds ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupIdList[], unsigned int* count )
Parameters
pDcgmHandle
IN : DCGM Handle
groupIdList
OUT : List of Group Ids
count
OUT : The number of Group ids in the list
Returns

Description

Used to get the Ids of all groups of GPUs. The list of GPU group Ids is returned in groupIdList, and the number of Ids in that list is returned in count. The caller must allocate groupIdList with room for MAX_NUM_GROUPS entries.
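The buffer requirement above can be illustrated with a short sketch. It is not taken from the DCGM sources: the types, the value given to MAX_NUM_GROUPS, and the stub dcgmGroupGetAllIds (which always reports two groups) are stand-ins so the example compiles on its own, and count_groups is a hypothetical helper name. With the real library, the DCGM headers would supply these definitions.

```c
#include <assert.h>
#include <stdio.h>

/* ---- Stand-ins (assumptions) so the sketch builds without DCGM. ---- */
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
typedef void *dcgmHandle_t;
typedef void *dcgmGpuGrp_t;
#define MAX_NUM_GROUPS 64 /* assumed value; take the real one from the DCGM headers */

static int stubGroups[2]; /* addresses serve as two opaque group Ids */

dcgmReturn_t dcgmGroupGetAllIds(dcgmHandle_t h, dcgmGpuGrp_t groupIdList[],
                                unsigned int *count)
{
    (void)h;
    if (groupIdList == NULL || count == NULL) return DCGM_ST_BADPARAM;
    groupIdList[0] = &stubGroups[0];
    groupIdList[1] = &stubGroups[1];
    *count = 2;
    return DCGM_ST_OK;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: list every group Id and return how many there are,
 * or -1 if the query fails. */
int count_groups(dcgmHandle_t handle)
{
    dcgmGpuGrp_t ids[MAX_NUM_GROUPS]; /* caller-owned buffer sized for the maximum */
    unsigned int count = 0;

    if (dcgmGroupGetAllIds(handle, ids, &count) != DCGM_ST_OK) return -1;
    assert(count <= MAX_NUM_GROUPS);

    for (unsigned int i = 0; i < count; ++i)
        printf("group %u: id %p\n", i, ids[i]);
    return (int)count;
}
```

Only the first count entries of the buffer are meaningful after the call; the remaining slots are left untouched.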

dcgmReturn_t dcgmGroupGetInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmGroupInfo_t* pDcgmGroupInfo )
Parameters
pDcgmHandle
IN : DCGM Handle
groupId
IN : Group ID for which information is to be fetched
pDcgmGroupInfo
OUT : Group Information
Returns

Description

Used to get information corresponding to the group represented by groupId. The information returned in pDcgmGroupInfo consists of the group name and the list of GPU IDs present in the group.
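One possible use of the returned information is a membership check, sketched below. The layout of dcgmGroupInfo_t shown here (count, gpuIdList, groupName and their sizes) is an assumption made for illustration, as are the other stand-in types, the stub dcgmGroupGetInfo, and the helper name group_contains_gpu. The real structure is defined in the DCGM headers and may carry additional fields that need to be initialized before the call.

```c
#include <assert.h>
#include <string.h>

/* ---- Stand-ins (assumptions) so the sketch builds without DCGM. ---- */
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
typedef void *dcgmHandle_t;
typedef void *dcgmGpuGrp_t;

#define STUB_MAX_GPUS 16
typedef struct {
    unsigned int count;                    /* number of valid entries in gpuIdList */
    unsigned int gpuIdList[STUB_MAX_GPUS]; /* GPU Ids present in the group */
    char groupName[64];                    /* NULL-terminated group name */
} dcgmGroupInfo_t; /* hypothetical layout */

static int stubGroup; /* its address serves as the only valid group Id */

dcgmReturn_t dcgmGroupGetInfo(dcgmHandle_t h, dcgmGpuGrp_t groupId,
                              dcgmGroupInfo_t *pDcgmGroupInfo)
{
    (void)h;
    if (groupId != &stubGroup || pDcgmGroupInfo == NULL) return DCGM_ST_BADPARAM;
    pDcgmGroupInfo->count = 2;
    pDcgmGroupInfo->gpuIdList[0] = 0;
    pDcgmGroupInfo->gpuIdList[1] = 3;
    strcpy(pDcgmGroupInfo->groupName, "stub_group");
    return DCGM_ST_OK;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: returns 1 if gpuId is in the group, 0 if it is not,
 * and -1 if the group information could not be fetched. */
int group_contains_gpu(dcgmHandle_t handle, dcgmGpuGrp_t groupId, unsigned int gpuId)
{
    dcgmGroupInfo_t info;
    memset(&info, 0, sizeof info);

    if (dcgmGroupGetInfo(handle, groupId, &info) != DCGM_ST_OK) return -1;
    assert(info.count <= STUB_MAX_GPUS);

    for (unsigned int i = 0; i < info.count; ++i) {
        if (info.gpuIdList[i] == gpuId) return 1;
    }
    return 0;
}
```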

dcgmReturn_t dcgmGroupRemoveDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
Parameters
pDcgmHandle
IN : DCGM Handle
groupId
IN : Group ID from which the device should be removed
gpuId
IN : DCGM GPU Id
Returns

Description

Used to remove the specified GPU Id from the group represented by groupId.
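Removal and addition can be combined to move a GPU from one group to another, as in the sketch below. The stand-in types and the bitmask-based stubs exist only so the example compiles without DCGM, and their behavior (for instance, rejecting the removal of a GPU that is not a member) is an assumption rather than documented DCGM behavior. The helper name move_gpu is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins (assumptions) so the sketch builds without DCGM. ---- */
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
typedef void *dcgmHandle_t;
typedef void *dcgmGpuGrp_t;

typedef struct { unsigned int mask; } StubGroup; /* bit i set: GPU i is a member */
static StubGroup stubA = { 0x3u }; /* starts with GPUs 0 and 1 */
static StubGroup stubB = { 0x0u }; /* starts empty */

dcgmReturn_t dcgmGroupAddDevice(dcgmHandle_t h, dcgmGpuGrp_t groupId, unsigned int gpuId)
{
    StubGroup *g = groupId;
    (void)h;
    if (g == NULL || gpuId >= 32 || (g->mask & (1u << gpuId))) return DCGM_ST_BADPARAM;
    g->mask |= 1u << gpuId;
    return DCGM_ST_OK;
}

dcgmReturn_t dcgmGroupRemoveDevice(dcgmHandle_t h, dcgmGpuGrp_t groupId, unsigned int gpuId)
{
    StubGroup *g = groupId;
    (void)h;
    if (g == NULL || gpuId >= 32 || !(g->mask & (1u << gpuId))) return DCGM_ST_BADPARAM;
    g->mask &= ~(1u << gpuId);
    return DCGM_ST_OK;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: remove gpuId from one group and add it to another.
 * If the add fails, the GPU is put back into the source group (best effort)
 * and the error code of the failed add is returned. */
dcgmReturn_t move_gpu(dcgmHandle_t handle, dcgmGpuGrp_t from, dcgmGpuGrp_t to,
                      unsigned int gpuId)
{
    dcgmReturn_t rc = dcgmGroupRemoveDevice(handle, from, gpuId);
    if (rc != DCGM_ST_OK) return rc;

    rc = dcgmGroupAddDevice(handle, to, gpuId);
    if (rc != DCGM_ST_OK) {
        (void)dcgmGroupAddDevice(handle, from, gpuId); /* roll back the removal */
    }
    return rc;
}
```

Removing first and rolling back on failure keeps the GPU in exactly one of the two groups at every step of this sketch; adding first would be an equally reasonable order when a brief overlap is acceptable.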