1.2.2. Grouping


The following APIs are used for group management. The user can create a group of entities and then perform an operation on every entity in the group with a single call. If grouping is not needed and the user wishes to run commands on all GPUs or all NvSwitches seen by DCGM, the user can pass DCGM_GROUP_ALL_GPUS or DCGM_GROUP_ALL_NVSWITCHES, respectively, in place of a group ID.

Functions

dcgmReturn_t dcgmGroupAddDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
dcgmReturn_t dcgmGroupAddEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
dcgmReturn_t dcgmGroupCreate ( dcgmHandle_t pDcgmHandle, dcgmGroupType_t type, char* groupName, dcgmGpuGrp_t* pDcgmGrpId )
dcgmReturn_t dcgmGroupDestroy ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId )
dcgmReturn_t dcgmGroupGetAllIds ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupIdList[], unsigned int* count )
dcgmReturn_t dcgmGroupGetInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmGroupInfo_t* pDcgmGroupInfo )
dcgmReturn_t dcgmGroupRemoveDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
dcgmReturn_t dcgmGroupRemoveEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )

Function Documentation

dcgmReturn_t dcgmGroupAddDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group Id to which device should be added
gpuId
IN: DCGM GPU Id
Returns

Description

Adds the specified GPU ID to the group represented by groupId.

dcgmReturn_t dcgmGroupAddEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID to which the entity should be added
entityGroupId
IN: Entity group that entityId belongs to
entityId
IN: DCGM entityId
Returns

Description

Adds the specified entity to the group represented by groupId.

dcgmReturn_t dcgmGroupCreate ( dcgmHandle_t pDcgmHandle, dcgmGroupType_t type, char* groupName, dcgmGpuGrp_t* pDcgmGrpId )
Parameters
pDcgmHandle
IN: DCGM Handle
type
IN: Type of entity group to create
groupName
IN: Desired name of the group, specified as a null-terminated C string
pDcgmGrpId
OUT: Reference to group ID
Returns

Description

Creates an entity group that can store one or more entity IDs and returns an opaque handle to it in pDcgmGrpId. Instead of executing an operation separately for each entity, a DCGM group enables the user to execute the same operation on all the entities in the group with a single API call.

To create a group containing all the GPUs present on the system, specify type as DCGM_GROUP_DEFAULT; to create a group containing all the NvSwitches, specify DCGM_GROUP_DEFAULT_NVSWITCHES. To create an empty group, specify DCGM_GROUP_EMPTY. An empty group can then be updated with the desired set of entities using dcgmGroupAddDevice, dcgmGroupAddEntity, dcgmGroupRemoveDevice, and dcgmGroupRemoveEntity.

dcgmReturn_t dcgmGroupDestroy ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID
Returns

Description

Destroys the group represented by groupId. Because a DCGM group is only a logical grouping of entities, properties applied to the group remain in effect on the individual entities after the group is destroyed.

dcgmReturn_t dcgmGroupGetAllIds ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupIdList[], unsigned int* count )
Parameters
pDcgmHandle
IN: DCGM Handle
groupIdList
OUT: List of Group Ids
count
OUT: The number of Group ids in the list
Returns

Description

Gets the IDs of all entity groups. The group IDs are returned in groupIdList and the number of IDs in count. The caller must allocate groupIdList with room for at least MAX_NUM_GROUPS entries.

dcgmReturn_t dcgmGroupGetInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmGroupInfo_t* pDcgmGroupInfo )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID for which information is to be fetched
pDcgmGroupInfo
OUT: Group Information
Returns

Description

Gets information about the group represented by groupId. The information returned in pDcgmGroupInfo consists of the group name and the list of entities in the group.

dcgmReturn_t dcgmGroupRemoveDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID from which device should be removed
gpuId
IN: DCGM GPU Id
Returns

Description

Removes the specified GPU ID from the group represented by groupId.

dcgmReturn_t dcgmGroupRemoveEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
Parameters
pDcgmHandle
IN: DCGM Handle
groupId
IN: Group ID from which the entity should be removed
entityGroupId
IN: Entity group that entityId belongs to
entityId
IN: DCGM entityId
Returns

Description

Removes the specified entity from the group represented by groupId.