1.2.2. Grouping
[System]
The following APIs are used for group management. The user can create a group of entities and perform an operation on the whole group. If grouping is not needed and the user wishes to run commands on all GPUs or all NvSwitches seen by DCGM, the user can pass DCGM_GROUP_ALL_GPUS or DCGM_GROUP_ALL_NVSWITCHES in place of a group ID.
Functions
- dcgmReturn_t dcgmGroupAddDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
- dcgmReturn_t dcgmGroupAddEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
- dcgmReturn_t dcgmGroupCreate ( dcgmHandle_t pDcgmHandle, dcgmGroupType_t type, char* groupName, dcgmGpuGrp_t* pDcgmGrpId )
- dcgmReturn_t dcgmGroupDestroy ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId )
- dcgmReturn_t dcgmGroupGetAllIds ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupIdList[], unsigned int* count )
- dcgmReturn_t dcgmGroupGetInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmGroupInfo_t* pDcgmGroupInfo )
- dcgmReturn_t dcgmGroupRemoveDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
- dcgmReturn_t dcgmGroupRemoveEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
Functions
- dcgmReturn_t dcgmGroupAddDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupId
- IN: Group Id to which device should be added
- gpuId
- IN: DCGM GPU Id
Returns
- DCGM_ST_OK if the GPU Id has been successfully added to the group
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
- DCGM_ST_NOT_CONFIGURED if the entry corresponding to the group (groupId) does not exist
- DCGM_ST_BADPARAM if gpuId is invalid or already part of the specified group
Description
Adds the specified GPU Id to the group represented by groupId.
- dcgmReturn_t dcgmGroupAddEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupId
- IN: Group Id to which the entity should be added
- entityGroupId
- IN: Entity group that entityId belongs to
- entityId
- IN: DCGM entityId
Returns
- DCGM_ST_OK if the entity has been successfully added to the group
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
- DCGM_ST_NOT_CONFIGURED if the entry corresponding to the group (groupId) does not exist
- DCGM_ST_BADPARAM if entityId is invalid or already part of the specified group
Description
Adds the specified entity to the group represented by groupId.
- dcgmReturn_t dcgmGroupCreate ( dcgmHandle_t pDcgmHandle, dcgmGroupType_t type, char* groupName, dcgmGpuGrp_t* pDcgmGrpId )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- type
- IN: Type of Entity Group to be formed
- groupName
- IN: Desired name of the group, specified as a null-terminated C string
- pDcgmGrpId
- OUT: Reference to group ID
Returns
- DCGM_ST_OK if the group has been created
- DCGM_ST_BADPARAM if any of type, groupName (including its length), or pDcgmGrpId is invalid
- DCGM_ST_MAX_LIMIT if the number of groups on the system has reached the maximum, DCGM_MAX_NUM_GROUPS
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
Description
Creates an entity group that can hold one or more entity Ids. The group is identified by the opaque handle returned in pDcgmGrpId. Instead of executing an operation separately for each entity, a DCGM group lets the user execute the same operation on all entities in the group with a single API call.
To create the group with all the entities present on the system, the type field should be specified as DCGM_GROUP_DEFAULT or DCGM_GROUP_ALL_NVSWITCHES. To create an empty group, the type field should be specified as DCGM_GROUP_EMPTY. The empty group can be updated with the desired set of entities using the APIs dcgmGroupAddDevice, dcgmGroupAddEntity, dcgmGroupRemoveDevice, and dcgmGroupRemoveEntity.
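As a rough illustration of the create-then-populate flow described above, the following C sketch creates an empty group and adds GPU Ids to it one at a time. It is not taken from the DCGM sources. The helper create_gpu_group is hypothetical, and the typedefs, enum values, and function bodies in the stand-in section are placeholders that only mimic the documented return codes so the sketch compiles without DCGM installed. In real code they would come from the DCGM headers (assumed here to be dcgm_agent.h and dcgm_structs.h) and library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* --- Stand-ins so this sketch compiles without DCGM installed. ---------
 * With the real library, include the DCGM headers instead and delete this
 * section. Enum values and function bodies here are placeholders. */
typedef uintptr_t dcgmHandle_t;
typedef uintptr_t dcgmGpuGrp_t;
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
typedef enum { DCGM_GROUP_DEFAULT = 0, DCGM_GROUP_EMPTY = 1 } dcgmGroupType_t;

#define MOCK_MAX_GPUS 16
static int mock_member[MOCK_MAX_GPUS]; /* membership of the single mock group */

static dcgmReturn_t dcgmGroupCreate(dcgmHandle_t h, dcgmGroupType_t type,
                                    char *groupName, dcgmGpuGrp_t *pDcgmGrpId)
{
    (void)h;
    if (groupName == NULL || pDcgmGrpId == NULL)
        return DCGM_ST_BADPARAM;
    for (int i = 0; i < MOCK_MAX_GPUS; i++)
        mock_member[i] = (type == DCGM_GROUP_DEFAULT); /* empty unless default */
    *pDcgmGrpId = 1;
    return DCGM_ST_OK;
}

static dcgmReturn_t dcgmGroupAddDevice(dcgmHandle_t h, dcgmGpuGrp_t groupId,
                                       unsigned int gpuId)
{
    (void)h; (void)groupId;
    if (gpuId >= MOCK_MAX_GPUS || mock_member[gpuId])
        return DCGM_ST_BADPARAM; /* invalid id or already in the group */
    mock_member[gpuId] = 1;
    return DCGM_ST_OK;
}

static dcgmReturn_t dcgmGroupDestroy(dcgmHandle_t h, dcgmGpuGrp_t groupId)
{
    (void)h; (void)groupId;
    for (int i = 0; i < MOCK_MAX_GPUS; i++)
        mock_member[i] = 0;
    return DCGM_ST_OK;
}
/* --- End of stand-ins. -------------------------------------------------- */

/* Hypothetical helper: create an empty group and add each GPU Id to it.
 * If any add fails, the partially built group is destroyed and the failing
 * return code is passed back to the caller. */
dcgmReturn_t create_gpu_group(dcgmHandle_t handle, char *name,
                              const unsigned int *gpuIds, size_t n,
                              dcgmGpuGrp_t *groupOut)
{
    assert(gpuIds != NULL || n == 0);

    dcgmReturn_t rc = dcgmGroupCreate(handle, DCGM_GROUP_EMPTY, name, groupOut);
    if (rc != DCGM_ST_OK)
        return rc;

    for (size_t i = 0; i < n; i++) {
        rc = dcgmGroupAddDevice(handle, *groupOut, gpuIds[i]);
        if (rc != DCGM_ST_OK) {
            dcgmGroupDestroy(handle, *groupOut);
            return rc;
        }
    }
    return DCGM_ST_OK;
}
```

Destroying the group on a failed add is a design choice of this sketch, not a DCGM requirement. A caller that prefers a partially populated group could skip the cleanup and inspect the return code instead.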
- dcgmReturn_t dcgmGroupDestroy ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupId
- IN: Group ID
Returns
- DCGM_ST_OK if the group has been destroyed
- DCGM_ST_BADPARAM if groupId is invalid
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
- DCGM_ST_NOT_CONFIGURED if the entry corresponding to the group does not exist
Description
Destroys the group represented by groupId. Because a DCGM group is only a logical grouping of entities, the properties applied to the group stay in effect on the individual entities even after the group is destroyed.
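One way a caller might handle the documented return codes of dcgmGroupDestroy is sketched below: a missing group is reported separately from other failures, so that cleanup code can be run more than once. The helper destroy_group_if_present is hypothetical, and the typedefs, enum values, and mock function body are placeholders so the sketch compiles without DCGM installed. Real code would use the declarations from the DCGM headers.

```c
#include <stdint.h>
#include <stdio.h>

/* --- Stand-ins so this sketch compiles without DCGM installed. ---------
 * Enum values and the function body are placeholders, not DCGM's. */
typedef uintptr_t dcgmHandle_t;
typedef uintptr_t dcgmGpuGrp_t;
typedef enum {
    DCGM_ST_OK = 0,
    DCGM_ST_BADPARAM = -1,
    DCGM_ST_NOT_CONFIGURED = -5
} dcgmReturn_t;

static int mock_group_exists = 1; /* the mock tracks a single group */

static dcgmReturn_t dcgmGroupDestroy(dcgmHandle_t h, dcgmGpuGrp_t groupId)
{
    (void)h; (void)groupId;
    if (!mock_group_exists)
        return DCGM_ST_NOT_CONFIGURED;
    mock_group_exists = 0;
    return DCGM_ST_OK;
}
/* --- End of stand-ins. -------------------------------------------------- */

/* Hypothetical helper.
 * Returns 1 if the group was destroyed by this call,
 *         0 if no such group existed (DCGM_ST_NOT_CONFIGURED),
 *        -1 on any other error. */
int destroy_group_if_present(dcgmHandle_t handle, dcgmGpuGrp_t groupId)
{
    dcgmReturn_t rc = dcgmGroupDestroy(handle, groupId);
    if (rc == DCGM_ST_OK)
        return 1;
    if (rc == DCGM_ST_NOT_CONFIGURED)
        return 0;
    fprintf(stderr, "dcgmGroupDestroy failed with code %d\n", (int)rc);
    return -1;
}
```

Whether a missing group should count as success is a policy decision for the caller. This sketch only separates the cases.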
- dcgmReturn_t dcgmGroupGetAllIds ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupIdList[], unsigned int* count )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupIdList
- OUT: List of Group Ids
- count
- OUT: The number of Group ids in the list
Returns
- DCGM_ST_OK if the ids of the groups were successfully retrieved
- DCGM_ST_BADPARAM if either groupIdList or count is NULL
- DCGM_ST_GENERIC_ERROR if an unknown error has occurred
Description
Retrieves the Ids of all groups of entities. The group Ids are returned in groupIdList, and the number of Ids is returned in count. The caller must allocate groupIdList with room for DCGM_MAX_NUM_GROUPS entries.
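The buffer sizing requirement above can be met with a fixed-size array, as in the following sketch. The helper print_group_ids is hypothetical. The typedefs, the enum values, the value given to DCGM_MAX_NUM_GROUPS, and the mock body of dcgmGroupGetAllIds are placeholders so the sketch compiles without DCGM installed. Real code would take all of them from the DCGM headers and library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* --- Stand-ins so this sketch compiles without DCGM installed. ---------
 * Values and the function body are placeholders, not DCGM's. */
typedef uintptr_t dcgmHandle_t;
typedef uintptr_t dcgmGpuGrp_t;
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t;
#define DCGM_MAX_NUM_GROUPS 64 /* placeholder; use the value from the headers */

static dcgmReturn_t dcgmGroupGetAllIds(dcgmHandle_t h,
                                       dcgmGpuGrp_t groupIdList[],
                                       unsigned int *count)
{
    (void)h;
    if (groupIdList == NULL || count == NULL)
        return DCGM_ST_BADPARAM;
    groupIdList[0] = 10; /* the mock pretends three groups exist */
    groupIdList[1] = 11;
    groupIdList[2] = 12;
    *count = 3;
    return DCGM_ST_OK;
}
/* --- End of stand-ins. -------------------------------------------------- */

/* Hypothetical helper: print every group Id known to the host engine.
 * Returns the number of groups, or -1 if the query failed. */
int print_group_ids(dcgmHandle_t handle)
{
    dcgmGpuGrp_t ids[DCGM_MAX_NUM_GROUPS]; /* sized as the description requires */
    unsigned int count = 0;

    dcgmReturn_t rc = dcgmGroupGetAllIds(handle, ids, &count);
    if (rc != DCGM_ST_OK) {
        fprintf(stderr, "dcgmGroupGetAllIds failed with code %d\n", (int)rc);
        return -1;
    }

    /* Sanity check: the reported count should never exceed the buffer. */
    assert(count <= DCGM_MAX_NUM_GROUPS);

    for (unsigned int i = 0; i < count; i++)
        printf("group id: %lu\n", (unsigned long)ids[i]);
    return (int)count;
}
```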
- dcgmReturn_t dcgmGroupGetInfo ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmGroupInfo_t* pDcgmGroupInfo )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupId
- IN: Group ID for which information is to be fetched
- pDcgmGroupInfo
- OUT: Group Information
Returns
- DCGM_ST_OK if the group info is successfully received
- DCGM_ST_BADPARAM if any of groupId or pDcgmGroupInfo is invalid
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
- DCGM_ST_MAX_LIMIT if the group does not contain the GPU
- DCGM_ST_NOT_CONFIGURED if the entry corresponding to the group (groupId) does not exist
Description
Retrieves information about the group represented by groupId. The information returned in pDcgmGroupInfo consists of the group name and the list of entities present in the group.
- dcgmReturn_t dcgmGroupRemoveDevice ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, unsigned int gpuId )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupId
- IN: Group ID from which device should be removed
- gpuId
- IN: DCGM GPU Id
Returns
- DCGM_ST_OK if the GPU Id has been successfully removed from the group
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
- DCGM_ST_NOT_CONFIGURED if the entry corresponding to the group (groupId) does not exist
- DCGM_ST_BADPARAM if gpuId is invalid or not part of the specified group
Description
Removes the specified GPU Id from the group represented by groupId.
- dcgmReturn_t dcgmGroupRemoveEntity ( dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgm_field_entity_group_t entityGroupId, dcgm_field_eid_t entityId )
Parameters
- pDcgmHandle
- IN: DCGM Handle
- groupId
- IN: Group ID from which the entity should be removed
- entityGroupId
- IN: Entity group that entityId belongs to
- entityId
- IN: DCGM entityId
Returns
- DCGM_ST_OK if the entity has been successfully removed from the group
- DCGM_ST_INIT_ERROR if the library has not been successfully initialized
- DCGM_ST_NOT_CONFIGURED if the entry corresponding to the group (groupId) does not exist
- DCGM_ST_BADPARAM if entityId is invalid or not part of the specified group
Description
Removes the specified entity from the group represented by groupId.