1.2.4. Status handling
[System]
The following APIs are used to manage statuses for multiple operations on one or more GPUs.
Functions
- dcgmReturn_t dcgmStatusClear ( dcgmStatus_t statusHandle )
- dcgmReturn_t dcgmStatusCreate ( dcgmStatus_t* statusHandle )
- dcgmReturn_t dcgmStatusDestroy ( dcgmStatus_t statusHandle )
- dcgmReturn_t dcgmStatusGetCount ( dcgmStatus_t statusHandle, unsigned int* count )
- dcgmReturn_t dcgmStatusPopError ( dcgmStatus_t statusHandle, dcgmErrorInfo_t* pDcgmErrorInfo )
Functions
- dcgmReturn_t dcgmStatusClear ( dcgmStatus_t statusHandle )
Parameters
- statusHandle
- IN: Handle to list of statuses
Returns
- DCGM_ST_OK if the errors are successfully cleared
- DCGM_ST_BADPARAM if statusHandle is invalid
Description
Clears all errors stored in a status handle created by dcgmStatusCreate. After one set of operations completes, statusHandle can be cleared and reused for the next set of operations.
- dcgmReturn_t dcgmStatusCreate ( dcgmStatus_t* statusHandle )
Parameters
- statusHandle
- OUT: Reference to handle for list of statuses
Returns
- DCGM_ST_OK if the status handle is successfully created
- DCGM_ST_BADPARAM if statusHandle is invalid
Description
Creates a DCGM status handle, which can be used to collect the statuses of multiple operations on one or more devices.
Multiple statuses are useful when operations are performed at the group level: the status handle provides a mechanism to access the error attributes of each failed operation.
The number of errors stored behind the opaque handle can be retrieved with dcgmStatusGetCount. Individual errors are retrieved from statusHandle with dcgmStatusPopError, which the user can invoke once per reported error or repeatedly until all errors have been fetched.
When the status handle is no longer required, it should be destroyed with dcgmStatusDestroy.
- dcgmReturn_t dcgmStatusDestroy ( dcgmStatus_t statusHandle )
Parameters
- statusHandle
- IN: Handle to list of statuses
Returns
- DCGM_ST_OK if the status handle is successfully destroyed
- DCGM_ST_BADPARAM if statusHandle is invalid
Description
Destroys a status handle created by dcgmStatusCreate.
- dcgmReturn_t dcgmStatusGetCount ( dcgmStatus_t statusHandle, unsigned int* count )
Parameters
- statusHandle
- IN: Handle to list of statuses
- count
- OUT: Number of error entries present in the list of statuses
Returns
- DCGM_ST_OK if the error count is successfully received
- DCGM_ST_BADPARAM if either statusHandle or count is invalid
Description
Gets the number of error entries stored behind the opaque handle statusHandle.
- dcgmReturn_t dcgmStatusPopError ( dcgmStatus_t statusHandle, dcgmErrorInfo_t* pDcgmErrorInfo )
Parameters
- statusHandle
- IN: Handle to list of statuses
- pDcgmErrorInfo
- OUT: First error from the list of statuses
Returns
- DCGM_ST_OK if the error entry is successfully fetched
- DCGM_ST_BADPARAM if either statusHandle or pDcgmErrorInfo is invalid
- DCGM_ST_NO_DATA if the status handle list is empty
Description
Iterates through the list of errors maintained behind statusHandle. Each call removes the first error from the list of DCGM statuses and returns it in pDcgmErrorInfo. To retrieve all the errors, invoke this API once per error reported by dcgmStatusGetCount, or repeatedly until it returns DCGM_ST_NO_DATA.
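Taken together, the five calls in this section suggest a create, operate, drain, destroy lifecycle. The following is a minimal sketch of that lifecycle, not a definitive implementation. So that it compiles without DCGM installed, it defines small in-memory stand-ins for the five functions that mirror the signatures listed in this section. The enum values, the layout of dcgmErrorInfo_t (gpuId, fieldId, status), the uintptr_t handle type, and the helpers mock_group_op, drain_errors, and run_lifecycle are all assumptions made for illustration. With the real library, the stand-in part would be replaced by the DCGM headers and mock_group_op by an actual group-level call that accepts a status handle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-ins (assumptions). With DCGM installed, delete this part and
 *      include the DCGM headers (e.g. dcgm_agent.h, dcgm_structs.h). ---- */
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1, DCGM_ST_NO_DATA = -14 } dcgmReturn_t; /* placeholder values */
typedef uintptr_t dcgmStatus_t;                                                   /* opaque handle */
typedef struct { unsigned int gpuId; short fieldId; int status; } dcgmErrorInfo_t; /* assumed layout */
typedef struct { dcgmErrorInfo_t errs[8]; unsigned int n; } mockList_t;           /* mock storage */

static dcgmReturn_t dcgmStatusCreate(dcgmStatus_t *h) {
    if (!h) return DCGM_ST_BADPARAM;
    mockList_t *l = calloc(1, sizeof *l);
    if (!l) return DCGM_ST_BADPARAM; /* mock only: no dedicated memory error code here */
    *h = (dcgmStatus_t)l;
    return DCGM_ST_OK;
}
static dcgmReturn_t dcgmStatusDestroy(dcgmStatus_t h) {
    if (!h) return DCGM_ST_BADPARAM;
    free((void *)h);
    return DCGM_ST_OK;
}
static dcgmReturn_t dcgmStatusClear(dcgmStatus_t h) {
    if (!h) return DCGM_ST_BADPARAM;
    ((mockList_t *)h)->n = 0;
    return DCGM_ST_OK;
}
static dcgmReturn_t dcgmStatusGetCount(dcgmStatus_t h, unsigned int *count) {
    if (!h || !count) return DCGM_ST_BADPARAM;
    *count = ((mockList_t *)h)->n;
    return DCGM_ST_OK;
}
static dcgmReturn_t dcgmStatusPopError(dcgmStatus_t h, dcgmErrorInfo_t *out) {
    if (!h || !out) return DCGM_ST_BADPARAM;
    mockList_t *l = (mockList_t *)h;
    if (l->n == 0) return DCGM_ST_NO_DATA;
    *out = l->errs[0];                                   /* hand back the first error */
    l->n--;
    memmove(l->errs, l->errs + 1, l->n * sizeof l->errs[0]); /* shift the rest forward */
    return DCGM_ST_OK;
}
/* Hypothetical group-level operation that records `failures` per-GPU errors. */
static void mock_group_op(dcgmStatus_t h, unsigned int failures) {
    mockList_t *l = (mockList_t *)h;
    for (unsigned int i = 0; i < failures && l->n < 8; i++) {
        dcgmErrorInfo_t e = { i, 0, DCGM_ST_BADPARAM };
        l->errs[l->n++] = e;
    }
}
/* ---- End of stand-ins. ---- */

/* Pops errors until the list is empty. Returns the number popped,
 * or -1 if PopError fails with anything other than DCGM_ST_NO_DATA. */
static int drain_errors(dcgmStatus_t h) {
    dcgmErrorInfo_t e;
    dcgmReturn_t rc;
    int popped = 0;
    while ((rc = dcgmStatusPopError(h, &e)) == DCGM_ST_OK) {
        printf("gpu %u field %d status %d\n", e.gpuId, (int)e.fieldId, e.status);
        popped++;
    }
    return rc == DCGM_ST_NO_DATA ? popped : -1;
}

/* Create, run one operation, count, drain, clear for reuse, destroy.
 * Returns the number of errors seen, or -1 on any API failure. */
static int run_lifecycle(unsigned int failures) {
    dcgmStatus_t h = 0;
    unsigned int count = 0;
    int popped = -1;
    if (dcgmStatusCreate(&h) != DCGM_ST_OK) return -1;
    mock_group_op(h, failures);
    if (dcgmStatusGetCount(h, &count) == DCGM_ST_OK)
        popped = drain_errors(h);
    /* The handle could now be reused for another set of operations. */
    if (dcgmStatusClear(h) != DCGM_ST_OK) popped = -1;
    if (dcgmStatusDestroy(h) != DCGM_ST_OK) popped = -1;
    return (popped == (int)count) ? popped : -1;
}
```

With the stand-ins in place, run_lifecycle(3) prints one line per recorded error and returns 3. Looping on dcgmStatusPopError until it returns DCGM_ST_NO_DATA avoids depending on a count read earlier, while the dcgmStatusGetCount result serves as a cross-check.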