1.2.4. Status handling

[System]

The following APIs are used to manage statuses for multiple operations on one or more GPUs.

Functions

dcgmReturn_t dcgmStatusClear ( dcgmStatus_t statusHandle )
dcgmReturn_t dcgmStatusCreate ( dcgmStatus_t* statusHandle )
dcgmReturn_t dcgmStatusDestroy ( dcgmStatus_t statusHandle )
dcgmReturn_t dcgmStatusGetCount ( dcgmStatus_t statusHandle, unsigned int* count )
dcgmReturn_t dcgmStatusPopError ( dcgmStatus_t statusHandle, dcgmErrorInfo_t* pDcgmErrorInfo )

Functions

dcgmReturn_t dcgmStatusClear ( dcgmStatus_t statusHandle )
Parameters
statusHandle
IN: Handle to list of statuses
Returns

Description

Clears all the errors in the status handle created by dcgmStatusCreate. After one set of operations, the statusHandle can be cleared and reused for the next set of operations.
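The clear-and-reuse pattern described above can be sketched as follows. This is a minimal illustration, not a reference implementation: finishBatch and runTwoBatches are hypothetical helpers, and the group-level operations that would populate the handle are left as comments. When HAVE_DCGM (an assumed build macro) is not defined, the block substitutes small stand-ins for the DCGM types and functions so that it compiles without the library; the stand-ins use placeholder return values and are not the real implementation.

```c
#ifdef HAVE_DCGM                 /* assumed macro: build against the real library */
#include <dcgm_agent.h>
#include <dcgm_structs.h>
#else                            /* stand-ins only, NOT the real DCGM implementation */
#include <stdlib.h>
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t; /* placeholder values */
typedef struct { unsigned int n; } *dcgmStatus_t;   /* stand-in tracks only a count */
static dcgmReturn_t dcgmStatusCreate(dcgmStatus_t *h)
{
    if (!h) return DCGM_ST_BADPARAM;
    *h = calloc(1, sizeof **h);
    return *h ? DCGM_ST_OK : DCGM_ST_BADPARAM;
}
static dcgmReturn_t dcgmStatusDestroy(dcgmStatus_t h) { free(h); return DCGM_ST_OK; }
static dcgmReturn_t dcgmStatusClear(dcgmStatus_t h)
{
    if (!h) return DCGM_ST_BADPARAM;
    h->n = 0;
    return DCGM_ST_OK;
}
static dcgmReturn_t dcgmStatusGetCount(dcgmStatus_t h, unsigned int *count)
{
    if (!h || !count) return DCGM_ST_BADPARAM;
    *count = h->n;
    return DCGM_ST_OK;
}
#endif

/* Hypothetical helper: report how many errors the last set of operations
   left behind, then clear the handle so it can be reused. */
static unsigned int finishBatch(dcgmStatus_t statusHandle)
{
    unsigned int count = 0;
    if (dcgmStatusGetCount(statusHandle, &count) != DCGM_ST_OK)
        count = 0;
    dcgmStatusClear(statusHandle);
    return count;
}

/* Hypothetical usage: one handle serves two sets of operations.
   Returns the total number of errors seen, or -1 if creation failed. */
static int runTwoBatches(void)
{
    dcgmStatus_t statusHandle = 0;
    if (dcgmStatusCreate(&statusHandle) != DCGM_ST_OK)
        return -1;
    /* ... first set of group-level operations using statusHandle ... */
    unsigned int first = finishBatch(statusHandle);
    /* ... second set of operations, reusing the same handle ... */
    unsigned int second = finishBatch(statusHandle);
    dcgmStatusDestroy(statusHandle);
    return (int)(first + second);
}
```

Clearing between sets keeps errors from an earlier operation from being attributed to a later one, while avoiding a create/destroy pair per operation.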

dcgmReturn_t dcgmStatusCreate ( dcgmStatus_t* statusHandle )
Parameters
statusHandle
OUT: Reference to handle for list of statuses
Returns

Description

Creates a reference to a DCGM status handle, which can be used to get the statuses for multiple operations on one or more devices.

Multiple statuses are useful when operations are performed at the group level. The status handle provides a mechanism to access error attributes for the failed operations.

The number of errors stored behind the opaque handle can be retrieved using dcgmStatusGetCount. Individual errors are retrieved from the opaque handle statusHandle using dcgmStatusPopError. The user can invoke dcgmStatusPopError once per reported error, or repeatedly until all the errors are fetched.

When the status handle is no longer required, it should be deleted using dcgmStatusDestroy.
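The lifecycle described above (create, count, pop, destroy) can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: drainStatus and runOnce are hypothetical helpers, the dcgmErrorInfo_t fields gpuId, fieldId, and status are assumed from the DCGM headers, and library initialization plus the group-level call that would fill the handle are omitted. When HAVE_DCGM (an assumed build macro) is not defined, small stand-ins with placeholder return values replace the library so that the block compiles on its own.

```c
#include <stdio.h>

#ifdef HAVE_DCGM                 /* assumed macro: build against the real library */
#include <dcgm_agent.h>
#include <dcgm_structs.h>
#else                            /* stand-ins only, NOT the real DCGM implementation */
#include <stdlib.h>
typedef enum { DCGM_ST_OK = 0, DCGM_ST_BADPARAM = -1 } dcgmReturn_t; /* placeholder values */
typedef struct { unsigned int gpuId; short fieldId; int status; } dcgmErrorInfo_t; /* assumed layout */
typedef struct { dcgmErrorInfo_t errs[8]; unsigned int n; } *dcgmStatus_t;
static dcgmReturn_t dcgmStatusCreate(dcgmStatus_t *h)
{
    if (!h) return DCGM_ST_BADPARAM;
    *h = calloc(1, sizeof **h);
    return *h ? DCGM_ST_OK : DCGM_ST_BADPARAM;
}
static dcgmReturn_t dcgmStatusDestroy(dcgmStatus_t h) { free(h); return DCGM_ST_OK; }
static dcgmReturn_t dcgmStatusGetCount(dcgmStatus_t h, unsigned int *count)
{
    if (!h || !count) return DCGM_ST_BADPARAM;
    *count = h->n;
    return DCGM_ST_OK;
}
static dcgmReturn_t dcgmStatusPopError(dcgmStatus_t h, dcgmErrorInfo_t *e)
{
    if (!h || !e || h->n == 0) return DCGM_ST_BADPARAM;
    *e = h->errs[0];                              /* hand out the first entry */
    for (unsigned int i = 1; i < h->n; i++)       /* shift the rest forward */
        h->errs[i - 1] = h->errs[i];
    h->n--;
    return DCGM_ST_OK;
}
#endif

/* Hypothetical helper: print and remove every error stored behind the handle.
   Returns the number of entries actually popped. */
static unsigned int drainStatus(dcgmStatus_t statusHandle)
{
    unsigned int count = 0, popped = 0;
    if (dcgmStatusGetCount(statusHandle, &count) != DCGM_ST_OK)
        return 0;
    while (popped < count) {
        dcgmErrorInfo_t err;
        if (dcgmStatusPopError(statusHandle, &err) != DCGM_ST_OK)
            break;                                /* stop on the first failed pop */
        printf("gpu %u, field %d, status %d\n",
               err.gpuId, (int)err.fieldId, err.status);
        popped++;
    }
    return popped;
}

/* Hypothetical usage: returns the number of errors reported, or -1 on failure. */
static int runOnce(void)
{
    dcgmStatus_t statusHandle = 0;
    if (dcgmStatusCreate(&statusHandle) != DCGM_ST_OK)
        return -1;
    /* ... pass statusHandle to a group-level operation here ... */
    unsigned int n = drainStatus(statusHandle);
    dcgmStatusDestroy(statusHandle);              /* always release the handle */
    return (int)n;
}
```

The loop is bounded by the count and also stops on any non-OK return, so it terminates whichever way the list runs out.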

dcgmReturn_t dcgmStatusDestroy ( dcgmStatus_t statusHandle )
Parameters
statusHandle
IN: Handle to list of statuses
Returns

Description

Destroys a status handle created using dcgmStatusCreate.

dcgmReturn_t dcgmStatusGetCount ( dcgmStatus_t statusHandle, unsigned int* count )
Parameters
statusHandle
IN: Handle to list of statuses
count
OUT: Number of error entries present in the list of statuses
Returns

Description

Gets the count of error entries stored inside the opaque handle statusHandle.

dcgmReturn_t dcgmStatusPopError ( dcgmStatus_t statusHandle, dcgmErrorInfo_t* pDcgmErrorInfo )
Parameters
statusHandle
IN: Handle to list of statuses
pDcgmErrorInfo
OUT: First error from the list of statuses
Returns

Description

Iterates through the list of errors maintained behind statusHandle. Each call pops the first error from the list of DCGM statuses. To iterate through all the errors, the user can invoke this API once per error reported by dcgmStatusGetCount, or repeatedly until all the errors are fetched.