Administrative¶
- group DCGMAPI_Admin
This chapter describes the administration interfaces for DCGM.
The user is responsible for calling dcgmInit() before calling any other method, and dcgmShutdown() once DCGM is no longer in use. The APIs in the Administrative module fall into the following categories:
Init and Shutdown¶
- group DCGMAPI_Admin_InitShut
Describes APIs to Initialize and Shutdown the DCGM Engine.
Functions
-
dcgmReturn_t dcgmInit(void)¶
This method is used to initialize DCGM within this process.
This must be called before dcgmStartEmbedded() or dcgmConnect().
- Returns
DCGM_ST_OK if DCGM has been properly initialized
DCGM_ST_INIT_ERROR if there was an error initializing the library
-
dcgmReturn_t dcgmShutdown(void)¶
This method is used to shut down DCGM.
Any embedded host engines or remote connections will automatically be shut down as well.
- Returns
DCGM_ST_OK if DCGM has been properly shut down
DCGM_ST_UNINITIALIZED if the library was not shut down properly
-
dcgmReturn_t dcgmStartEmbedded(dcgmOperationMode_t opMode, dcgmHandle_t *pDcgmHandle)¶
Start an embedded host engine agent within this process.
The agent is loaded as a shared library. This mode avoids the extra jitter associated with managing an additional autonomous agent. In this mode, the user has to periodically call APIs such as dcgmPolicyTrigger and dcgmUpdateAllFields, which tell DCGM to wake up and perform the data collection and operations needed for policy management.
- Parameters
opMode – IN: Collect data automatically or manually when asked by the user.
pDcgmHandle – OUT: DCGM Handle to use for API calls
- Returns
DCGM_ST_OK if DCGM was started successfully within our process
DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit yet
-
dcgmReturn_t dcgmStartEmbedded_v2(dcgmStartEmbeddedV2Params_v1 *params)¶
Start an embedded host engine agent within this process.
The agent is loaded as a shared library. This mode avoids the extra jitter associated with managing an additional autonomous agent. In this mode, the user has to periodically call APIs such as dcgmPolicyTrigger and dcgmUpdateAllFields, which tell DCGM to wake up and perform the data collection and operations needed for policy management.
Note
This function takes a versioned argument and can be called with either of two parameter types. Its behavior depends on the params->version value.
- Parameters
params – [inout] A pointer to either dcgmStartEmbeddedV2Params_v1 or dcgmStartEmbeddedV2Params_v2.
- Returns
DCGM_ST_OK if DCGM was started successfully within our process
DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit yet
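The versioned-argument pattern described in the note for dcgmStartEmbedded_v2 can be sketched in isolation. The example below is a minimal illustration, not DCGM code: the struct types, the `extra` field, the version constants and the `example_start` function are all hypothetical, and the plain integer version numbers are an assumption (the real structures and version values are defined in the DCGM headers and are not reproduced here). It shows only how a callee can read a common leading `version` member and decide which layout it was given.

```c
#include <stddef.h>

/* Hypothetical parameter structs for illustration; NOT the DCGM definitions.
 * Every revision keeps `version` as its first member. */
typedef struct {
    unsigned int version;
    int opMode;
} exampleParams_v1;

typedef struct {
    unsigned int version;
    int opMode;
    const char *extra; /* hypothetical field added in the second revision */
} exampleParams_v2;

/* Assumed encoding: plain revision numbers. */
#define EXAMPLE_PARAMS_VERSION1 1u
#define EXAMPLE_PARAMS_VERSION2 2u

/* Hypothetical callee: accepts a pointer to either revision.
 * Returns 0 on success, -1 for a NULL argument, -2 for an unknown version.
 * On success, *extraOut is the v2 `extra` field, or NULL for a v1 caller. */
int example_start(void *params, const char **extraOut)
{
    if (params == NULL || extraOut == NULL)
        return -1;

    /* A pointer to a struct may be converted to a pointer to its first member. */
    unsigned int version = *(unsigned int *)params;

    switch (version) {
    case EXAMPLE_PARAMS_VERSION1:
        *extraOut = NULL; /* field does not exist in this revision */
        return 0;
    case EXAMPLE_PARAMS_VERSION2:
        *extraOut = ((exampleParams_v2 *)params)->extra;
        return 0;
    default:
        return -2; /* caller and callee disagree about the layout */
    }
}
```

The caller sets `version` before the call, which is why a mismatch can be reported instead of misreading memory.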
-
dcgmReturn_t dcgmStopEmbedded(dcgmHandle_t pDcgmHandle)¶
Stop the embedded host engine within this process that was started with dcgmStartEmbedded.
- Parameters
pDcgmHandle – IN: DCGM Handle of the embedded host engine that came from dcgmStartEmbedded
- Returns
DCGM_ST_OK if DCGM was stopped successfully within our process
DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit or the embedded host engine was not running.
DCGM_ST_BADPARAM if an invalid parameter was provided
DCGM_ST_INIT_ERROR if an error occurred while trying to start the host engine.
-
dcgmReturn_t dcgmConnect(const char *ipAddress, dcgmHandle_t *pDcgmHandle)¶
This method is used to connect to a stand-alone host engine process.
Remote host engines are started by running the nv-hostengine command.
NOTE: dcgmConnect_v2 provides additional connection options.
- Parameters
ipAddress – IN: Valid IP address for the remote host engine to connect to. If ipAddress is specified as x.x.x.x, it will attempt to connect to the default port specified by DCGM_HE_PORT_NUMBER. If ipAddress is specified as x.x.x.x:yyyy, it will attempt to connect to the port specified by yyyy.
pDcgmHandle – OUT: DCGM Handle of the remote host engine
- Returns
DCGM_ST_OK if we successfully connected to the remote host engine
DCGM_ST_CONNECTION_NOT_VALID if the remote host engine could not be reached
DCGM_ST_BADPARAM if pDcgmHandle is NULL or ipAddress is invalid
DCGM_ST_INIT_ERROR if DCGM encountered an error while initializing the remote client library
DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit
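The two ipAddress forms accepted by dcgmConnect (x.x.x.x and x.x.x.x:yyyy) can be illustrated with a small standalone parser. This is a sketch only: `split_host_port` is a hypothetical helper, not part of DCGM, and it does not claim to match how the library parses the string internally. The value 5555 for `EXAMPLE_DEFAULT_PORT` is an assumption made for the example; the authoritative default is whatever DCGM_HE_PORT_NUMBER is defined as in the DCGM headers.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value for illustration; the real default is DCGM_HE_PORT_NUMBER. */
#define EXAMPLE_DEFAULT_PORT 5555

/* Hypothetical helper, not part of DCGM.
 * Splits "x.x.x.x" or "x.x.x.x:yyyy" into a host string and a port number.
 * Handles only the IPv4-style forms described above.
 * Returns 0 on success, -1 on malformed input. */
int split_host_port(const char *address, char *host, size_t hostSize, int *port)
{
    if (address == NULL || host == NULL || port == NULL || hostSize == 0)
        return -1;

    const char *colon = strchr(address, ':');
    size_t hostLen = colon ? (size_t)(colon - address) : strlen(address);
    if (hostLen == 0 || hostLen >= hostSize)
        return -1; /* empty host, or host does not fit in the buffer */

    memcpy(host, address, hostLen);
    host[hostLen] = '\0';

    if (colon == NULL) {
        *port = EXAMPLE_DEFAULT_PORT; /* no ":yyyy" suffix: use the default */
        return 0;
    }

    /* Require the port to start with a digit so signs and spaces are rejected. */
    if (!isdigit((unsigned char)colon[1]))
        return -1;

    char *end = NULL;
    long value = strtol(colon + 1, &end, 10);
    if (*end != '\0' || value < 1 || value > 65535)
        return -1; /* trailing characters or out-of-range port */

    *port = (int)value;
    return 0;
}
```

A helper like this can be useful for validating user input before handing the original string to dcgmConnect, which still reports DCGM_ST_BADPARAM for addresses it considers invalid.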
-
dcgmReturn_t dcgmConnect_v2(const char *ipAddress, dcgmConnectV2Params_t *connectParams, dcgmHandle_t *pDcgmHandle)¶
This method is used to connect to a stand-alone host engine process.
Remote host engines are started by running the nv-hostengine command.
- Parameters
ipAddress – IN: Valid IP address for the remote host engine to connect to. If ipAddress is specified as x.x.x.x, it will attempt to connect to the default port specified by DCGM_HE_PORT_NUMBER. If ipAddress is specified as x.x.x.x:yyyy, it will attempt to connect to the port specified by yyyy.
connectParams – IN: Additional connection parameters. See dcgmConnectV2Params_t for details.
pDcgmHandle – OUT: DCGM Handle of the remote host engine
- Returns
DCGM_ST_OK if we successfully connected to the remote host engine
DCGM_ST_CONNECTION_NOT_VALID if the remote host engine could not be reached
DCGM_ST_BADPARAM if pDcgmHandle is NULL or ipAddress is invalid
DCGM_ST_INIT_ERROR if DCGM encountered an error while initializing the remote client library
DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit
-
dcgmReturn_t dcgmDisconnect(dcgmHandle_t pDcgmHandle)¶
This method is used to disconnect from a stand-alone host engine process.
- Parameters
pDcgmHandle – IN: DCGM Handle that came from dcgmConnect
- Returns
DCGM_ST_OK if we successfully disconnected from the host engine
DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit
DCGM_ST_BADPARAM if pDcgmHandle is not a valid DCGM handle
DCGM_ST_GENERIC_ERROR if an unspecified internal error occurred
Auxiliary information about the DCGM engine¶
- group DCGMAPI_Admin_Info
Describes APIs to get generic information about the DCGM Engine.
Functions
-
dcgmReturn_t dcgmVersionInfo(dcgmVersionInfo_t *pVersionInfo)¶
This method is used to return information about the build environment where DCGM was built.
- Parameters
pVersionInfo – OUT: Build environment information
- Returns
DCGM_ST_OK if build information is successfully obtained
DCGM_ST_BADPARAM if pVersionInfo is null
DCGM_ST_VER_MISMATCH if the expected and provided versions of dcgmVersionInfo_t do not match
-
dcgmReturn_t dcgmHostengineVersionInfo(dcgmHandle_t pDcgmHandle, dcgmVersionInfo_t *pVersionInfo)¶
This method is used to return information about the build environment of the hostengine.
- Parameters
pDcgmHandle – IN: DCGM Handle that came from dcgmConnect
pVersionInfo – OUT: Build environment information
- Returns
DCGM_ST_OK if build information is successfully obtained
DCGM_ST_BADPARAM if pVersionInfo is null
DCGM_ST_VER_MISMATCH if the expected and provided versions of dcgmVersionInfo_t do not match
-
dcgmReturn_t dcgmHostengineSetLoggingSeverity(dcgmHandle_t pDcgmHandle, dcgmSettingsSetLoggingSeverity_t *logging)¶
This method is used to set the logging severity on HostEngine for the specified logger.
- Parameters
pDcgmHandle – IN: DCGM Handle
logging – IN: dcgmSettingsSetLoggingSeverity_t struct containing the target logger and severity
- Returns
DCGM_ST_OK if the severity was successfully set
DCGM_ST_BADPARAM if the logger or severity string is invalid
DCGM_ST_VER_MISMATCH if the expected and provided versions of dcgmSettingsSetLoggingSeverity_t do not match
-
dcgmReturn_t dcgmHostengineIsHealthy(dcgmHandle_t pDcgmHandle, dcgmHostengineHealth_t *heHealth)¶
This function is used to return whether or not the host engine considers itself healthy.
- Parameters
pDcgmHandle – [in] The handle to DCGM
heHealth – [out] Struct describing the health of the hostengine. If heHealth.hostengineHealth is 0, the hostengine is healthy. A non-zero value indicates that it is not healthy, with the error code identifying the cause.
- Returns
DCGM_ST_OK Able to gauge health
DCGM_ST_BADPARAM if heHealth is not a valid pointer
-
const char *errorString(dcgmReturn_t result)¶
This function describes DCGM error codes in human readable form.
- Parameters
result – [in] DCGM return code to describe
- Returns
Human readable string with the DCGM error code description if the code is valid.
nullptr if there is no such error code
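Because errorString can return a null pointer for an unrecognized code, passing its result straight to a `%s` format is unsafe. The sketch below shows one way to guard against that. `describe_result` is a hypothetical helper, not part of DCGM, and it takes the looked-up message as a plain argument so the example compiles without the DCGM headers.

```c
#include <stdio.h>

/* Hypothetical helper, not part of DCGM.
 * `message` is the string obtained from a lookup such as errorString(),
 * which may be NULL. Returns `message` when it is non-NULL; otherwise
 * formats a fallback into `buffer` and returns that. */
const char *describe_result(const char *message, int code, char *buffer, size_t bufferSize)
{
    if (message != NULL)
        return message;

    if (buffer == NULL || bufferSize == 0)
        return "Unknown DCGM error"; /* no room to include the numeric code */

    /* snprintf always NUL-terminates when bufferSize is non-zero. */
    snprintf(buffer, bufferSize, "Unknown DCGM error (%d)", code);
    return buffer;
}
```

With the DCGM headers available, a call would look like `describe_result(errorString(result), (int)result, buf, sizeof buf)`, where `buf` is a caller-owned character array.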
-
dcgmReturn_t dcgmModuleIdToName(dcgmModuleId_t id, char const **name)¶
This function returns the name of the DCGM module with the given module ID.
- Parameters
id – [in] Module ID to name.
name – [out] The module name is returned through this argument.
- Returns
DCGM_ST_OK if name holds a valid module name
DCGM_ST_BADPARAM if there is no module with the specified ID. The value of name is not changed.