2.31. dcgmGpuUsageInfo_t Struct Reference

[Structure definitions]

Info corresponding to the job on a GPU

Public Variables

long long  boardLimitViolationTime
Number of microseconds we were at reduced clocks due to being at the board's max voltage.
struct dcgmProcessUtilInfo_t computePidInfo[DCGM_MAX_PID_INFO_NUM]
List of compute processes that ran during the job. 0 = no process.
unsigned int  eccDoubleBit
Count of ECC double bit errors that occurred.
unsigned int  eccSingleBit
Deprecated - Count of ECC single bit errors that occurred.
long long  endTime
User provided job end time in microseconds since 1970.
long long  energyConsumed
Energy consumed in milliwatt-seconds (millijoules).
unsigned int  gpuId
ID of the GPU this pertains to. GPU_ID_INVALID = summary information for multiple GPUs.
struct dcgmProcessUtilInfo_t graphicsPidInfo[DCGM_MAX_PID_INFO_NUM]
List of graphics processes that ran during the job. 0 = no process.
dcgmHealthWatchResults_t health
Health of the specified system on this GPU.
long long  lowUtilizationTime
Number of microseconds we were at reduced clocks due to low utilization.
long long  maxGpuMemoryUsed
Maximum amount of GPU memory that was used in bytes.
struct dcgmStatSummaryInt32_t memoryClock
Memory clock in MHz.
struct dcgmStatSummaryInt32_t memoryUtilization
GPU Memory Utilization in percent.
int  numComputePids
Count of computePidInfo entries that are valid.
int  numGraphicsPids
Count of graphicsPidInfo entries that are valid.
int  numXidCriticalErrors
Number of valid entries in xidCriticalErrorsTs.
dcgmHealthWatchResults_t overallHealth
The overall health of the system. See dcgmHealthWatchResults_t.
long long  pcieReplays
Count of PCI-E replays that occurred.
struct dcgmStatSummaryInt64_t pcieRxBandwidth
PCI-E bytes read from the GPU.
struct dcgmStatSummaryInt64_t pcieTxBandwidth
PCI-E bytes written to the GPU.
struct dcgmStatSummaryFp64_t powerUsage
Power usage Min/Max/Avg in watts.
long long  powerViolationTime
Number of microseconds we were at reduced clocks due to power violation.
long long  reliabilityViolationTime
Number of microseconds we were at reduced clocks due to the reliability limit.
struct dcgmStatSummaryInt32_t smClock
SM clock in MHz.
struct dcgmStatSummaryInt32_t smUtilization
GPU SM Utilization in percent.
long long  startTime
User provided job start time in microseconds since 1970.
long long  syncBoostTime
Number of microseconds we were at reduced clocks due to sync boost.
dcgmHealthSystems_t system
System to which this information belongs.
long long  thermalViolationTime
Number of microseconds we were at reduced clocks due to thermal violation.
long long  xidCriticalErrorsTs[10]
Timestamps of the critical XID errors that occurred.

Variables

long long dcgmGpuUsageInfo_t::boardLimitViolationTime [inherited]

Number of microseconds we were at reduced clocks due to being at the board's max voltage.

struct dcgmProcessUtilInfo_t dcgmGpuUsageInfo_t::computePidInfo[DCGM_MAX_PID_INFO_NUM] [inherited]

List of compute processes that ran during the job. 0 = no process.

unsigned int dcgmGpuUsageInfo_t::eccDoubleBit [inherited]

Count of ECC double bit errors that occurred.

unsigned int dcgmGpuUsageInfo_t::eccSingleBit [inherited]

Deprecated - Count of ECC single bit errors that occurred.

long long dcgmGpuUsageInfo_t::endTime [inherited]

User provided job end time in microseconds since 1970.

long long dcgmGpuUsageInfo_t::energyConsumed [inherited]

Energy consumed in milliwatt-seconds (millijoules).
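
To make the unit concrete, the following is a minimal sketch of turning energyConsumed into an average power figure. It assumes the value is in milliwatt-seconds (millijoules) and that the job duration in microseconds is already known, for example from endTime minus startTime. The helper name average_power_watts and its parameters are hypothetical and not part of DCGM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a DCGM API.
 * energy_mj:     energyConsumed, assumed to be in milliwatt-seconds (mJ).
 * duration_usec: job duration in microseconds.
 * Writes the average power in watts to *out_watts and returns 0,
 * or returns -1 if either input is unusable. */
static int average_power_watts(long long energy_mj,
                               long long duration_usec,
                               double *out_watts)
{
    assert(out_watts != NULL);
    if (energy_mj < 0 || duration_usec <= 0) {
        return -1;
    }
    /* mJ -> J divides by 1e3, usec -> s divides by 1e6,
     * so watts = mJ * 1e3 / usec. */
    *out_watts = (double)energy_mj * 1000.0 / (double)duration_usec;
    return 0;
}
```

For a job that ran 10 seconds and consumed 1,500,000 mJ, this yields 150 W, which can be cross-checked against the average reported in powerUsage.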

unsigned int dcgmGpuUsageInfo_t::gpuId [inherited]

ID of the GPU this pertains to. GPU_ID_INVALID = summary information for multiple GPUs.

struct dcgmProcessUtilInfo_t dcgmGpuUsageInfo_t::graphicsPidInfo[DCGM_MAX_PID_INFO_NUM] [inherited]

List of graphics processes that ran during the job. 0 = no process.

dcgmHealthWatchResults_t dcgmGpuUsageInfo_t::health [inherited]

Health of the specified system on this GPU.

long long dcgmGpuUsageInfo_t::lowUtilizationTime [inherited]

Number of microseconds we were at reduced clocks due to low utilization.

long long dcgmGpuUsageInfo_t::maxGpuMemoryUsed [inherited]

Maximum amount of GPU memory that was used in bytes.

struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::memoryClock [inherited]

Memory clock in MHz.

struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::memoryUtilization [inherited]

GPU Memory Utilization in percent.

int dcgmGpuUsageInfo_t::numComputePids [inherited]

Count of computePidInfo entries that are valid.

int dcgmGpuUsageInfo_t::numGraphicsPids [inherited]

Count of graphicsPidInfo entries that are valid.
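
The count fields bound how much of each PID array is meaningful. The sketch below shows one way to walk only the valid entries and skip the "0 = no process" slots. PidEntry is a hypothetical stand-in that mirrors only a pid member, and PID_INFO_CAPACITY is an assumed value chosen for illustration; real code should use dcgmProcessUtilInfo_t and DCGM_MAX_PID_INFO_NUM from the DCGM headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity for illustration only; use DCGM_MAX_PID_INFO_NUM in real code. */
#define PID_INFO_CAPACITY 16

/* Hypothetical stand-in for one PID-list entry; only the pid is modeled. */
typedef struct {
    unsigned int pid; /* 0 = no process */
} PidEntry;

/* Copies the non-zero PIDs found in the first `count` entries into `out`
 * and returns how many were copied. `count` plays the role of
 * numComputePids or numGraphicsPids and is clamped to the array capacity.
 * `out` must have room for PID_INFO_CAPACITY elements. */
static int collect_pids(const PidEntry *entries, int count, unsigned int *out)
{
    assert(entries != NULL && out != NULL);
    if (count < 0) {
        count = 0;
    }
    if (count > PID_INFO_CAPACITY) {
        count = PID_INFO_CAPACITY;
    }
    int copied = 0;
    for (int i = 0; i < count; ++i) {
        if (entries[i].pid != 0) {
            out[copied++] = entries[i].pid;
        }
    }
    return copied;
}
```

Clamping the count before indexing guards against reading past the array if the struct was not fully populated.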

int dcgmGpuUsageInfo_t::numXidCriticalErrors [inherited]

Number of valid entries in xidCriticalErrorsTs.

dcgmHealthWatchResults_t dcgmGpuUsageInfo_t::overallHealth [inherited]

The overall health of the system. See dcgmHealthWatchResults_t.

long long dcgmGpuUsageInfo_t::pcieReplays [inherited]

Count of PCI-E replays that occurred.

struct dcgmStatSummaryInt64_t dcgmGpuUsageInfo_t::pcieRxBandwidth [inherited]

PCI-E bytes read from the GPU.

struct dcgmStatSummaryInt64_t dcgmGpuUsageInfo_t::pcieTxBandwidth [inherited]

PCI-E bytes written to the GPU.

struct dcgmStatSummaryFp64_t dcgmGpuUsageInfo_t::powerUsage [inherited]

Power usage Min/Max/Avg in watts.

long long dcgmGpuUsageInfo_t::powerViolationTime [inherited]

Number of microseconds we were at reduced clocks due to power violation.

long long dcgmGpuUsageInfo_t::reliabilityViolationTime [inherited]

Number of microseconds we were at reduced clocks due to the reliability limit.

struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::smClock [inherited]

SM clock in MHz.

struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::smUtilization [inherited]

GPU SM Utilization in percent.

long long dcgmGpuUsageInfo_t::startTime [inherited]

User provided job start time in microseconds since 1970.
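
Because startTime and endTime share the same unit and epoch, the job duration is a plain subtraction. The sketch below adds basic sanity checks. JobTimes is a hypothetical stand-in that mirrors only these two fields, and treating a non-positive startTime as "unset" is an assumption made here, not documented DCGM behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two timestamp fields of dcgmGpuUsageInfo_t. */
typedef struct {
    long long startTime; /* microseconds since 1970 */
    long long endTime;   /* microseconds since 1970 */
} JobTimes;

/* Returns the job duration in microseconds, or -1 when startTime looks
 * unset (assumed: <= 0) or endTime precedes startTime. */
static long long job_duration_usec(const JobTimes *t)
{
    assert(t != NULL);
    if (t->startTime <= 0 || t->endTime < t->startTime) {
        return -1;
    }
    return t->endTime - t->startTime;
}
```

Dividing the result by 1,000,000 gives seconds, which is usually the more convenient unit for reports.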

long long dcgmGpuUsageInfo_t::syncBoostTime [inherited]

Number of microseconds we were at reduced clocks due to sync boost.

dcgmHealthSystems_t dcgmGpuUsageInfo_t::system [inherited]

System to which this information belongs.

long long dcgmGpuUsageInfo_t::thermalViolationTime [inherited]

Number of microseconds we were at reduced clocks due to thermal violation.
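
The violation counters (thermal, power, sync boost, and the others in this struct) are all durations in microseconds, so each is easier to interpret as a share of the job's run time. The following is a minimal sketch of that conversion. violation_percent is a hypothetical helper, and it does not handle any sentinel values the library might use for unsupported counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a DCGM API.
 * violation_usec: one of the *ViolationTime counters, in microseconds.
 * duration_usec:  job duration in microseconds.
 * Writes the percentage of the job spent at reduced clocks to *out_percent
 * and returns 0, or returns -1 if the inputs are unusable. */
static int violation_percent(long long violation_usec,
                             long long duration_usec,
                             double *out_percent)
{
    assert(out_percent != NULL);
    if (violation_usec < 0 || duration_usec <= 0) {
        return -1;
    }
    *out_percent = 100.0 * (double)violation_usec / (double)duration_usec;
    return 0;
}
```

It is safer to report each counter on its own rather than summing them, since this reference does not say whether the throttle reasons can overlap in time.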

long long dcgmGpuUsageInfo_t::xidCriticalErrorsTs[10] [inherited]

Timestamps of the critical XID errors that occurred.
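
Only the first numXidCriticalErrors entries of xidCriticalErrorsTs are meaningful. The sketch below reads the array under that rule and returns the most recent timestamp. XidInfo is a hypothetical stand-in that mirrors only these two fields, and the capacity of 10 is taken from the array bound shown above.

```c
#include <assert.h>
#include <stddef.h>

#define XID_TS_CAPACITY 10 /* matches the [10] bound of xidCriticalErrorsTs */

/* Hypothetical stand-in for the two XID-related fields of dcgmGpuUsageInfo_t. */
typedef struct {
    int numXidCriticalErrors;
    long long xidCriticalErrorsTs[XID_TS_CAPACITY];
} XidInfo;

/* Returns the largest timestamp among the valid entries,
 * or 0 when no critical XID error was recorded. */
static long long latest_xid_timestamp(const XidInfo *info)
{
    assert(info != NULL);
    int n = info->numXidCriticalErrors;
    if (n < 0) {
        n = 0;
    }
    if (n > XID_TS_CAPACITY) {
        n = XID_TS_CAPACITY; /* never index past the fixed-size array */
    }
    long long latest = 0;
    for (int i = 0; i < n; ++i) {
        if (info->xidCriticalErrorsTs[i] > latest) {
            latest = info->xidCriticalErrorsTs[i];
        }
    }
    return latest;
}
```

Entries beyond the valid count are ignored even if they contain non-zero data, since nothing in this reference guarantees they are initialized.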