2.53. dcgmPidSingleInfo_t Struct Reference

[Structure definitions]

Info corresponding to single PID

Public Variables

long long  boardLimitViolationTime
Number of microseconds we were at reduced clocks due to being at the board's max voltage.
unsigned int  eccDoubleBit
Count of ECC double bit errors that occurred.
unsigned int  eccSingleBit
Deprecated - Count of ECC single bit errors that occurred.
long long  endTime
Process end time in microseconds since 1970, or 0 if the process has not completed.
long long  energyConsumed
Energy consumed by the GPU in milliwatt-seconds.
unsigned int  gpuId
ID of the GPU this pertains to. GPU_ID_INVALID = summary information for multiple GPUs.
dcgmHealthWatchResults_t health
Health of the specified system on this GPU.
long long  lowUtilizationTime
Number of microseconds we were at reduced clocks due to low utilization.
long long  maxGpuMemoryUsed
Maximum amount of GPU memory that was used in bytes.
struct dcgmStatSummaryInt32_t memoryClock
Memory clock in MHz.
struct dcgmStatSummaryInt32_t memoryUtilization
GPU Memory Utilization in percent.
int  numOtherComputePids
Count of otherComputePids entries that are valid.
int  numOtherGraphicsPids
Count of otherGraphicsPids entries that are valid.
int  numXidCriticalErrors
Number of valid entries in xidCriticalErrorsTs.
unsigned int  otherComputePids[DCGM_MAX_PID_INFO_NUM]
Other compute processes that ran. 0=no process.
unsigned int  otherGraphicsPids[DCGM_MAX_PID_INFO_NUM]
Other graphics processes that ran. 0=no process.
dcgmHealthWatchResults_t overallHealth
The overall health of the system, as a dcgmHealthWatchResults_t value.
long long  pcieReplays
Count of PCI-E replays that occurred.
struct dcgmStatSummaryInt64_t pcieRxBandwidth
PCI-E bytes read from the GPU.
struct dcgmStatSummaryInt64_t pcieTxBandwidth
PCI-E bytes written to the GPU.
long long  powerViolationTime
Number of microseconds we were at reduced clocks due to power violation.
struct dcgmProcessUtilInfo_t processUtilization
Process SM and Memory Utilization (in percent).
long long  reliabilityViolationTime
Number of microseconds we were at reduced clocks due to the reliability limit.
struct dcgmStatSummaryInt32_t smClock
SM clock in MHz.
struct dcgmStatSummaryInt32_t smUtilization
GPU SM Utilization in percent.
long long  startTime
Process start time in microseconds since 1970.
long long  syncBoostTime
Number of microseconds we were at reduced clocks due to sync boost.
dcgmHealthSystems_t system
System to which this information belongs.
long long  thermalViolationTime
Number of microseconds we were at reduced clocks due to thermal violation.
long long  xidCriticalErrorsTs[10]
Timestamps of the critical XID errors that occurred.

Variables

long long dcgmPidSingleInfo_t::boardLimitViolationTime [inherited]

Number of microseconds we were at reduced clocks due to being at the board's max voltage.

unsigned int dcgmPidSingleInfo_t::eccDoubleBit [inherited]

Count of ECC double bit errors that occurred.

unsigned int dcgmPidSingleInfo_t::eccSingleBit [inherited]

Deprecated - Count of ECC single bit errors that occurred.

long long dcgmPidSingleInfo_t::endTime [inherited]

Process end time in microseconds since 1970, or 0 if the process has not completed.

long long dcgmPidSingleInfo_t::energyConsumed [inherited]

Energy consumed by the GPU in milliwatt-seconds.
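
A minimal sketch of how an energyConsumed reading might be converted, assuming the unit is milliwatt-seconds (that is, millijoules). The helper names are hypothetical and not part of DCGM; start_us and end_us stand for the startTime and endTime values of the same record.

```c
#include <assert.h>

/* Hypothetical helper, not part of DCGM: converts an energyConsumed
 * reading (assumed to be milliwatt-seconds, i.e. millijoules) to joules. */
static double energy_mws_to_joules(long long energy_mws)
{
    return (double)energy_mws / 1000.0;
}

/* Hypothetical helper: average power in watts over the process lifetime.
 * start_us and end_us are microsecond timestamps (startTime, endTime).
 * Returns -1.0 when the interval is empty or inverted, for example when
 * endTime is still 0 because the process has not completed. */
static double average_power_watts(long long energy_mws,
                                  long long start_us,
                                  long long end_us)
{
    if (end_us <= start_us) {
        return -1.0;
    }
    double seconds = (double)(end_us - start_us) / 1e6;
    return energy_mws_to_joules(energy_mws) / seconds;
}
```

The sentinel return value keeps the sketch short; a caller that needs to distinguish error cases may prefer a status code and an output parameter.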

unsigned int dcgmPidSingleInfo_t::gpuId [inherited]

ID of the GPU this pertains to. GPU_ID_INVALID = summary information for multiple GPUs.

dcgmHealthWatchResults_t dcgmPidSingleInfo_t::health [inherited]

Health of the specified system on this GPU.

long long dcgmPidSingleInfo_t::lowUtilizationTime [inherited]

Number of microseconds we were at reduced clocks due to low utilization.

long long dcgmPidSingleInfo_t::maxGpuMemoryUsed [inherited]

Maximum amount of GPU memory that was used in bytes.

struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::memoryClock [inherited]

Memory clock in MHz.

struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::memoryUtilization [inherited]

GPU Memory Utilization in percent.

int dcgmPidSingleInfo_t::numOtherComputePids [inherited]

Count of otherComputePids entries that are valid.

int dcgmPidSingleInfo_t::numOtherGraphicsPids [inherited]

Count of otherGraphicsPids entries that are valid.

int dcgmPidSingleInfo_t::numXidCriticalErrors [inherited]

Number of valid entries in xidCriticalErrorsTs.

unsigned int dcgmPidSingleInfo_t::otherComputePids[DCGM_MAX_PID_INFO_NUM] [inherited]

Other compute processes that ran. 0=no process.

unsigned int dcgmPidSingleInfo_t::otherGraphicsPids[DCGM_MAX_PID_INFO_NUM] [inherited]

Other graphics processes that ran. 0=no process.
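
The two PID arrays are read together with their counts: only the first numOtherComputePids (or numOtherGraphicsPids) entries are valid, and 0 means no process. A minimal sketch of that rule follows. The function name and the capacity parameter are hypothetical; in real code the capacity would be DCGM_MAX_PID_INFO_NUM from the DCGM headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of DCGM: copies the valid PIDs from
 * otherComputePids or otherGraphicsPids into out and returns how many
 * were copied. Only the first num_valid entries are considered, clamped
 * to the array capacity, and entries equal to 0 ("no process") are skipped.
 * out must have room for at least capacity elements. */
static size_t collect_valid_pids(const unsigned int *pids,
                                 int num_valid,
                                 size_t capacity,
                                 unsigned int *out)
{
    size_t copied = 0;

    if (num_valid < 0) {
        return 0; /* defensive: treat a negative count as empty */
    }

    size_t limit = (size_t)num_valid < capacity ? (size_t)num_valid : capacity;
    for (size_t i = 0; i < limit; ++i) {
        if (pids[i] != 0) {
            out[copied++] = pids[i];
        }
    }
    return copied;
}
```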

dcgmHealthWatchResults_t dcgmPidSingleInfo_t::overallHealth [inherited]

The overall health of the system, as a dcgmHealthWatchResults_t value.

long long dcgmPidSingleInfo_t::pcieReplays [inherited]

Count of PCI-E replays that occurred.

struct dcgmStatSummaryInt64_t dcgmPidSingleInfo_t::pcieRxBandwidth [inherited]

PCI-E bytes read from the GPU.

struct dcgmStatSummaryInt64_t dcgmPidSingleInfo_t::pcieTxBandwidth [inherited]

PCI-E bytes written to the GPU.

long long dcgmPidSingleInfo_t::powerViolationTime [inherited]

Number of microseconds we were at reduced clocks due to power violation.

struct dcgmProcessUtilInfo_t dcgmPidSingleInfo_t::processUtilization [inherited]

Process SM and Memory Utilization (in percent).

long long dcgmPidSingleInfo_t::reliabilityViolationTime [inherited]

Number of microseconds we were at reduced clocks due to the reliability limit.

struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::smClock [inherited]

SM clock in MHz.

struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::smUtilization [inherited]

GPU SM Utilization in percent.

long long dcgmPidSingleInfo_t::startTime [inherited]

Process start time in microseconds since 1970.
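
Because endTime is 0 while the process is still running, a runtime calculation has to substitute another end point in that case. A minimal sketch, assuming the caller supplies the current time in the same unit (microseconds since 1970); the function name and the now_us parameter are hypothetical, not part of DCGM.

```c
#include <assert.h>

/* Hypothetical helper, not part of DCGM: runtime of a process in
 * microseconds, from its startTime and endTime values.
 * When end_us is 0 (process not completed), now_us is used instead.
 * Returns 0 if the resulting interval would be negative. */
static long long process_runtime_us(long long start_us,
                                    long long end_us,
                                    long long now_us)
{
    long long stop_us = (end_us == 0) ? now_us : end_us;
    return (stop_us > start_us) ? (stop_us - start_us) : 0;
}
```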

long long dcgmPidSingleInfo_t::syncBoostTime [inherited]

Number of microseconds we were at reduced clocks due to sync boost.

dcgmHealthSystems_t dcgmPidSingleInfo_t::system [inherited]

System to which this information belongs.

long long dcgmPidSingleInfo_t::thermalViolationTime [inherited]

Number of microseconds we were at reduced clocks due to thermal violation.
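
The violation-time members (power, thermal, reliability, board limit, low utilization, sync boost) are all durations in microseconds, so each can be expressed as a share of the process runtime. A minimal sketch with a hypothetical helper name; it handles one reason at a time, since the source does not say whether the reasons can overlap and summing them might double count.

```c
#include <assert.h>

/* Hypothetical helper, not part of DCGM: percentage of the process
 * runtime spent at reduced clocks for a single violation reason.
 * Both arguments are in microseconds.
 * Returns -1.0 when the inputs cannot produce a meaningful percentage. */
static double violation_percent(long long violation_us, long long runtime_us)
{
    if (runtime_us <= 0 || violation_us < 0) {
        return -1.0;
    }
    return (double)violation_us * 100.0 / (double)runtime_us;
}
```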

long long dcgmPidSingleInfo_t::xidCriticalErrorsTs[10] [inherited]

Timestamps of the critical XID errors that occurred.
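
Only the first numXidCriticalErrors entries of xidCriticalErrorsTs are valid, so a reader should bound its loop by that count and by the array size of 10. A minimal sketch that finds the most recent timestamp; the function and macro names are hypothetical, and the timestamp unit is left as reported because this section does not state it.

```c
#include <assert.h>

/* Mirrors the declared size of xidCriticalErrorsTs[10]. */
#define EXAMPLE_XID_TS_CAPACITY 10

/* Hypothetical helper, not part of DCGM: returns the largest (most
 * recent) timestamp among the first num_valid entries, or 0 when there
 * are no valid entries. num_valid corresponds to numXidCriticalErrors. */
static long long latest_xid_timestamp(const long long *timestamps,
                                      int num_valid)
{
    long long latest = 0;

    if (num_valid > EXAMPLE_XID_TS_CAPACITY) {
        num_valid = EXAMPLE_XID_TS_CAPACITY; /* never read past the array */
    }
    for (int i = 0; i < num_valid; ++i) {
        if (timestamps[i] > latest) {
            latest = timestamps[i];
        }
    }
    return latest;
}
```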