2.53. dcgmPidSingleInfo_t Struct Reference
[Structure definitions]
Information corresponding to a single PID.
Public Variables
- long long boardLimitViolationTime
- Number of microseconds we were at reduced clocks due to being at the board's maximum voltage.
- unsigned int eccDoubleBit
- Count of ECC double bit errors that occurred.
- unsigned int eccSingleBit
- Deprecated - Count of ECC single bit errors that occurred.
- long long endTime
- Process end time in microseconds since 1970, or 0 if the process has not completed.
- long long energyConsumed
- Energy consumed by the GPU, in milliwatt-seconds (millijoules).
- unsigned int gpuId
- ID of the GPU this information pertains to. A value of GPU_ID_INVALID indicates summary information for multiple GPUs.
- dcgmHealthWatchResults_t health
- Health of the specified system on this GPU.
- long long lowUtilizationTime
- Number of microseconds we were at reduced clocks due to low utilization.
- long long maxGpuMemoryUsed
- Maximum amount of GPU memory that was used in bytes.
- struct dcgmStatSummaryInt32_t memoryClock
- Memory clock in MHz.
- struct dcgmStatSummaryInt32_t memoryUtilization
- GPU Memory Utilization in percent.
- int numOtherComputePids
- Count of otherComputePids entries that are valid.
- int numOtherGraphicsPids
- Count of otherGraphicsPids entries that are valid.
- int numXidCriticalErrors
- Number of valid entries in xidCriticalErrorsTs.
- unsigned int otherComputePids[DCGM_MAX_PID_INFO_NUM]
- PIDs of other compute processes that ran. A value of 0 means no process.
- unsigned int otherGraphicsPids[DCGM_MAX_PID_INFO_NUM]
- PIDs of other graphics processes that ran. A value of 0 means no process.
- dcgmHealthWatchResults_t overallHealth
- The overall health of the system.
- long long pcieReplays
- Count of PCI-E replays that occurred.
- struct dcgmStatSummaryInt64_t pcieRxBandwidth
- PCI-E bytes read from the GPU.
- struct dcgmStatSummaryInt64_t pcieTxBandwidth
- PCI-E bytes written to the GPU.
- long long powerViolationTime
- Number of microseconds we were at reduced clocks due to power violation.
- struct dcgmProcessUtilInfo_t processUtilization
- Process SM and Memory Utilization (in percent).
- long long reliabilityViolationTime
- Number of microseconds we were at reduced clocks due to the reliability limit.
- struct dcgmStatSummaryInt32_t smClock
- SM clock in MHz.
- struct dcgmStatSummaryInt32_t smUtilization
- GPU SM Utilization in percent.
- long long startTime
- Process start time in microseconds since 1970.
- long long syncBoostTime
- Number of microseconds we were at reduced clocks due to sync boost.
- dcgmHealthSystems_t system
- System to which this information belongs.
- long long thermalViolationTime
- Number of microseconds we were at reduced clocks due to thermal violation.
- long long xidCriticalErrorsTs[10]
- Timestamps of the critical XID errors that occurred.
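Several fields above come in pairs: a count (numOtherComputePids, numOtherGraphicsPids, numXidCriticalErrors) that says how many entries of the matching array are valid, and endTime, which is 0 while the process is still running. The C sketch below shows one way to read those fields defensively. It does not include the real DCGM header: pid_info_sketch, SKETCH_MAX_PID_INFO_NUM (a stand-in for DCGM_MAX_PID_INFO_NUM, whose actual value comes from the DCGM headers) and all helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DCGM_MAX_PID_INFO_NUM; the real value is
   defined in the DCGM headers. */
#define SKETCH_MAX_PID_INFO_NUM 16
/* Size of xidCriticalErrorsTs, as documented above. */
#define SKETCH_XID_TS_CAPACITY 10

/* Hypothetical subset of dcgmPidSingleInfo_t holding only the fields this
   sketch reads. It is NOT the real struct layout. */
typedef struct {
    long long startTime; /* microseconds since 1970 */
    long long endTime;   /* microseconds since 1970, or 0 if still running */
    int numOtherComputePids;
    unsigned int otherComputePids[SKETCH_MAX_PID_INFO_NUM];
    int numXidCriticalErrors;
    long long xidCriticalErrorsTs[SKETCH_XID_TS_CAPACITY];
} pid_info_sketch;

/* Clamp a reported entry count to [0, capacity] so it can never index
   past the end of its array. */
static int valid_count(int reported, int capacity)
{
    if (reported < 0)
        return 0;
    return reported < capacity ? reported : capacity;
}

/* Copy the non-zero PIDs among the valid otherComputePids entries into
   out, which must hold SKETCH_MAX_PID_INFO_NUM values. Returns how many
   PIDs were copied. */
static int collect_other_compute_pids(const pid_info_sketch *info,
                                      unsigned int *out)
{
    assert(info != NULL && out != NULL);
    int n = valid_count(info->numOtherComputePids, SKETCH_MAX_PID_INFO_NUM);
    int copied = 0;
    for (int i = 0; i < n; i++) {
        if (info->otherComputePids[i] != 0) /* 0 = no process */
            out[copied++] = info->otherComputePids[i];
    }
    return copied;
}

/* Most recent critical XID timestamp among the valid entries, or -1 if
   there are none. */
static long long latest_xid_timestamp(const pid_info_sketch *info)
{
    assert(info != NULL);
    int n = valid_count(info->numXidCriticalErrors, SKETCH_XID_TS_CAPACITY);
    long long latest = -1;
    for (int i = 0; i < n; i++) {
        if (info->xidCriticalErrorsTs[i] > latest)
            latest = info->xidCriticalErrorsTs[i];
    }
    return latest;
}

/* Process runtime in microseconds. endTime == 0 means the process has not
   completed, so the caller-supplied current time is used instead. */
static long long runtime_us(const pid_info_sketch *info, long long now_us)
{
    assert(info != NULL);
    long long end = (info->endTime != 0) ? info->endTime : now_us;
    return end - info->startTime;
}
```

Clamping the reported count is a defensive design choice: the reference only says the count gives the number of valid entries, and does not state whether an out-of-range count can actually occur.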
Variables
- long long dcgmPidSingleInfo_t::boardLimitViolationTime [inherited]
- Number of microseconds we were at reduced clocks due to being at the board's maximum voltage.
- unsigned int dcgmPidSingleInfo_t::eccDoubleBit [inherited]
- Count of ECC double bit errors that occurred.
- unsigned int dcgmPidSingleInfo_t::eccSingleBit [inherited]
- Deprecated - Count of ECC single bit errors that occurred.
- long long dcgmPidSingleInfo_t::endTime [inherited]
- Process end time in microseconds since 1970, or 0 if the process has not completed.
- long long dcgmPidSingleInfo_t::energyConsumed [inherited]
- Energy consumed by the GPU, in milliwatt-seconds (millijoules).
- unsigned int dcgmPidSingleInfo_t::gpuId [inherited]
- ID of the GPU this information pertains to. A value of GPU_ID_INVALID indicates summary information for multiple GPUs.
- dcgmHealthWatchResults_t dcgmPidSingleInfo_t::health [inherited]
- Health of the specified system on this GPU.
- long long dcgmPidSingleInfo_t::lowUtilizationTime [inherited]
- Number of microseconds we were at reduced clocks due to low utilization.
- long long dcgmPidSingleInfo_t::maxGpuMemoryUsed [inherited]
- Maximum amount of GPU memory that was used in bytes.
- struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::memoryClock [inherited]
- Memory clock in MHz.
- struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::memoryUtilization [inherited]
- GPU Memory Utilization in percent.
- int dcgmPidSingleInfo_t::numOtherComputePids [inherited]
- Count of otherComputePids entries that are valid.
- int dcgmPidSingleInfo_t::numOtherGraphicsPids [inherited]
- Count of otherGraphicsPids entries that are valid.
- int dcgmPidSingleInfo_t::numXidCriticalErrors [inherited]
- Number of valid entries in xidCriticalErrorsTs.
- unsigned int dcgmPidSingleInfo_t::otherComputePids[DCGM_MAX_PID_INFO_NUM] [inherited]
- PIDs of other compute processes that ran. A value of 0 means no process.
- unsigned int dcgmPidSingleInfo_t::otherGraphicsPids[DCGM_MAX_PID_INFO_NUM] [inherited]
- PIDs of other graphics processes that ran. A value of 0 means no process.
- dcgmHealthWatchResults_t dcgmPidSingleInfo_t::overallHealth [inherited]
- The overall health of the system.
- long long dcgmPidSingleInfo_t::pcieReplays [inherited]
- Count of PCI-E replays that occurred.
- struct dcgmStatSummaryInt64_t dcgmPidSingleInfo_t::pcieRxBandwidth [inherited]
- PCI-E bytes read from the GPU.
- struct dcgmStatSummaryInt64_t dcgmPidSingleInfo_t::pcieTxBandwidth [inherited]
- PCI-E bytes written to the GPU.
- long long dcgmPidSingleInfo_t::powerViolationTime [inherited]
- Number of microseconds we were at reduced clocks due to power violation.
- struct dcgmProcessUtilInfo_t dcgmPidSingleInfo_t::processUtilization [inherited]
- Process SM and Memory Utilization (in percent).
- long long dcgmPidSingleInfo_t::reliabilityViolationTime [inherited]
- Number of microseconds we were at reduced clocks due to the reliability limit.
- struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::smClock [inherited]
- SM clock in MHz.
- struct dcgmStatSummaryInt32_t dcgmPidSingleInfo_t::smUtilization [inherited]
- GPU SM Utilization in percent.
- long long dcgmPidSingleInfo_t::startTime [inherited]
- Process start time in microseconds since 1970.
- long long dcgmPidSingleInfo_t::syncBoostTime [inherited]
- Number of microseconds we were at reduced clocks due to sync boost.
- dcgmHealthSystems_t dcgmPidSingleInfo_t::system [inherited]
- System to which this information belongs.
- long long dcgmPidSingleInfo_t::thermalViolationTime [inherited]
- Number of microseconds we were at reduced clocks due to thermal violation.
- long long dcgmPidSingleInfo_t::xidCriticalErrorsTs[10] [inherited]
- Timestamps of the critical XID errors that occurred.
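The time and energy fields can be combined into derived metrics. Reading energyConsumed as milliwatt-seconds (millijoules), the average power of a completed process is E / t, which with the documented units reduces to energyConsumed × 1000 / runtime in microseconds. The C sketch below is a minimal illustration under that reading; it also assumes the violation times are accumulated over the same startTime..endTime interval, which this reference does not state. pid_energy_sketch and its helper functions are hypothetical and not part of the DCGM API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of dcgmPidSingleInfo_t; NOT the real struct layout. */
typedef struct {
    long long startTime;            /* microseconds since 1970 */
    long long endTime;              /* microseconds since 1970; 0 = not completed */
    long long energyConsumed;       /* milliwatt-seconds, i.e. millijoules */
    long long powerViolationTime;   /* microseconds at reduced clocks (power) */
    long long thermalViolationTime; /* microseconds at reduced clocks (thermal) */
} pid_energy_sketch;

/* Runtime of a completed process in microseconds. */
static long long completed_runtime_us(const pid_energy_sketch *p)
{
    assert(p != NULL && p->endTime != 0 && p->endTime > p->startTime);
    return p->endTime - p->startTime;
}

/* Average power in watts:
   P = E / t = (mJ / 1e3) / (us / 1e6) = mJ * 1e3 / us. */
static double average_power_watts(const pid_energy_sketch *p)
{
    double runtime = (double)completed_runtime_us(p); /* validates p first */
    return (double)p->energyConsumed * 1e3 / runtime;
}

/* Fraction of the process lifetime spent at reduced clocks for one reason,
   assuming the violation time covers the same interval as the runtime. */
static double violation_fraction(const pid_energy_sketch *p,
                                 long long violation_us)
{
    double runtime = (double)completed_runtime_us(p);
    return (double)violation_us / runtime;
}
```

For example, a process that ran for 2 seconds and consumed 600,000 mJ (600 J) averaged 300 W; if its powerViolationTime were 500,000 microseconds, it spent a quarter of its runtime power-throttled.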