2.31. dcgmGpuUsageInfo_t Struct Reference
[Structure definitions]
Information corresponding to a job on a GPU.
Public Variables
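- long long boardLimitViolationTime
- Number of microseconds we were at reduced clocks due to being at the board's maximum voltage.
- struct dcgmProcessUtilInfo_t computePidInfo[DCGM_MAX_PID_INFO_NUM]
- List of compute processes that ran during the job. 0 = no process.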
- unsigned int eccDoubleBit
- Count of ECC double bit errors that occurred.
- unsigned int eccSingleBit
- Deprecated - Count of ECC single bit errors that occurred.
- long long endTime
- User provided job end time in microseconds since 1970.
- long long energyConsumed
- Energy consumed in milliwatt-seconds.
- unsigned int gpuId
- ID of the GPU this pertains to. GPU_ID_INVALID = summary information for multiple GPUs.
- struct dcgmProcessUtilInfo_t graphicsPidInfo[DCGM_MAX_PID_INFO_NUM]
- List of graphics processes that ran during the job. 0 = no process.
- dcgmHealthWatchResults_t health
- Health of the specified system on this GPU.
- long long lowUtilizationTime
- Number of microseconds we were at reduced clocks due to low utilization.
- long long maxGpuMemoryUsed
- Maximum amount of GPU memory that was used in bytes.
- struct dcgmStatSummaryInt32_t memoryClock
- Memory clock in MHz.
- struct dcgmStatSummaryInt32_t memoryUtilization
- GPU Memory Utilization in percent.
- int numComputePids
- Count of computePidInfo entries that are valid.
- int numGraphicsPids
- Count of graphicsPidInfo entries that are valid.
- int numXidCriticalErrors
- Number of valid entries in xidCriticalErrorsTs.
- dcgmHealthWatchResults_t overallHealth
- The overall health of the system (see dcgmHealthWatchResults_t).
- long long pcieReplays
- Count of PCI-E replays that occurred.
- struct dcgmStatSummaryInt64_t pcieRxBandwidth
- PCI-E bytes read from the GPU.
- struct dcgmStatSummaryInt64_t pcieTxBandwidth
- PCI-E bytes written to the GPU.
- struct dcgmStatSummaryFp64_t powerUsage
- Power usage Min/Max/Avg in watts.
- long long powerViolationTime
- Number of microseconds we were at reduced clocks due to power violation.
- long long reliabilityViolationTime
- Number of microseconds we were at reduced clocks due to the reliability limit.
- struct dcgmStatSummaryInt32_t smClock
- SM clock in MHz.
- struct dcgmStatSummaryInt32_t smUtilization
- GPU SM Utilization in percent.
- long long startTime
- User provided job start time in microseconds since 1970.
- long long syncBoostTime
- Number of microseconds we were at reduced clocks due to sync boost.
- dcgmHealthSystems_t system
- System to which this information belongs.
- long long thermalViolationTime
- Number of microseconds we were at reduced clocks due to thermal violation.
- long long xidCriticalErrorsTs[10]
- Timestamps of the critical XID errors that occurred.
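The time and energy fields above can be combined into derived figures such as job duration and average power. The following is a minimal sketch, not part of the DCGM API: `JobUsageSample` is a hypothetical stand-in that mirrors only `startTime`, `endTime`, and `energyConsumed`, and the helper names are illustrative. It assumes the documented energy unit means milliwatt-seconds (that is, millijoules).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three fields used below; the real
 * dcgmGpuUsageInfo_t has many more members. */
typedef struct {
    long long startTime;      /* job start, microseconds since 1970 */
    long long endTime;        /* job end, microseconds since 1970 */
    long long energyConsumed; /* assumed unit: milliwatt-seconds (mJ) */
} JobUsageSample;

/* Job duration in seconds; returns 0.0 when the timestamps are not ordered. */
static double job_duration_seconds(const JobUsageSample *u)
{
    assert(u != NULL);
    if (u->endTime <= u->startTime) {
        return 0.0;
    }
    return (double)(u->endTime - u->startTime) / 1e6;
}

/* Average power in watts over the job; returns 0.0 for an empty interval. */
static double job_average_watts(const JobUsageSample *u)
{
    double seconds = job_duration_seconds(u);
    if (seconds <= 0.0) {
        return 0.0;
    }
    /* milliwatt-seconds -> joules, then joules / seconds -> watts */
    return ((double)u->energyConsumed / 1000.0) / seconds;
}
```

With real data, the same arithmetic would be applied to the corresponding members of a populated dcgmGpuUsageInfo_t; comparing the result with the average reported in `powerUsage` is one way to sanity-check the unit assumption.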
Variables
- long long dcgmGpuUsageInfo_t::boardLimitViolationTime [inherited]
-
Number of microseconds we were at reduced clocks due to being at the board's maximum voltage.
- struct dcgmProcessUtilInfo_t dcgmGpuUsageInfo_t::computePidInfo[DCGM_MAX_PID_INFO_NUM] [inherited]
-
List of compute processes that ran during the job. 0 = no process.
- unsigned int dcgmGpuUsageInfo_t::eccDoubleBit [inherited]
-
Count of ECC double bit errors that occurred.
- unsigned int dcgmGpuUsageInfo_t::eccSingleBit [inherited]
-
Deprecated - Count of ECC single bit errors that occurred.
- long long dcgmGpuUsageInfo_t::endTime [inherited]
-
User provided job end time in microseconds since 1970.
- long long dcgmGpuUsageInfo_t::energyConsumed [inherited]
-
Energy consumed in milliwatt-seconds.
- unsigned int dcgmGpuUsageInfo_t::gpuId [inherited]
-
ID of the GPU this pertains to. GPU_ID_INVALID = summary information for multiple GPUs.
- struct dcgmProcessUtilInfo_t dcgmGpuUsageInfo_t::graphicsPidInfo[DCGM_MAX_PID_INFO_NUM] [inherited]
-
List of graphics processes that ran during the job. 0 = no process.
- dcgmHealthWatchResults_t dcgmGpuUsageInfo_t::health [inherited]
-
Health of the specified system on this GPU.
- long long dcgmGpuUsageInfo_t::lowUtilizationTime [inherited]
-
Number of microseconds we were at reduced clocks due to low utilization.
- long long dcgmGpuUsageInfo_t::maxGpuMemoryUsed [inherited]
-
Maximum amount of GPU memory that was used in bytes.
- struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::memoryClock [inherited]
-
Memory clock in MHz.
- struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::memoryUtilization [inherited]
-
GPU Memory Utilization in percent.
- int dcgmGpuUsageInfo_t::numComputePids [inherited]
-
Count of computePidInfo entries that are valid.
- int dcgmGpuUsageInfo_t::numGraphicsPids [inherited]
-
Count of graphicsPidInfo entries that are valid.
- int dcgmGpuUsageInfo_t::numXidCriticalErrors [inherited]
-
Number of valid entries in xidCriticalErrorsTs.
- dcgmHealthWatchResults_t dcgmGpuUsageInfo_t::overallHealth [inherited]
-
The overall health of the system (see dcgmHealthWatchResults_t).
- long long dcgmGpuUsageInfo_t::pcieReplays [inherited]
-
Count of PCI-E replays that occurred.
- struct dcgmStatSummaryInt64_t dcgmGpuUsageInfo_t::pcieRxBandwidth [inherited]
-
PCI-E bytes read from the GPU.
- struct dcgmStatSummaryInt64_t dcgmGpuUsageInfo_t::pcieTxBandwidth [inherited]
-
PCI-E bytes written to the GPU.
- struct dcgmStatSummaryFp64_t dcgmGpuUsageInfo_t::powerUsage [inherited]
-
Power usage Min/Max/Avg in watts.
- long long dcgmGpuUsageInfo_t::powerViolationTime [inherited]
-
Number of microseconds we were at reduced clocks due to power violation.
- long long dcgmGpuUsageInfo_t::reliabilityViolationTime [inherited]
-
Number of microseconds we were at reduced clocks due to the reliability limit.
- struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::smClock [inherited]
-
SM clock in MHz.
- struct dcgmStatSummaryInt32_t dcgmGpuUsageInfo_t::smUtilization [inherited]
-
GPU SM Utilization in percent.
- long long dcgmGpuUsageInfo_t::startTime [inherited]
-
User provided job start time in microseconds since 1970.
- long long dcgmGpuUsageInfo_t::syncBoostTime [inherited]
-
Number of microseconds we were at reduced clocks due to sync boost.
- dcgmHealthSystems_t dcgmGpuUsageInfo_t::system [inherited]
-
System to which this information belongs.
- long long dcgmGpuUsageInfo_t::thermalViolationTime [inherited]
-
Number of microseconds we were at reduced clocks due to thermal violation.
- long long dcgmGpuUsageInfo_t::xidCriticalErrorsTs[10] [inherited]
-
Timestamps of the critical XID errors that occurred.
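Because `numXidCriticalErrors` gives the number of valid entries in `xidCriticalErrorsTs`, a reader of the array should stop at that count rather than scan all ten slots. The sketch below illustrates this with a hypothetical stand-in struct, `XidErrorSample`, that mirrors only those two members; the helper names are illustrative and not part of DCGM. It assumes nothing about the ordering or unit of the timestamps, so it scans for the largest value instead of taking the last entry.

```c
#include <assert.h>
#include <stddef.h>

#define XID_TS_CAPACITY 10 /* matches xidCriticalErrorsTs[10] in the listing */

/* Hypothetical stand-in for the two fields used below. */
typedef struct {
    int numXidCriticalErrors;                       /* valid entries */
    long long xidCriticalErrorsTs[XID_TS_CAPACITY]; /* error timestamps */
} XidErrorSample;

/* Valid entry count, clamped defensively to the range [0, XID_TS_CAPACITY]. */
static int valid_xid_count(const XidErrorSample *u)
{
    assert(u != NULL);
    if (u->numXidCriticalErrors < 0) {
        return 0;
    }
    if (u->numXidCriticalErrors > XID_TS_CAPACITY) {
        return XID_TS_CAPACITY;
    }
    return u->numXidCriticalErrors;
}

/* Largest timestamp among the valid entries, or 0 when there are none. */
static long long latest_xid_timestamp(const XidErrorSample *u)
{
    int count = valid_xid_count(u);
    long long latest = 0;
    for (int i = 0; i < count; i++) {
        if (u->xidCriticalErrorsTs[i] > latest) {
            latest = u->xidCriticalErrorsTs[i];
        }
    }
    return latest;
}
```

The same count-then-iterate pattern would apply to `computePidInfo` with `numComputePids` and to `graphicsPidInfo` with `numGraphicsPids`, using DCGM_MAX_PID_INFO_NUM as the upper bound.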