Field Constants
- group dcgmFieldConstants
Constants that represent contents of individual field values.
Defines
-
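DCGM_CUDA_COMPUTE_CAPABILITY_MAJOR(x) ((uint64_t)(x)&0xFFFF0000)
DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY packs the major version into the upper 16 bits (mask 0xFFFF0000) and the minor version into the lower 16 bits (mask 0x0000FFFF).
These macros separate the two. Note that the MAJOR macro only masks the value and does not shift it, so its result is the major version shifted left by 16 bits; shift it right by 16 to obtain the plain major number.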
-
DCGM_CUDA_COMPUTE_CAPABILITY_MINOR(x) ((uint64_t)(x)&0x0000FFFF)
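As a minimal sketch of decoding a DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY value, the code below copies the two macros verbatim (guarded, so they do not clash if the real DCGM header is also included) and wraps them in two hypothetical helpers, cc_major and cc_minor. The extra right shift in cc_major follows directly from the MAJOR macro's definition, which masks without shifting.

```c
#include <stdint.h>

/* Copies of the macros documented above; guarded in case the DCGM header
 * (assumed to be dcgm_fields.h) is also included. */
#ifndef DCGM_CUDA_COMPUTE_CAPABILITY_MAJOR
#define DCGM_CUDA_COMPUTE_CAPABILITY_MAJOR(x) ((uint64_t)(x)&0xFFFF0000)
#endif
#ifndef DCGM_CUDA_COMPUTE_CAPABILITY_MINOR
#define DCGM_CUDA_COMPUTE_CAPABILITY_MINOR(x) ((uint64_t)(x)&0x0000FFFF)
#endif

/* Hypothetical helper: plain major version. The MAJOR macro only masks,
 * so the masked bits are shifted back down by 16. */
static unsigned cc_major(uint64_t value)
{
    return (unsigned)(DCGM_CUDA_COMPUTE_CAPABILITY_MAJOR(value) >> 16);
}

/* Hypothetical helper: plain minor version (already in the low 16 bits). */
static unsigned cc_minor(uint64_t value)
{
    return (unsigned)DCGM_CUDA_COMPUTE_CAPABILITY_MINOR(value);
}
```

For example, a field value of 0x00080006 (compute capability 8.6) gives cc_major 8 and cc_minor 6, while DCGM_CUDA_COMPUTE_CAPABILITY_MAJOR alone returns 0x80000.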
-
DCGM_CLOCKS_THROTTLE_REASON_GPU_IDLE 0x0000000000000001LL
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS is a bitmap of the reasons the clocks are throttled.
These macros are masks for the individual throttle reasons and map 1:1 to the NVML reasons documented in nvml.h; the notes from that header are copied below.
Nothing is running on the GPU, and the clocks are dropping to the idle state.
Note
This limiter may be removed in a later release
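Because each reason occupies its own bit, a DCGM_FI_DEV_CLOCK_THROTTLE_REASONS value is tested with bitwise AND. The sketch below uses that to separate the benign idle case from throttling while work is running; the helper names are hypothetical, and the mask is copied from the entry above (guarded in case the DCGM header, assumed to be dcgm_fields.h, is included).

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef DCGM_CLOCKS_THROTTLE_REASON_GPU_IDLE
#define DCGM_CLOCKS_THROTTLE_REASON_GPU_IDLE 0x0000000000000001LL
#endif

/* Hypothetical helper: true if the "GPU idle" bit is set. */
static bool is_idle(uint64_t reasons)
{
    return (reasons & (uint64_t)DCGM_CLOCKS_THROTTLE_REASON_GPU_IDLE) != 0;
}

/* Hypothetical helper: true if any reason other than "GPU idle" is set,
 * i.e. the clocks are held down for some other cause. */
static bool throttled_while_busy(uint64_t reasons)
{
    return (reasons & ~(uint64_t)DCGM_CLOCKS_THROTTLE_REASON_GPU_IDLE) != 0;
}
```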
-
DCGM_CLOCKS_THROTTLE_REASON_CLOCKS_SETTING 0x0000000000000002LL
GPU clocks are limited by the current application clocks setting.
-
DCGM_CLOCKS_THROTTLE_REASON_SW_POWER_CAP 0x0000000000000004LL
The SW power scaling algorithm is reducing the clocks below the requested clocks.
-
DCGM_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN 0x0000000000000008LL
HW Slowdown (reducing the core clocks by a factor of 2 or more) is engaged.
This is an indicator of:
  - Temperature being too high
  - External Power Brake Assertion being triggered (e.g. by the system power supply)
  - Power draw being too high, with Fast Trigger protection reducing the clocks
  - It may also be reported during a PState or clock change
    - This behavior may be removed in a later release.
-
DCGM_CLOCKS_THROTTLE_REASON_SYNC_BOOST 0x0000000000000010LL
Sync Boost.
This GPU has been added to a sync boost group with nvidia-smi or DCGM to maximize performance per watt. All GPUs in the sync boost group boost to the minimum possible clocks across the entire group. Check the throttle reasons of the other GPUs in the system to see why they are holding this one at lower clocks.
-
DCGM_CLOCKS_THROTTLE_REASON_SW_THERMAL 0x0000000000000020LL
SW Thermal Slowdown.
This is an indicator of one or more of the following:
  - Current GPU temperature above the GPU Max Operating Temperature
  - Current memory temperature above the Memory Max Operating Temperature
-
DCGM_CLOCKS_THROTTLE_REASON_HW_THERMAL 0x0000000000000040LL
HW Thermal Slowdown (reducing the core clocks by a factor of 2 or more) is engaged.
This is an indicator of:
  - Temperature being too high
-
DCGM_CLOCKS_THROTTLE_REASON_HW_POWER_BRAKE 0x0000000000000080LL
HW Power Brake Slowdown (reducing the core clocks by a factor of 2 or more) is engaged.
This is an indicator of:
  - External Power Brake Assertion being triggered (e.g. by the system power supply)
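Several masks can be OR-ed together to test a whole category at once. The sketch below groups the three HW reasons, which all describe the core clocks being cut by a factor of 2 or more, and the two SW power and thermal reasons. The grouping and helper names are hypothetical; the mask values are copied from the entries above, and real code would OR the DCGM_CLOCKS_THROTTLE_REASON_* macros instead of literals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical category masks built from the values listed above. */
static const uint64_t kHwSlowdownMask =
    0x0000000000000008ULL   /* HW_SLOWDOWN    */
  | 0x0000000000000040ULL   /* HW_THERMAL     */
  | 0x0000000000000080ULL;  /* HW_POWER_BRAKE */

static const uint64_t kSwLimitMask =
    0x0000000000000004ULL   /* SW_POWER_CAP */
  | 0x0000000000000020ULL;  /* SW_THERMAL   */

/* True if any hardware slowdown (core clocks cut by 2x or more) is engaged. */
static bool hw_slowdown_active(uint64_t reasons)
{
    return (reasons & kHwSlowdownMask) != 0;
}

/* True if a software power or thermal limit is reducing the clocks. */
static bool sw_limit_active(uint64_t reasons)
{
    return (reasons & kSwLimitMask) != 0;
}
```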
-
DCGM_CLOCKS_THROTTLE_REASON_DISPLAY_CLOCKS 0x0000000000000100LL
GPU clocks are limited by the current display clocks setting.
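To log every reason present in a DCGM_FI_DEV_CLOCK_THROTTLE_REASONS value, one option is a mask/name table walked bit by bit. In the sketch below the table is copied from the list above, and print_throttle_reasons is a hypothetical helper that prints each matching macro name and returns any set bits that are not in this list, so they are not silently dropped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Mask/name pairs copied from the DCGM_CLOCKS_THROTTLE_REASON_* list above. */
static const struct {
    uint64_t mask;
    const char *name;
} kThrottleReasons[] = {
    {0x0000000000000001ULL, "GPU_IDLE"},
    {0x0000000000000002ULL, "CLOCKS_SETTING"},
    {0x0000000000000004ULL, "SW_POWER_CAP"},
    {0x0000000000000008ULL, "HW_SLOWDOWN"},
    {0x0000000000000010ULL, "SYNC_BOOST"},
    {0x0000000000000020ULL, "SW_THERMAL"},
    {0x0000000000000040ULL, "HW_THERMAL"},
    {0x0000000000000080ULL, "HW_POWER_BRAKE"},
    {0x0000000000000100ULL, "DISPLAY_CLOCKS"},
};

/* Hypothetical helper: prints the macro name of every listed reason set in
 * `reasons`, one per line, and returns the bits that matched no entry. */
static uint64_t print_throttle_reasons(uint64_t reasons, FILE *out)
{
    uint64_t unmatched = reasons;
    size_t count = sizeof kThrottleReasons / sizeof kThrottleReasons[0];
    for (size_t i = 0; i < count; ++i) {
        if (reasons & kThrottleReasons[i].mask) {
            fprintf(out, "DCGM_CLOCKS_THROTTLE_REASON_%s\n", kThrottleReasons[i].name);
            unmatched &= ~kThrottleReasons[i].mask;
        }
    }
    return unmatched;
}
```

For example, a value of 0xC prints DCGM_CLOCKS_THROTTLE_REASON_SW_POWER_CAP and then DCGM_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN, and returns 0.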
Enums
-
enum dcgmGpuVirtualizationMode_t
GPU virtualization mode types for DCGM_FI_DEV_VIRTUAL_MODE.
Values:
-
enumerator DCGM_GPU_VIRTUALIZATION_MODE_NONE
Represents a bare-metal GPU.
-
enumerator DCGM_GPU_VIRTUALIZATION_MODE_PASSTHROUGH
Device is associated with GPU-Passthrough.
-
enumerator DCGM_GPU_VIRTUALIZATION_MODE_VGPU
Device is associated with a vGPU inside a virtual machine.
-
enumerator DCGM_GPU_VIRTUALIZATION_MODE_HOST_VGPU
Device is associated with VGX hypervisor in vGPU mode.
-
enumerator DCGM_GPU_VIRTUALIZATION_MODE_HOST_VSGA
Device is associated with VGX hypervisor in vSGA mode.
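A common use of DCGM_FI_DEV_VIRTUAL_MODE is turning the reported mode into a readable label. The sketch below assumes the DCGM header is not available, so it declares a stand-in enum with the same enumerators in the listed order; its numeric values are not taken from this documentation, and real code should use the header's dcgmGpuVirtualizationMode_t instead (the two must not be combined). virt_mode_label and its strings are hypothetical.

```c
/* Stand-in for the DCGM declaration, enumerators in the order listed above.
 * Numeric values here are implicit (0..4) and only an assumption. */
typedef enum {
    DCGM_GPU_VIRTUALIZATION_MODE_NONE,
    DCGM_GPU_VIRTUALIZATION_MODE_PASSTHROUGH,
    DCGM_GPU_VIRTUALIZATION_MODE_VGPU,
    DCGM_GPU_VIRTUALIZATION_MODE_HOST_VGPU,
    DCGM_GPU_VIRTUALIZATION_MODE_HOST_VSGA
} dcgmGpuVirtualizationMode_t;

/* Hypothetical helper: short human-readable label for each mode. */
static const char *virt_mode_label(dcgmGpuVirtualizationMode_t mode)
{
    switch (mode) {
    case DCGM_GPU_VIRTUALIZATION_MODE_NONE:        return "bare metal";
    case DCGM_GPU_VIRTUALIZATION_MODE_PASSTHROUGH: return "passthrough";
    case DCGM_GPU_VIRTUALIZATION_MODE_VGPU:        return "vGPU guest";
    case DCGM_GPU_VIRTUALIZATION_MODE_HOST_VGPU:   return "vGPU host";
    case DCGM_GPU_VIRTUALIZATION_MODE_HOST_VSGA:   return "vSGA host";
    }
    return "unknown"; /* value outside the listed enumerators */
}
```

Using a switch without a default case lets compilers that warn on unhandled enumerators flag any mode added to the enum later, while the trailing return still covers out-of-range values.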