Field Identifiers¶
- group dcgmFieldIdentifiers
Field Identifiers.
Defines
-
DCGM_FI_UNKNOWN 0¶
NULL field.
-
DCGM_FI_DRIVER_VERSION 1¶
Driver Version.
-
DCGM_FI_NVML_VERSION 2¶
-
DCGM_FI_PROCESS_NAME 3¶
-
DCGM_FI_DEV_COUNT 4¶
Number of Devices on the node.
-
DCGM_FI_CUDA_DRIVER_VERSION 5¶
CUDA driver version. Retrieves a number with the major value in the thousands place and the minor value in the hundreds place.
For example, CUDA 11.1 = 11100.
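As a rough illustration of the encoding described for DCGM_FI_CUDA_DRIVER_VERSION, the value can be split back into its major and minor parts with integer arithmetic. This is a minimal sketch that assumes the layout is exactly as documented here (major in the thousands place, minor in the hundreds place). The helper names are hypothetical and are not part of DCGM.

```c
/* Hypothetical helpers, not part of DCGM. They assume the encoding
 * documented for DCGM_FI_CUDA_DRIVER_VERSION: major in the thousands
 * place and minor in the hundreds place (CUDA 11.1 is reported as 11100). */
static int cuda_version_major(long long encoded)
{
    return (int)(encoded / 1000);
}

static int cuda_version_minor(long long encoded)
{
    /* Drop the major part, then keep only the hundreds digit. */
    return (int)((encoded % 1000) / 100);
}
```

With the documented sample value 11100, these helpers yield major 11 and minor 1.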
-
DCGM_FI_DEV_NAME 50¶
Name of the GPU device.
-
DCGM_FI_DEV_BRAND 51¶
Device Brand.
-
DCGM_FI_DEV_NVML_INDEX 52¶
NVML index of this GPU.
-
DCGM_FI_DEV_SERIAL 53¶
Device Serial Number.
-
DCGM_FI_DEV_UUID 54¶
UUID corresponding to the device.
-
DCGM_FI_DEV_MINOR_NUMBER 55¶
Device node minor number /dev/nvidia#.
-
DCGM_FI_DEV_OEM_INFOROM_VER 56¶
OEM inforom version.
-
DCGM_FI_DEV_PCI_BUSID 57¶
PCI attributes for the device.
-
DCGM_FI_DEV_PCI_COMBINED_ID 58¶
The combined 16-bit device id and 16-bit vendor id.
-
DCGM_FI_DEV_PCI_SUBSYS_ID 59¶
The 32-bit Sub System Device ID.
-
DCGM_FI_GPU_TOPOLOGY_PCI 60¶
Topology of all GPUs on the system via PCI (static)
-
DCGM_FI_GPU_TOPOLOGY_NVLINK 61¶
Topology of all GPUs on the system via NVLINK (static)
-
DCGM_FI_GPU_TOPOLOGY_AFFINITY 62¶
Affinity of all GPUs on the system (static)
-
DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY 63¶
Cuda compute capability for the device.
The major version is the upper 32 bits and the minor version is the lower 32 bits.
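The packing described for DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY can be undone with a shift and a mask. The sketch below assumes the field value has already been read into a 64-bit integer and that the layout is exactly as stated (major in the upper 32 bits, minor in the lower 32 bits). The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of DCGM. They assume the layout stated
 * above: major version in the upper 32 bits, minor in the lower 32 bits. */
static uint32_t compute_capability_major(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

static uint32_t compute_capability_minor(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);
}
```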
-
DCGM_FI_DEV_COMPUTE_MODE 65¶
Compute mode for the device.
-
DCGM_FI_DEV_PERSISTENCE_MODE 66¶
Persistence mode for the device. Boolean: 0 is disabled, 1 is enabled.
-
DCGM_FI_DEV_MIG_MODE 67¶
MIG mode for the device. Boolean: 0 is disabled, 1 is enabled.
-
DCGM_FI_DEV_CUDA_VISIBLE_DEVICES_STR 68¶
The string that CUDA_VISIBLE_DEVICES should be set to for this entity (including MIG)
-
DCGM_FI_DEV_MIG_MAX_SLICES 69¶
The maximum number of MIG slices supported by this GPU.
-
DCGM_FI_DEV_CPU_AFFINITY_0 70¶
Device CPU affinity.
part 1/8 = cpus 0 - 63
-
DCGM_FI_DEV_CPU_AFFINITY_1 71¶
Device CPU affinity.
part 2/8 = cpus 64 - 127
-
DCGM_FI_DEV_CPU_AFFINITY_2 72¶
Device CPU affinity.
part 3/8 = cpus 128 - 191
-
DCGM_FI_DEV_CPU_AFFINITY_3 73¶
Device CPU affinity.
part 4/8 = cpus 192 - 255
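Each of the four affinity fields covers a block of 64 CPUs, so a CPU number maps to a field index (cpu / 64) and a position within that field (cpu % 64). The sketch below assumes each field is read as a 64-bit bitmask in which bit b of DCGM_FI_DEV_CPU_AFFINITY_i stands for CPU 64 * i + b. That bit ordering is an assumption, and the names are hypothetical.

```c
#include <stdint.h>

#define AFFINITY_WORDS 4   /* DCGM_FI_DEV_CPU_AFFINITY_0 .. _3 */
#define CPUS_PER_WORD  64  /* each field covers 64 CPUs */

/* Hypothetical helper, not part of DCGM. words[i] is assumed to hold the
 * value of DCGM_FI_DEV_CPU_AFFINITY_i, with bit b of word i standing for
 * CPU (64 * i + b). Returns 1 if `cpu` is in the mask, 0 otherwise. */
static int cpu_in_affinity(const uint64_t words[AFFINITY_WORDS], unsigned cpu)
{
    if (cpu >= AFFINITY_WORDS * CPUS_PER_WORD)
        return 0; /* outside the 0 - 255 range covered by the four fields */
    return (int)((words[cpu / CPUS_PER_WORD] >> (cpu % CPUS_PER_WORD)) & 1u);
}
```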
-
DCGM_FI_DEV_CC_MODE 74¶
ConfidentialCompute/AmpereProtectedMemory status for this system: 0 = disabled, 1 = enabled.
-
DCGM_FI_DEV_MIG_ATTRIBUTES 75¶
Attributes for the given MIG device handles.
-
DCGM_FI_DEV_MIG_GI_INFO 76¶
GPU instance profile information.
-
DCGM_FI_DEV_MIG_CI_INFO 77¶
Compute instance profile information.
-
DCGM_FI_DEV_ECC_INFOROM_VER 80¶
ECC inforom version.
-
DCGM_FI_DEV_POWER_INFOROM_VER 81¶
Power management object inforom version.
-
DCGM_FI_DEV_INFOROM_IMAGE_VER 82¶
Inforom image version.
-
DCGM_FI_DEV_INFOROM_CONFIG_CHECK 83¶
Inforom configuration checksum.
-
DCGM_FI_DEV_INFOROM_CONFIG_VALID 84¶
Reads the infoROM from the flash and verifies the checksums.
-
DCGM_FI_DEV_VBIOS_VERSION 85¶
VBIOS version of the device.
-
DCGM_FI_DEV_BAR1_TOTAL 90¶
Total BAR1 of the GPU in MB.
-
DCGM_FI_SYNC_BOOST 91¶
Deprecated - Sync boost settings on the node.
-
DCGM_FI_DEV_BAR1_USED 92¶
Used BAR1 of the GPU in MB.
-
DCGM_FI_DEV_BAR1_FREE 93¶
Free BAR1 of the GPU in MB.
-
DCGM_FI_DEV_SM_CLOCK 100¶
SM clock for the device.
-
DCGM_FI_DEV_MEM_CLOCK 101¶
Memory clock for the device.
-
DCGM_FI_DEV_VIDEO_CLOCK 102¶
Video encoder/decoder clock for the device.
-
DCGM_FI_DEV_APP_SM_CLOCK 110¶
SM Application clocks.
-
DCGM_FI_DEV_APP_MEM_CLOCK 111¶
Memory Application clocks.
-
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS 112¶
Current clock throttle reasons (bitmask of DCGM_CLOCKS_THROTTLE_REASON_*)
-
DCGM_FI_DEV_MAX_SM_CLOCK 113¶
Maximum supported SM clock for the device.
-
DCGM_FI_DEV_MAX_MEM_CLOCK 114¶
Maximum supported Memory clock for the device.
-
DCGM_FI_DEV_MAX_VIDEO_CLOCK 115¶
Maximum supported Video encoder/decoder clock for the device.
-
DCGM_FI_DEV_AUTOBOOST 120¶
Auto-boost for the device (1 = enabled, 0 = disabled).
-
DCGM_FI_DEV_SUPPORTED_CLOCKS 130¶
Supported clocks for the device.
-
DCGM_FI_DEV_MEMORY_TEMP 140¶
Memory temperature for the device.
-
DCGM_FI_DEV_GPU_TEMP 150¶
Current temperature readings for the device, in degrees C.
-
DCGM_FI_DEV_MEM_MAX_OP_TEMP 151¶
Maximum operating temperature for the memory of this GPU.
-
DCGM_FI_DEV_GPU_MAX_OP_TEMP 152¶
Maximum operating temperature for this GPU.
-
DCGM_FI_DEV_POWER_USAGE 155¶
Power usage for the device in Watts.
-
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION 156¶
Total energy consumption for the GPU in mJ since the driver was last reloaded.
-
DCGM_FI_DEV_SLOWDOWN_TEMP 158¶
Slowdown temperature for the device.
-
DCGM_FI_DEV_SHUTDOWN_TEMP 159¶
Shutdown temperature for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT 160¶
Current Power limit for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN 161¶
Minimum power management limit for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX 162¶
Maximum power management limit for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT_DEF 163¶
Default power management limit for the device.
-
DCGM_FI_DEV_ENFORCED_POWER_LIMIT 164¶
Effective power limit that the driver enforces after taking into account all limiters.
-
DCGM_FI_DEV_PSTATE 190¶
Performance state (P-State) 0-15.
0=highest
-
DCGM_FI_DEV_FAN_SPEED 191¶
Fan speed for the device in percent 0-100.
-
DCGM_FI_DEV_PCIE_TX_THROUGHPUT 200¶
PCIe Tx utilization information.
Deprecated: Use DCGM_FI_PROF_PCIE_TX_BYTES instead.
-
DCGM_FI_DEV_PCIE_RX_THROUGHPUT 201¶
PCIe Rx utilization information.
Deprecated: Use DCGM_FI_PROF_PCIE_RX_BYTES instead.
-
DCGM_FI_DEV_PCIE_REPLAY_COUNTER 202¶
PCIe replay counter.
-
DCGM_FI_DEV_GPU_UTIL 203¶
GPU Utilization.
-
DCGM_FI_DEV_MEM_COPY_UTIL 204¶
Memory Utilization.
-
DCGM_FI_DEV_ACCOUNTING_DATA 205¶
Process accounting stats.
This field is only supported when the host engine is running as root unless you enable accounting ahead of time. Accounting mode can be enabled by running “nvidia-smi -am 1” as root on the same node the host engine is running on.
-
DCGM_FI_DEV_ENC_UTIL 206¶
Encoder Utilization.
-
DCGM_FI_DEV_DEC_UTIL 207¶
Decoder Utilization.
-
DCGM_FI_DEV_XID_ERRORS 230¶
XID errors.
The value is the specific XID error
-
DCGM_FI_DEV_PCIE_MAX_LINK_GEN 235¶
PCIe Max Link Generation.
-
DCGM_FI_DEV_PCIE_MAX_LINK_WIDTH 236¶
PCIe Max Link Width.
-
DCGM_FI_DEV_PCIE_LINK_GEN 237¶
PCIe Current Link Generation.
-
DCGM_FI_DEV_PCIE_LINK_WIDTH 238¶
PCIe Current Link Width.
-
DCGM_FI_DEV_POWER_VIOLATION 240¶
Power Violation time in usec.
-
DCGM_FI_DEV_THERMAL_VIOLATION 241¶
Thermal Violation time in usec.
-
DCGM_FI_DEV_SYNC_BOOST_VIOLATION 242¶
Sync Boost Violation time in usec.
-
DCGM_FI_DEV_BOARD_LIMIT_VIOLATION 243¶
Board violation limit.
-
DCGM_FI_DEV_LOW_UTIL_VIOLATION 244¶
Low utilisation violation limit.
-
DCGM_FI_DEV_RELIABILITY_VIOLATION 245¶
Reliability violation limit.
-
DCGM_FI_DEV_TOTAL_APP_CLOCKS_VIOLATION 246¶
App clock violation limit.
-
DCGM_FI_DEV_TOTAL_BASE_CLOCKS_VIOLATION 247¶
Base clock violation limit.
-
DCGM_FI_DEV_FB_TOTAL 250¶
Total Frame Buffer of the GPU in MB.
-
DCGM_FI_DEV_FB_FREE 251¶
Free Frame Buffer in MB.
-
DCGM_FI_DEV_FB_USED 252¶
Used Frame Buffer in MB.
-
DCGM_FI_DEV_FB_RESERVED 253¶
Reserved Frame Buffer in MB.
-
DCGM_FI_DEV_FB_USED_PERCENT 254¶
Percentage used of Frame Buffer: ‘Used/(Total - Reserved)’.
Range 0.0-1.0
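The formula for DCGM_FI_DEV_FB_USED_PERCENT can be reproduced from the other frame buffer fields. This is a minimal sketch, assuming the three inputs are the values of DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_TOTAL and DCGM_FI_DEV_FB_RESERVED in MB. The function name and the -1.0 sentinel are hypothetical choices.

```c
/* Hypothetical helper, not part of DCGM. Computes Used / (Total - Reserved)
 * as a ratio. Returns -1.0 when the denominator is not positive, so callers
 * can tell "no usable frame buffer" apart from a real ratio. */
static double fb_used_ratio(double used_mb, double total_mb, double reserved_mb)
{
    double usable_mb = total_mb - reserved_mb;
    if (usable_mb <= 0.0)
        return -1.0;
    return used_mb / usable_mb;
}
```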
-
DCGM_FI_DEV_ECC_CURRENT 300¶
Current ECC mode for the device.
-
DCGM_FI_DEV_ECC_PENDING 301¶
Pending ECC mode for the device.
-
DCGM_FI_DEV_ECC_SBE_VOL_TOTAL 310¶
Total single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_TOTAL 311¶
Total double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_TOTAL 312¶
Total single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_TOTAL 313¶
Total double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_VOL_L1 314¶
L1 cache single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_L1 315¶
L1 cache double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_L2 316¶
L2 cache single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_L2 317¶
L2 cache double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_DEV 318¶
Device memory single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_DEV 319¶
Device memory double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_REG 320¶
Register file single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_REG 321¶
Register file double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_TEX 322¶
Texture memory single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_TEX 323¶
Texture memory double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_L1 324¶
L1 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_L1 325¶
L1 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_L2 326¶
L2 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_L2 327¶
L2 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_DEV 328¶
Device memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_DEV 329¶
Device memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_REG 330¶
Register file single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_REG 331¶
Register file double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_TEX 332¶
Texture memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_TEX 333¶
Texture memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_RETIRED_SBE 390¶
Number of retired pages because of single bit errors. Note: monotonically increasing.
-
DCGM_FI_DEV_RETIRED_DBE 391¶
Number of retired pages because of double bit errors. Note: monotonically increasing.
-
DCGM_FI_DEV_RETIRED_PENDING 392¶
Number of pages pending retirement.
-
DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS 393¶
Number of remapped rows for uncorrectable errors.
-
DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS 394¶
Number of remapped rows for correctable errors.
-
DCGM_FI_DEV_ROW_REMAP_FAILURE 395¶
Whether remapping of rows has failed.
-
DCGM_FI_DEV_ROW_REMAP_PENDING 396¶
Whether remapping of rows is pending.
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L0 400¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L1 401¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L2 402¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L3 403¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L4 404¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L5 405¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_TOTAL 409¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L0 410¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L1 411¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L2 412¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L3 413¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L4 414¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L5 415¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_TOTAL 419¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L0 420¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L1 421¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L2 422¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L3 423¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L4 424¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L5 425¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_TOTAL 429¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L0 430¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L1 431¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L2 432¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L3 433¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L4 434¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L5 435¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_TOTAL 439¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 440¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L1 441¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L2 442¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L3 443¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L4 444¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L5 445¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL 449¶
-
DCGM_FI_DEV_GPU_NVLINK_ERRORS 450¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L6 451¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L7 452¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L8 453¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L9 454¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L10 455¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L11 456¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L6 457¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L7 458¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L8 459¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L9 460¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L10 461¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L11 462¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L6 463¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L7 464¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L8 465¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L9 466¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L10 467¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L11 468¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L6 469¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L7 470¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L8 471¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L9 472¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L10 473¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L11 474¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L6 475¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L7 476¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L8 477¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L9 478¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L10 479¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L11 480¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L12 406¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L13 407¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L14 408¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L15 481¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L16 482¶
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L17 483¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L12 416¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L13 417¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L14 418¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L15 484¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L16 485¶
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L17 486¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L12 426¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L13 427¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L14 428¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L15 487¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L16 488¶
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L17 489¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L12 436¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L13 437¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L14 438¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L15 491¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L16 492¶
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L17 493¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L12 446¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L13 447¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L14 448¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L15 494¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L16 495¶
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L17 496¶
-
DCGM_FI_DEV_VIRTUAL_MODE 500¶
Virtualization Mode corresponding to the GPU.
One of DCGM_GPU_VIRTUALIZATION_MODE_* constants.
-
DCGM_FI_DEV_SUPPORTED_TYPE_INFO 501¶
Includes Count and Static info of vGPU types supported on a device.
-
DCGM_FI_DEV_CREATABLE_VGPU_TYPE_IDS 502¶
Includes Count and currently Creatable vGPU types on a device.
-
DCGM_FI_DEV_VGPU_INSTANCE_IDS 503¶
Includes Count and currently Active vGPU Instances on a device.
-
DCGM_FI_DEV_VGPU_UTILIZATIONS 504¶
Utilization values for vGPUs running on the device.
-
DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION 505¶
Utilization values for processes running within vGPU VMs using the device.
-
DCGM_FI_DEV_ENC_STATS 506¶
Current encoder statistics for a given device.
-
DCGM_FI_DEV_FBC_STATS 507¶
Statistics of current active frame buffer capture sessions on a given device.
-
DCGM_FI_DEV_FBC_SESSIONS_INFO 508¶
Information about active frame buffer capture sessions on a target device.
-
DCGM_FI_DEV_SUPPORTED_VGPU_TYPE_IDS 509¶
Includes Count and currently Supported vGPU types on a device.
-
DCGM_FI_DEV_VGPU_TYPE_INFO 510¶
Includes Static info of vGPU types supported on a device.
-
DCGM_FI_DEV_VGPU_TYPE_NAME 511¶
Includes the name of a vGPU type supported on a device.
-
DCGM_FI_DEV_VGPU_TYPE_CLASS 512¶
Includes the class of a vGPU type supported on a device.
-
DCGM_FI_DEV_VGPU_TYPE_LICENSE 513¶
Includes the license info for a vGPU type supported on a device.
-
DCGM_FI_DEV_VGPU_VM_ID 520¶
VM ID of the vGPU instance.
-
DCGM_FI_DEV_VGPU_VM_NAME 521¶
VM name of the vGPU instance.
-
DCGM_FI_DEV_VGPU_TYPE 522¶
vGPU type of the vGPU instance
-
DCGM_FI_DEV_VGPU_UUID 523¶
UUID of the vGPU instance.
-
DCGM_FI_DEV_VGPU_DRIVER_VERSION 524¶
Driver version of the vGPU instance.
-
DCGM_FI_DEV_VGPU_MEMORY_USAGE 525¶
Memory usage of the vGPU instance.
-
DCGM_FI_DEV_VGPU_LICENSE_STATUS 526¶
License status of the vGPU.
-
DCGM_FI_DEV_VGPU_FRAME_RATE_LIMIT 527¶
Frame rate limit of the vGPU instance.
-
DCGM_FI_DEV_VGPU_ENC_STATS 528¶
Current encoder statistics of the vGPU instance.
-
DCGM_FI_DEV_VGPU_ENC_SESSIONS_INFO 529¶
Information about all active encoder sessions on the vGPU instance.
-
DCGM_FI_DEV_VGPU_FBC_STATS 530¶
Statistics of current active frame buffer capture sessions on the vGPU instance.
-
DCGM_FI_DEV_VGPU_FBC_SESSIONS_INFO 531¶
Information about active frame buffer capture sessions on the vGPU instance.
-
DCGM_FI_DEV_VGPU_INSTANCE_LICENSE_STATE 532¶
License state information of the vGPU instance.
-
DCGM_FI_DEV_VGPU_PCI_ID 533¶
PCI Id of the vGPU instance.
-
DCGM_FI_DEV_VGPU_VM_GPU_INSTANCE_ID 534¶
GPU Instance ID for the given vGPU Instance.
-
DCGM_FI_FIRST_VGPU_FIELD_ID 520¶
Starting field ID of the vGPU instance.
-
DCGM_FI_LAST_VGPU_FIELD_ID 570¶
Last field ID of the vGPU instance.
-
DCGM_FI_MAX_VGPU_FIELDS DCGM_FI_LAST_VGPU_FIELD_ID - DCGM_FI_FIRST_VGPU_FIELD_ID¶
For now, the maximum number of vGPU field IDs is taken as the difference of DCGM_FI_LAST_VGPU_FIELD_ID and DCGM_FI_FIRST_VGPU_FIELD_ID, i.e. 50.
-
DCGM_FI_INTERNAL_FIELDS_0_START 600¶
Starting ID for all the internal fields.
-
DCGM_FI_INTERNAL_FIELDS_0_END 699¶
Last ID for all the internal fields.
NVSwitch entity field IDs start here.
NVSwitch latency bins for port 0
-
DCGM_FI_FIRST_NVSWITCH_FIELD_ID 700¶
Starting field ID of the NVSwitch instance.
-
DCGM_FI_DEV_NVSWITCH_LINK_THROUGHPUT_TX 780¶
NVSwitch Tx Throughput Counter for ports 0-17
-
DCGM_FI_DEV_NVSWITCH_LINK_THROUGHPUT_RX 781¶
NVSwitch Rx Throughput Counter for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_FATAL_ERRORS 782¶
NvSwitch fatal_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_NON_FATAL_ERRORS 783¶
NvSwitch non_fatal_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_REPLAY_ERRORS 784¶
NvSwitch replay_count_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_RECOVERY_ERRORS 785¶
NvSwitch recovery_count_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_FLIT_ERRORS 786¶
NvSwitch flit_err_count_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS 787¶
NvLink lane_crc_err_count_aggregate_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS 788¶
NvLink lane ecc_err_count_aggregate_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC0 789¶
Nvlink lane latency low lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC1 790¶
Nvlink lane latency low lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC2 791¶
Nvlink lane latency low lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC3 792¶
Nvlink lane latency low lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC0 793¶
Nvlink lane latency medium lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC1 794¶
Nvlink lane latency medium lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC2 795¶
Nvlink lane latency medium lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC3 796¶
Nvlink lane latency medium lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC0 797¶
Nvlink lane latency high lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC1 798¶
Nvlink lane latency high lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC2 799¶
Nvlink lane latency high lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC3 800¶
Nvlink lane latency high lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC0 801¶
Nvlink lane latency panic lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC1 802¶
Nvlink lane latency panic lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC2 803¶
Nvlink lane latency panic lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC3 804¶
Nvlink lane latency panic lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC0 805¶
Nvlink lane latency count lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC1 806¶
Nvlink lane latency count lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC2 807¶
Nvlink lane latency count lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC3 808¶
Nvlink lane latency count lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE0 809¶
NvLink lane crc_err_count for lane 0 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE1 810¶
NvLink lane crc_err_count for lane 1 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE2 811¶
NvLink lane crc_err_count for lane 2 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE3 812¶
NvLink lane crc_err_count for lane 3 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE0 813¶
NvLink lane ecc_err_count for lane 0 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE1 814¶
NvLink lane ecc_err_count for lane 1 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE2 815¶
NvLink lane ecc_err_count for lane 2 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE3 816¶
NvLink lane ecc_err_count for lane 3 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_FATAL_ERRORS 856¶
NVSwitch fatal error information.
Note: value field indicates the specific SXid reported
-
DCGM_FI_DEV_NVSWITCH_NON_FATAL_ERRORS 857¶
NVSwitch non fatal error information.
Note: value field indicates the specific SXid reported
-
DCGM_FI_DEV_NVSWITCH_TEMPERATURE_CURRENT 858¶
NVSwitch current temperature.
-
DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SLOWDOWN 859¶
NVSwitch limit slowdown temperature.
-
DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SHUTDOWN 860¶
NVSwitch limit shutdown temperature.
-
DCGM_FI_DEV_NVSWITCH_THROUGHPUT_TX 861¶
NVSwitch throughput Tx.
-
DCGM_FI_DEV_NVSWITCH_THROUGHPUT_RX 862¶
NVSwitch throughput Rx.
-
DCGM_FI_DEV_NVSWITCH_PHYS_ID 863¶
-
DCGM_FI_DEV_NVSWITCH_RESET_REQUIRED 864¶
NVSwitch reset required.
-
DCGM_FI_DEV_NVSWITCH_LINK_ID 865¶
NvSwitch NvLink ID.
-
DCGM_FI_DEV_NVSWITCH_PCIE_DOMAIN 866¶
NvSwitch PCIE domain.
-
DCGM_FI_DEV_NVSWITCH_PCIE_BUS 867¶
NvSwitch PCIE bus.
-
DCGM_FI_DEV_NVSWITCH_PCIE_DEVICE 868¶
NvSwitch PCIE device.
-
DCGM_FI_DEV_NVSWITCH_PCIE_FUNCTION 869¶
NvSwitch PCIE function.
-
DCGM_FI_DEV_NVSWITCH_LINK_STATUS 870¶
NvLink status.
UNKNOWN:-1 OFF:0 SAFE:1 ACTIVE:2 ERROR:3
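The status values listed for DCGM_FI_DEV_NVSWITCH_LINK_STATUS can be mirrored in a small enum for logging or display. The sketch below is a local illustration only: the enum and function names are hypothetical and do not come from the DCGM headers, and only the numeric values are taken from the list above.

```c
/* Local mirror of the documented status values. The identifiers are
 * hypothetical; only the numbers come from the field description. */
enum link_status {
    LINK_STATUS_UNKNOWN = -1,
    LINK_STATUS_OFF     = 0,
    LINK_STATUS_SAFE    = 1,
    LINK_STATUS_ACTIVE  = 2,
    LINK_STATUS_ERROR   = 3
};

/* Maps a raw field value to a printable name. Any value outside the
 * documented set is reported as "UNKNOWN". */
static const char *link_status_name(long long value)
{
    switch (value) {
    case LINK_STATUS_OFF:    return "OFF";
    case LINK_STATUS_SAFE:   return "SAFE";
    case LINK_STATUS_ACTIVE: return "ACTIVE";
    case LINK_STATUS_ERROR:  return "ERROR";
    default:                 return "UNKNOWN";
    }
}
```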
-
DCGM_FI_DEV_NVSWITCH_LINK_TYPE 871¶
NvLink device type (GPU/Switch).
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_DOMAIN 872¶
NvLink device pcie domain.
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_BUS 873¶
NvLink device pcie bus.
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_DEVICE 874¶
NvLink device pcie device.
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_FUNCTION 875¶
NvLink device pcie function.
-
DCGM_FI_DEV_NVSWITCH_LINK_DEVICE_LINK_ID 876¶
NvLink device link ID.
-
DCGM_FI_DEV_NVSWITCH_LINK_DEVICE_LINK_SID 877¶
NvLink device SID.
-
DCGM_FI_DEV_NVSWITCH_LINK_DEVICE_UUID 878¶
NvLink device link UUID.
-
DCGM_FI_LAST_NVSWITCH_FIELD_ID 899¶
Last field ID of the NVSwitch instance.
-
DCGM_FI_MAX_NVSWITCH_FIELDS DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1¶
For now, the maximum number of NVSwitch field IDs is taken as the difference of DCGM_FI_LAST_NVSWITCH_FIELD_ID and DCGM_FI_FIRST_NVSWITCH_FIELD_ID, plus 1, i.e. 200.
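The NVSwitch range arithmetic can be checked with a few lines of C. The sketch below uses local copies of the documented values under hypothetical EX_ names so that it compiles without the DCGM headers. The count macro is parenthesized in this local copy so that it stays correct when used inside a larger expression.

```c
/* Local copies of the documented values; the EX_ names are hypothetical. */
#define EX_FIRST_NVSWITCH_FIELD_ID 700 /* DCGM_FI_FIRST_NVSWITCH_FIELD_ID */
#define EX_LAST_NVSWITCH_FIELD_ID  899 /* DCGM_FI_LAST_NVSWITCH_FIELD_ID */

/* Inclusive range, hence the + 1. */
#define EX_MAX_NVSWITCH_FIELDS \
    (EX_LAST_NVSWITCH_FIELD_ID - EX_FIRST_NVSWITCH_FIELD_ID + 1)

/* Returns 1 if field_id lies in the NVSwitch field ID range, 0 otherwise. */
static int is_nvswitch_field_id(unsigned short field_id)
{
    return field_id >= EX_FIRST_NVSWITCH_FIELD_ID &&
           field_id <= EX_LAST_NVSWITCH_FIELD_ID;
}
```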
-
DCGM_FI_PROF_GR_ENGINE_ACTIVE 1001¶
Profiling fields. These all start with DCGM_FI_PROF_*.
Ratio of time the graphics engine is active. The graphics engine is active if a graphics/compute context is bound and the graphics pipe or compute pipe is busy.
-
DCGM_FI_PROF_SM_ACTIVE 1002¶
The ratio of cycles an SM has at least 1 warp assigned (computed from the number of cycles and elapsed cycles)
-
DCGM_FI_PROF_SM_OCCUPANCY 1003¶
The ratio of warps resident on an SM (number of resident warps as a ratio of the theoretical maximum number of warps per elapsed cycle).
-
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE 1004¶
The ratio of cycles any tensor pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_DRAM_ACTIVE 1005¶
The ratio of cycles the device memory interface is active sending or receiving data.
-
DCGM_FI_PROF_PIPE_FP64_ACTIVE 1006¶
Ratio of cycles the fp64 pipe is active.
-
DCGM_FI_PROF_PIPE_FP32_ACTIVE 1007¶
Ratio of cycles the fp32 pipe is active.
-
DCGM_FI_PROF_PIPE_FP16_ACTIVE 1008¶
Ratio of cycles the fp16 pipe is active.
This does not include HMMA.
-
DCGM_FI_PROF_PCIE_TX_BYTES 1009¶
The number of bytes of active PCIe tx (transmit) data including both header and payload.
Note that this is from the perspective of the GPU, so copying data from device to host (DtoH) would be reflected in this metric.
-
DCGM_FI_PROF_PCIE_RX_BYTES 1010¶
The number of bytes of active PCIe rx (read) data including both header and payload.
Note that this is from the perspective of the GPU, so copying data from host to device (HtoD) would be reflected in this metric.
-
DCGM_FI_PROF_NVLINK_TX_BYTES 1011¶
The total number of bytes of active NvLink tx (transmit) data including both header and payload.
Per-link fields are available below
-
DCGM_FI_PROF_NVLINK_RX_BYTES 1012¶
The total number of bytes of active NvLink rx (read) data including both header and payload.
Per-link fields are available below
-
DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE 1013¶
The ratio of cycles the tensor (IMMA) pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE 1014¶
The ratio of cycles the tensor (HMMA) pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_PIPE_TENSOR_DFMA_ACTIVE 1015¶
The ratio of cycles the tensor (DFMA) pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_PIPE_INT_ACTIVE 1016¶
Ratio of cycles the integer pipe is active.
-
DCGM_FI_PROF_NVDEC0_ACTIVE 1017¶
Ratio of cycles each of the NVDEC engines are active.
-
DCGM_FI_PROF_NVDEC1_ACTIVE 1018¶
-
DCGM_FI_PROF_NVDEC2_ACTIVE 1019¶
-
DCGM_FI_PROF_NVDEC3_ACTIVE 1020¶
-
DCGM_FI_PROF_NVDEC4_ACTIVE 1021¶
-
DCGM_FI_PROF_NVDEC5_ACTIVE 1022¶
-
DCGM_FI_PROF_NVDEC6_ACTIVE 1023¶
-
DCGM_FI_PROF_NVDEC7_ACTIVE 1024¶
-
DCGM_FI_PROF_NVJPG0_ACTIVE 1025¶
Ratio of cycles each of the NVJPG engines are active.
-
DCGM_FI_PROF_NVJPG1_ACTIVE 1026¶
-
DCGM_FI_PROF_NVJPG2_ACTIVE 1027¶
-
DCGM_FI_PROF_NVJPG3_ACTIVE 1028¶
-
DCGM_FI_PROF_NVJPG4_ACTIVE 1029¶
-
DCGM_FI_PROF_NVJPG5_ACTIVE 1030¶
-
DCGM_FI_PROF_NVJPG6_ACTIVE 1031¶
-
DCGM_FI_PROF_NVJPG7_ACTIVE 1032¶
-
DCGM_FI_PROF_NVOFA0_ACTIVE 1033¶
Ratio of cycles each of the NVOFA engines are active.
-
DCGM_FI_PROF_NVLINK_L0_TX_BYTES 1040¶
The per-link number of bytes of active NvLink TX (transmit) or RX (receive) data including both header and payload.
For example: DCGM_FI_PROF_NVLINK_L0_TX_BYTES -> L0 TX. To get the bandwidth for a link, add the RX and TX values together: total = DCGM_FI_PROF_NVLINK_L0_TX_BYTES + DCGM_FI_PROF_NVLINK_L0_RX_BYTES.
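The per-link field IDs in this list follow a regular pattern: starting at 1040 for L0 TX, each link takes two consecutive IDs (TX, then RX) up to 1075 for L17 RX. A small sketch of that mapping and of the TX + RX sum follows. The helper names are hypothetical, and the pattern is inferred from the values listed here rather than guaranteed by DCGM.

```c
#define EX_NVLINK_L0_TX_BYTES 1040 /* DCGM_FI_PROF_NVLINK_L0_TX_BYTES */
#define EX_NVLINK_LINK_COUNT  18   /* links L0 .. L17 */

/* Hypothetical helpers, not part of DCGM. Return the field ID for the
 * given link, or -1 if the link index is outside L0 .. L17. */
static int nvlink_tx_field_id(unsigned link)
{
    if (link >= EX_NVLINK_LINK_COUNT)
        return -1;
    return (int)(EX_NVLINK_L0_TX_BYTES + 2u * link);
}

static int nvlink_rx_field_id(unsigned link)
{
    int tx = nvlink_tx_field_id(link);
    return tx < 0 ? -1 : tx + 1; /* RX immediately follows TX */
}

/* Total for one link: the sum of the sampled TX and RX values. */
static unsigned long long nvlink_link_total(unsigned long long tx_bytes,
                                            unsigned long long rx_bytes)
{
    return tx_bytes + rx_bytes;
}
```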
-
DCGM_FI_PROF_NVLINK_L0_RX_BYTES 1041¶
-
DCGM_FI_PROF_NVLINK_L1_TX_BYTES 1042¶
-
DCGM_FI_PROF_NVLINK_L1_RX_BYTES 1043¶
-
DCGM_FI_PROF_NVLINK_L2_TX_BYTES 1044¶
-
DCGM_FI_PROF_NVLINK_L2_RX_BYTES 1045¶
-
DCGM_FI_PROF_NVLINK_L3_TX_BYTES 1046¶
-
DCGM_FI_PROF_NVLINK_L3_RX_BYTES 1047¶
-
DCGM_FI_PROF_NVLINK_L4_TX_BYTES 1048¶
-
DCGM_FI_PROF_NVLINK_L4_RX_BYTES 1049¶
-
DCGM_FI_PROF_NVLINK_L5_TX_BYTES 1050¶
-
DCGM_FI_PROF_NVLINK_L5_RX_BYTES 1051¶
-
DCGM_FI_PROF_NVLINK_L6_TX_BYTES 1052¶
-
DCGM_FI_PROF_NVLINK_L6_RX_BYTES 1053¶
-
DCGM_FI_PROF_NVLINK_L7_TX_BYTES 1054¶
-
DCGM_FI_PROF_NVLINK_L7_RX_BYTES 1055¶
-
DCGM_FI_PROF_NVLINK_L8_TX_BYTES 1056¶
-
DCGM_FI_PROF_NVLINK_L8_RX_BYTES 1057¶
-
DCGM_FI_PROF_NVLINK_L9_TX_BYTES 1058¶
-
DCGM_FI_PROF_NVLINK_L9_RX_BYTES 1059¶
-
DCGM_FI_PROF_NVLINK_L10_TX_BYTES 1060¶
-
DCGM_FI_PROF_NVLINK_L10_RX_BYTES 1061¶
-
DCGM_FI_PROF_NVLINK_L11_TX_BYTES 1062¶
-
DCGM_FI_PROF_NVLINK_L11_RX_BYTES 1063¶
-
DCGM_FI_PROF_NVLINK_L12_TX_BYTES 1064¶
-
DCGM_FI_PROF_NVLINK_L12_RX_BYTES 1065¶
-
DCGM_FI_PROF_NVLINK_L13_TX_BYTES 1066¶
-
DCGM_FI_PROF_NVLINK_L13_RX_BYTES 1067¶
-
DCGM_FI_PROF_NVLINK_L14_TX_BYTES 1068¶
-
DCGM_FI_PROF_NVLINK_L14_RX_BYTES 1069¶
-
DCGM_FI_PROF_NVLINK_L15_TX_BYTES 1070¶
-
DCGM_FI_PROF_NVLINK_L15_RX_BYTES 1071¶
-
DCGM_FI_PROF_NVLINK_L16_TX_BYTES 1072¶
-
DCGM_FI_PROF_NVLINK_L16_RX_BYTES 1073¶
-
DCGM_FI_PROF_NVLINK_L17_TX_BYTES 1074¶
-
DCGM_FI_PROF_NVLINK_L17_RX_BYTES 1075¶
-
DCGM_FI_PROF_NVLINK_THROUGHPUT_FIRST DCGM_FI_PROF_NVLINK_L0_TX_BYTES¶
NVLink throughput First.
-
DCGM_FI_PROF_NVLINK_THROUGHPUT_LAST DCGM_FI_PROF_NVLINK_L17_RX_BYTES¶
NVLink throughput Last.
-
DCGM_FI_MAX_FIELDS 1076¶
One greater than the maximum field ID above.
That is, 1 greater than the maximum field ID that could be allocated.
Functions
-
dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)¶
Get a pointer to the metadata for a field by its field ID.
See DCGM_FI_? for a list of field IDs.
- Parameters
fieldId – IN: One of the field IDs (DCGM_FI_?)
- Returns
0 on failure; >0 pointer to the field metadata structure if found.
-
dcgm_field_meta_p DcgmFieldGetByTag(const char *tag)¶
Get a pointer to the metadata for a field by its field tag.
- Parameters
tag – IN: Tag for the field of interest
- Returns
0 on failure or not found; >0 pointer to the field metadata structure if found.
-
int DcgmFieldsInit(void)¶
Initialize the DcgmFields module.
Call this once from inside your program
- Returns
0 on success; <0 on error.
-
int DcgmFieldsTerm(void)¶
Terminates the DcgmFields module.
Call this once from inside your program
- Returns
0 on success; <0 on error.
-
const char *DcgmFieldsGetEntityGroupString(dcgm_field_entity_group_t entityGroupId)¶
Get the string version of an entityGroupId.
- Returns
Pointer to a string such as GPU, NvSwitch, etc.
Null on error.
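The functions in this group suggest a simple lifecycle: initialize once, look fields up, check each result against the documented failure values, then terminate once. The sketch below illustrates that flow. So that it compiles without DCGM installed, it declares stand-ins with the signatures shown in this section. The stand-in struct, its single member, the stub table and the stub bodies are all hypothetical, and in a real program they would be replaced by the DCGM header and library.

```c
#include <stddef.h>

/* ---- Stand-ins (hypothetical) so the sketch compiles on its own. ---- */
typedef struct {
    unsigned short fieldId;
} dcgm_field_meta_t;
typedef dcgm_field_meta_t *dcgm_field_meta_p;

static dcgm_field_meta_t stub_table[] = { { 1 }, { 150 } };
static int stub_ready = 0;

static int DcgmFieldsInit(void) { stub_ready = 1; return 0; }
static int DcgmFieldsTerm(void) { stub_ready = 0; return 0; }

static dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)
{
    size_t i;
    if (!stub_ready)
        return NULL;
    for (i = 0; i < sizeof(stub_table) / sizeof(stub_table[0]); i++) {
        if (stub_table[i].fieldId == fieldId)
            return &stub_table[i];
    }
    return NULL;
}
/* ---- End of stand-ins. ---- */

/* Counts how many of the given field IDs have metadata.
 * Returns -1 if initialization or termination reports an error. */
static int count_known_fields(const unsigned short *ids, size_t n)
{
    size_t i;
    int found = 0;

    if (DcgmFieldsInit() < 0)      /* <0 signals an error */
        return -1;

    for (i = 0; i < n; i++) {
        if (DcgmFieldGetById(ids[i]) != NULL) /* 0 means failure */
            found++;
    }

    if (DcgmFieldsTerm() < 0)
        return -1;
    return found;
}
```

The point of the design is that initialization and termination each happen once around the whole batch of lookups, and every returned pointer is checked before use.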