Field Identifiers
- group dcgmFieldIdentifiers
Field Identifiers.
Defines
-
DCGM_FI_UNKNOWN 0
NULL field.
-
DCGM_FI_DRIVER_VERSION 1
Driver Version.
-
DCGM_FI_NVML_VERSION 2
-
DCGM_FI_PROCESS_NAME 3
-
DCGM_FI_DEV_COUNT 4
Number of Devices on the node.
-
DCGM_FI_CUDA_DRIVER_VERSION 5
CUDA driver version. Retrieves a number with the major value in the thousands place and the minor value in the hundreds place.
CUDA 11.1 = 11100
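The layout described for DCGM_FI_CUDA_DRIVER_VERSION can be unpacked with integer arithmetic. The following is a minimal sketch, not part of DCGM: the type `cuda_version_t` and the function `decode_cuda_driver_version` are hypothetical names, and the code assumes the value follows exactly the layout stated above (11100 for CUDA 11.1). If a driver reports the CUDA runtime convention instead (1000 * major + 10 * minor), the minor computation would need to change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; not defined by DCGM. */
typedef struct {
    int major;
    int minor;
} cuda_version_t;

/* Splits a value laid out as described above: major in the thousands
 * place, minor in the hundreds place (11100 -> 11.1). */
cuda_version_t decode_cuda_driver_version(int64_t value)
{
    cuda_version_t v;
    assert(value >= 0);
    v.major = (int)(value / 1000);
    v.minor = (int)((value % 1000) / 100);
    return v;
}
```

Calling `decode_cuda_driver_version(11100)` yields major 11 and minor 1, matching the example in the field description.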
-
DCGM_FI_DEV_NAME 50
Name of the GPU device.
-
DCGM_FI_DEV_BRAND 51
Device Brand.
-
DCGM_FI_DEV_NVML_INDEX 52
NVML index of this GPU.
-
DCGM_FI_DEV_SERIAL 53
Device Serial Number.
-
DCGM_FI_DEV_UUID 54
UUID corresponding to the device.
-
DCGM_FI_DEV_MINOR_NUMBER 55
Device node minor number /dev/nvidia#.
-
DCGM_FI_DEV_OEM_INFOROM_VER 56
OEM inforom version.
-
DCGM_FI_DEV_PCI_BUSID 57
PCI attributes for the device.
-
DCGM_FI_DEV_PCI_COMBINED_ID 58
The combined 16-bit device id and 16-bit vendor id.
-
DCGM_FI_DEV_PCI_SUBSYS_ID 59
The 32-bit Sub System Device ID.
-
DCGM_FI_GPU_TOPOLOGY_PCI 60
Topology of all GPUs on the system via PCI (static)
-
DCGM_FI_GPU_TOPOLOGY_NVLINK 61
Topology of all GPUs on the system via NVLINK (static)
-
DCGM_FI_GPU_TOPOLOGY_AFFINITY 62
Affinity of all GPUs on the system (static)
-
DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY 63
CUDA compute capability for the device.
The major version is the upper 32 bits and the minor version is the lower 32 bits.
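A short sketch of splitting the value into its two halves, assuming the 32/32-bit layout stated above. The names `compute_capability_t` and `decode_compute_capability` are illustrative and do not come from DCGM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; not defined by DCGM. */
typedef struct {
    uint32_t major;
    uint32_t minor;
} compute_capability_t;

/* Major version is taken from the upper 32 bits and minor version
 * from the lower 32 bits, as the field description states. */
compute_capability_t decode_compute_capability(int64_t value)
{
    compute_capability_t cc;
    uint64_t bits;
    assert(value >= 0);
    bits = (uint64_t)value;
    cc.major = (uint32_t)(bits >> 32);
    cc.minor = (uint32_t)(bits & 0xFFFFFFFFu);
    return cc;
}
```

It is worth checking the decoded pair against a known device before relying on it, since the layout here is taken only from the description.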
-
DCGM_FI_DEV_COMPUTE_MODE 65
Compute mode for the device.
-
DCGM_FI_DEV_PERSISTENCE_MODE 66
Persistence mode for the device. Boolean: 0 is disabled, 1 is enabled.
-
DCGM_FI_DEV_MIG_MODE 67
MIG mode for the device. Boolean: 0 is disabled, 1 is enabled.
-
DCGM_FI_DEV_CUDA_VISIBLE_DEVICES_STR 68
The string that CUDA_VISIBLE_DEVICES should be set to for this entity (including MIG)
-
DCGM_FI_DEV_MIG_MAX_SLICES 69
The maximum number of MIG slices supported by this GPU.
-
DCGM_FI_DEV_CPU_AFFINITY_0 70
Device CPU affinity.
part 1/4 = cpus 0 - 63
-
DCGM_FI_DEV_CPU_AFFINITY_1 71
Device CPU affinity.
part 2/4 = cpus 64 - 127
-
DCGM_FI_DEV_CPU_AFFINITY_2 72
Device CPU affinity.
part 3/4 = cpus 128 - 191
-
DCGM_FI_DEV_CPU_AFFINITY_3 73
Device CPU affinity.
part 4/4 = cpus 192 - 255
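Because the affinity is spread over DCGM_FI_DEV_CPU_AFFINITY_0 through DCGM_FI_DEV_CPU_AFFINITY_3, testing a single CPU means picking the right part first. The sketch below assumes each field holds a 64-bit mask in which bit i of part p stands for CPU 64 * p + i; that bit ordering is an assumption, and `cpu_in_affinity` is a hypothetical helper, not a DCGM function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AFFINITY_PARTS 4   /* DCGM_FI_DEV_CPU_AFFINITY_0 .. _3 */
#define CPUS_PER_PART  64  /* each part covers 64 CPUs */

/* masks[p] is the value read from DCGM_FI_DEV_CPU_AFFINITY_<p>,
 * reinterpreted as an unsigned 64-bit mask (assumed layout). */
bool cpu_in_affinity(const uint64_t masks[AFFINITY_PARTS], unsigned cpu)
{
    assert(cpu < AFFINITY_PARTS * CPUS_PER_PART);
    return ((masks[cpu / CPUS_PER_PART] >> (cpu % CPUS_PER_PART)) & 1u) != 0;
}
```

For example, CPU 70 lives in part 1 (CPUs 64 - 127) at bit position 6.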
-
DCGM_FI_DEV_CC_MODE 74
ConfidentialCompute/AmpereProtectedMemory status for this system: 0 = disabled, 1 = enabled.
-
DCGM_FI_DEV_MIG_ATTRIBUTES 75
Attributes for the given MIG device handles.
-
DCGM_FI_DEV_MIG_GI_INFO 76
GPU instance profile information.
-
DCGM_FI_DEV_MIG_CI_INFO 77
Compute instance profile information.
-
DCGM_FI_DEV_ECC_INFOROM_VER 80
ECC inforom version.
-
DCGM_FI_DEV_POWER_INFOROM_VER 81
Power management object inforom version.
-
DCGM_FI_DEV_INFOROM_IMAGE_VER 82
Inforom image version.
-
DCGM_FI_DEV_INFOROM_CONFIG_CHECK 83
Inforom configuration checksum.
-
DCGM_FI_DEV_INFOROM_CONFIG_VALID 84
Reads the infoROM from the flash and verifies the checksums.
-
DCGM_FI_DEV_VBIOS_VERSION 85
VBIOS version of the device.
-
DCGM_FI_DEV_MEM_AFFINITY_0 86
Device Memory node affinity, 0-63.
-
DCGM_FI_DEV_MEM_AFFINITY_1 87
Device Memory node affinity, 64-127.
-
DCGM_FI_DEV_MEM_AFFINITY_2 88
Device Memory node affinity, 128-191.
-
DCGM_FI_DEV_MEM_AFFINITY_3 89
Device Memory node affinity, 192-255.
-
DCGM_FI_DEV_BAR1_TOTAL 90
Total BAR1 of the GPU in MB.
-
DCGM_FI_SYNC_BOOST 91
Deprecated - Sync boost settings on the node.
-
DCGM_FI_DEV_BAR1_USED 92
Used BAR1 of the GPU in MB.
-
DCGM_FI_DEV_BAR1_FREE 93
Free BAR1 of the GPU in MB.
-
DCGM_FI_DEV_GPM_SUPPORT 94
GPM support for the device
-
DCGM_FI_DEV_SM_CLOCK 100
SM clock for the device.
-
DCGM_FI_DEV_MEM_CLOCK 101
Memory clock for the device.
-
DCGM_FI_DEV_VIDEO_CLOCK 102
Video encoder/decoder clock for the device.
-
DCGM_FI_DEV_APP_SM_CLOCK 110
SM Application clocks.
-
DCGM_FI_DEV_APP_MEM_CLOCK 111
Memory Application clocks.
-
DCGM_FI_DEV_CLOCKS_EVENT_REASONS 112
Current clock event reasons (bitmask of DCGM_CLOCKS_EVENT_REASON_*)
-
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS DCGM_FI_DEV_CLOCKS_EVENT_REASONS
Deprecated: Use DCGM_FI_DEV_CLOCKS_EVENT_REASONS instead.
-
DCGM_FI_DEV_MAX_SM_CLOCK 113
Maximum supported SM clock for the device.
-
DCGM_FI_DEV_MAX_MEM_CLOCK 114
Maximum supported Memory clock for the device.
-
DCGM_FI_DEV_MAX_VIDEO_CLOCK 115
Maximum supported Video encoder/decoder clock for the device.
-
DCGM_FI_DEV_AUTOBOOST 120
Auto-boost for the device (1 = enabled, 0 = disabled).
-
DCGM_FI_DEV_SUPPORTED_CLOCKS 130
Supported clocks for the device.
-
DCGM_FI_DEV_MEMORY_TEMP 140
Memory temperature for the device.
-
DCGM_FI_DEV_GPU_TEMP 150
Current temperature readings for the device, in degrees C.
-
DCGM_FI_DEV_MEM_MAX_OP_TEMP 151
Maximum operating temperature for the memory of this GPU.
-
DCGM_FI_DEV_GPU_MAX_OP_TEMP 152
Maximum operating temperature for this GPU.
-
DCGM_FI_DEV_GPU_TEMP_LIMIT 153
Thermal margin temperature (distance to nearest slowdown threshold) for this GPU.
-
DCGM_FI_DEV_POWER_USAGE 155
Power usage for the device in Watts.
-
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION 156
Total energy consumption for the GPU in mJ since the driver was last reloaded.
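Since this field is a running total in millijoules, an average power figure can be derived from two samples. The sketch below is illustrative only: `average_power_watts` is a hypothetical helper, and it assumes the caller supplies the elapsed time between the two samples in microseconds. A negative return value is used here to signal that the counter went backwards, which could happen if the driver was reloaded between samples.

```c
#include <assert.h>
#include <stdint.h>

/* Average power in watts between two energy samples given in mJ.
 * elapsed_us is the time between the samples in microseconds
 * (an assumption about how the caller measures time). */
double average_power_watts(int64_t energy_start_mj,
                           int64_t energy_end_mj,
                           int64_t elapsed_us)
{
    assert(elapsed_us > 0);
    if (energy_end_mj < energy_start_mj) {
        return -1.0; /* counter reset between samples */
    }
    /* mJ -> J divides by 1e3, us -> s divides by 1e6,
     * so W = mJ * 1000 / us. */
    return (double)(energy_end_mj - energy_start_mj) * 1000.0
           / (double)elapsed_us;
}
```

With 300000 mJ consumed over 1000000 microseconds, the function returns 300 W.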
-
DCGM_FI_DEV_POWER_USAGE_INSTANT 157
Current instantaneous power usage of the device in Watts.
-
DCGM_FI_DEV_SLOWDOWN_TEMP 158
Slowdown temperature for the device.
-
DCGM_FI_DEV_SHUTDOWN_TEMP 159
Shutdown temperature for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT 160
Current Power limit for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN 161
Minimum power management limit for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX 162
Maximum power management limit for the device.
-
DCGM_FI_DEV_POWER_MGMT_LIMIT_DEF 163
Default power management limit for the device.
-
DCGM_FI_DEV_ENFORCED_POWER_LIMIT 164
Effective power limit that the driver enforces after taking into account all limiters.
-
DCGM_FI_DEV_REQUESTED_POWER_PROFILE_MASK 165
Requested workload power profile mask (Blackwell and newer).
-
DCGM_FI_DEV_ENFORCED_POWER_PROFILE_MASK 166
Enforced workload power profile mask (Blackwell and newer).
-
DCGM_FI_DEV_VALID_POWER_PROFILE_MASK 167
Valid workload power profile mask (Blackwell and newer).
-
DCGM_FI_DEV_FABRIC_MANAGER_STATUS 170
The status of the fabric manager - a value from dcgmFabricManagerStatus_t.
-
DCGM_FI_DEV_FABRIC_MANAGER_ERROR_CODE 171
The failure that happened while starting the Fabric Manager, if any. NOTE: this is not populated unless the fabric manager completed startup.
-
DCGM_FI_DEV_FABRIC_CLUSTER_UUID 172
The uuid of the cluster to which this GPU belongs.
-
DCGM_FI_DEV_FABRIC_CLIQUE_ID 173
The ID of the fabric clique to which this GPU belongs.
-
DCGM_FI_DEV_PSTATE 190
Performance state (P-State) 0-15. 0 = highest.
-
DCGM_FI_DEV_FAN_SPEED 191
Fan speed for the device in percent 0-100.
-
DCGM_FI_DEV_PCIE_TX_THROUGHPUT 200
PCIe Tx utilization information.
Deprecated: Use DCGM_FI_PROF_PCIE_TX_BYTES instead.
-
DCGM_FI_DEV_PCIE_RX_THROUGHPUT 201
PCIe Rx utilization information.
Deprecated: Use DCGM_FI_PROF_PCIE_RX_BYTES instead.
-
DCGM_FI_DEV_PCIE_REPLAY_COUNTER 202
PCIe replay counter.
-
DCGM_FI_DEV_GPU_UTIL 203
GPU Utilization.
-
DCGM_FI_DEV_MEM_COPY_UTIL 204
Memory Utilization.
-
DCGM_FI_DEV_ACCOUNTING_DATA 205
Process accounting stats.
This field is only supported when the host engine is running as root unless you enable accounting ahead of time. Accounting mode can be enabled by running “nvidia-smi -am 1” as root on the same node the host engine is running on.
-
DCGM_FI_DEV_ENC_UTIL 206
Encoder Utilization.
-
DCGM_FI_DEV_DEC_UTIL 207
Decoder Utilization.
-
DCGM_FI_DEV_XID_ERRORS 230
XID errors.
The value is the specific XID error
-
DCGM_FI_DEV_PCIE_MAX_LINK_GEN 235
PCIe Max Link Generation.
-
DCGM_FI_DEV_PCIE_MAX_LINK_WIDTH 236
PCIe Max Link Width.
-
DCGM_FI_DEV_PCIE_LINK_GEN 237
PCIe Current Link Generation.
-
DCGM_FI_DEV_PCIE_LINK_WIDTH 238
PCIe Current Link Width.
-
DCGM_FI_DEV_POWER_VIOLATION 240
Power Violation time in ns.
-
DCGM_FI_DEV_THERMAL_VIOLATION 241
Thermal Violation time in ns.
-
DCGM_FI_DEV_SYNC_BOOST_VIOLATION 242
Sync Boost Violation time in ns.
-
DCGM_FI_DEV_BOARD_LIMIT_VIOLATION 243
Board violation limit.
-
DCGM_FI_DEV_LOW_UTIL_VIOLATION 244
Low utilisation violation limit.
-
DCGM_FI_DEV_RELIABILITY_VIOLATION 245
Reliability violation limit.
-
DCGM_FI_DEV_TOTAL_APP_CLOCKS_VIOLATION 246
App clock violation limit.
-
DCGM_FI_DEV_TOTAL_BASE_CLOCKS_VIOLATION 247
Base clock violation limit.
-
DCGM_FI_DEV_FB_TOTAL 250
Total Frame Buffer of the GPU in MB.
-
DCGM_FI_DEV_FB_FREE 251
Free Frame Buffer in MB.
-
DCGM_FI_DEV_FB_USED 252
Used Frame Buffer in MB.
-
DCGM_FI_DEV_FB_RESERVED 253
Reserved Frame Buffer in MB.
-
DCGM_FI_DEV_FB_USED_PERCENT 254
Fraction of the Frame Buffer in use: ‘Used/(Total - Reserved)’.
Range 0.0-1.0.
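The ratio in this description can be reproduced from the other frame buffer fields. This is a minimal sketch under the assumption that the inputs are the values of DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_TOTAL and DCGM_FI_DEV_FB_RESERVED in MB; the function name `fb_used_fraction` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Computes Used / (Total - Reserved) as a value in the 0.0-1.0 range. */
double fb_used_fraction(int64_t used_mb, int64_t total_mb, int64_t reserved_mb)
{
    assert(total_mb > reserved_mb); /* avoid dividing by zero or a negative size */
    return (double)used_mb / (double)(total_mb - reserved_mb);
}
```

For a GPU with 40960 MB total, 960 MB reserved and 20000 MB used, the result is 0.5.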
-
DCGM_FI_DEV_C2C_LINK_COUNT 285
C2C Link Count.
-
DCGM_FI_DEV_C2C_LINK_STATUS 286
C2C Link Status. A value of 0 means the link is INACTIVE.
A value of 1 means the link is ACTIVE.
-
DCGM_FI_DEV_C2C_MAX_BANDWIDTH 287
C2C Max Bandwidth. The value indicates the link speed in MB/s.
-
DCGM_FI_DEV_ECC_CURRENT 300
Current ECC mode for the device.
-
DCGM_FI_DEV_ECC_PENDING 301
Pending ECC mode for the device.
-
DCGM_FI_DEV_ECC_SBE_VOL_TOTAL 310
Total single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_TOTAL 311
Total double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_TOTAL 312
Total single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_TOTAL 313
Total double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_VOL_L1 314
L1 cache single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_L1 315
L1 cache double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_L2 316
L2 cache single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_L2 317
L2 cache double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_DEV 318
Device memory single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_DEV 319
Device memory double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_REG 320
Register file single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_REG 321
Register file double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_TEX 322
Texture memory single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_TEX 323
Texture memory double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_L1 324
L1 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_L1 325
L1 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_L2 326
L2 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_L2 327
L2 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_DEV 328
Device memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_DEV 329
Device memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_REG 330
Register File single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_REG 331
Register File double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_AGG_TEX 332
Texture memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_DBE_AGG_TEX 333
Texture memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.
-
DCGM_FI_DEV_ECC_SBE_VOL_SHM 334
Texture SHM single bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_SHM 335
Texture SHM double bit volatile ECC errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_CBU 336
CBU single bit ECC volatile errors.
-
DCGM_FI_DEV_ECC_DBE_VOL_CBU 337
CBU double bit ECC volatile errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_SHM 338
Texture SHM single bit aggregate ECC errors.
-
DCGM_FI_DEV_ECC_DBE_AGG_SHM 339
Texture SHM double bit aggregate ECC errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_CBU 340
CBU single bit ECC aggregate errors.
-
DCGM_FI_DEV_ECC_DBE_AGG_CBU 341
CBU double bit ECC aggregate errors.
-
DCGM_FI_DEV_ECC_SBE_VOL_SRM 342
Turing and later fields.
SRAM single bit ECC volatile errors
-
DCGM_FI_DEV_ECC_DBE_VOL_SRM 343
SRAM double bit ECC volatile errors.
-
DCGM_FI_DEV_ECC_SBE_AGG_SRM 344
SRAM single bit ECC aggregate errors.
-
DCGM_FI_DEV_ECC_DBE_AGG_SRM 345
SRAM double bit ECC aggregate errors.
-
DCGM_FI_DEV_DIAG_MEMORY_RESULT 350
Result of the GPU Memory test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_DIAGNOSTIC_RESULT 351
Result of the Diagnostics test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_PCIE_RESULT 352
Result of the PCIe + NVLink test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_TARGETED_STRESS_RESULT 353
Result of the Targeted Stress test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_TARGETED_POWER_RESULT 354
Result of the Targeted Power test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_MEMORY_BANDWIDTH_RESULT 355
Result of the Memory Bandwidth test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_MEMTEST_RESULT 356
Result of the Memory Stress test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_PULSE_TEST_RESULT 357
Result of the Input Energy Delayed Product power (EDPp) test (a.k.a. the pulse test). Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_EUD_RESULT 358
Result of the Extended Utility Diagnostics (EUD) test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_CPU_EUD_RESULT 359
Result of the CPU Extended Utility Diagnostics (CPU EUD) test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_SOFTWARE_RESULT 360
Result of the Software test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_NVBANDWIDTH_RESULT 361
Result of the NVBandwidth test. Refers to an int64_t storing a value drawn from the dcgmError_t enumeration.
-
DCGM_FI_DEV_DIAG_STATUS 362
-
DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_MAX 385
Historical max available spare memory rows per memory bank.
-
DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_HIGH 386
Historical high mark of available spare memory rows per memory bank.
-
DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_PARTIAL 387
Historical mark of partial available spare memory rows per memory bank.
-
DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_LOW 388
Historical low mark of available spare memory rows per memory bank.
-
DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_NONE 389
Historical marker of memory banks with no available spare memory rows.
-
DCGM_FI_DEV_RETIRED_SBE 390
Number of retired pages because of single bit errors. Note: monotonically increasing.
-
DCGM_FI_DEV_RETIRED_DBE 391
Number of retired pages because of double bit errors. Note: monotonically increasing.
-
DCGM_FI_DEV_RETIRED_PENDING 392
Number of pages pending retirement.
-
DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS 393
Number of remapped rows for uncorrectable errors.
-
DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS 394
Number of remapped rows for correctable errors.
-
DCGM_FI_DEV_ROW_REMAP_FAILURE 395
Whether remapping of rows has failed.
-
DCGM_FI_DEV_ROW_REMAP_PENDING 396
Whether remapping of rows is pending.
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L0 400
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L1 401
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L2 402
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L3 403
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L4 404
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L5 405
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_TOTAL 409
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L0 410
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L1 411
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L2 412
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L3 413
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L4 414
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L5 415
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_TOTAL 419
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L0 420
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L1 421
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L2 422
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L3 423
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L4 424
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L5 425
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_TOTAL 429
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L0 430
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L1 431
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L2 432
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L3 433
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L4 434
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L5 435
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_TOTAL 439
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 440
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L1 441
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L2 442
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L3 443
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L4 444
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L5 445
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL 449
-
DCGM_FI_DEV_GPU_NVLINK_ERRORS 450
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L6 451
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L7 452
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L8 453
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L9 454
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L10 455
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L11 456
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L6 457
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L7 458
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L8 459
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L9 460
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L10 461
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L11 462
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L6 463
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L7 464
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L8 465
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L9 466
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L10 467
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L11 468
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L6 469
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L7 470
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L8 471
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L9 472
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L10 473
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L11 474
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L6 475
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L7 476
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L8 477
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L9 478
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L10 479
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L11 480
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L12 406
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L13 407
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L14 408
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L15 481
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L16 482
-
DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L17 483
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L12 416
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L13 417
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L14 418
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L15 484
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L16 485
-
DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L17 486
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L12 426
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L13 427
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L14 428
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L15 487
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L16 488
-
DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L17 489
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L12 436
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L13 437
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L14 438
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L15 491
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L16 492
-
DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L17 493
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L12 446
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L13 447
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L14 448
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L15 494
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L16 495
-
DCGM_FI_DEV_NVLINK_BANDWIDTH_L17 496
-
DCGM_FI_DEV_NVLINK_ERROR_DL_CRC 497
-
DCGM_FI_DEV_NVLINK_ERROR_DL_RECOVERY 498
-
DCGM_FI_DEV_NVLINK_ERROR_DL_REPLAY 499
-
DCGM_FI_DEV_VIRTUAL_MODE 500
Virtualization Mode corresponding to the GPU.
One of DCGM_GPU_VIRTUALIZATION_MODE_* constants.
-
DCGM_FI_DEV_SUPPORTED_TYPE_INFO 501
Includes Count and Static info of vGPU types supported on a device.
-
DCGM_FI_DEV_CREATABLE_VGPU_TYPE_IDS 502
Includes Count and currently Creatable vGPU types on a device.
-
DCGM_FI_DEV_VGPU_INSTANCE_IDS 503
Includes Count and currently Active vGPU Instances on a device.
-
DCGM_FI_DEV_VGPU_UTILIZATIONS 504
Utilization values for vGPUs running on the device.
-
DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION 505
Utilization values for processes running within vGPU VMs using the device.
-
DCGM_FI_DEV_ENC_STATS 506
Current encoder statistics for a given device.
-
DCGM_FI_DEV_FBC_STATS 507
Statistics of current active frame buffer capture sessions on a given device.
-
DCGM_FI_DEV_FBC_SESSIONS_INFO 508
Information about active frame buffer capture sessions on a target device.
-
DCGM_FI_DEV_SUPPORTED_VGPU_TYPE_IDS 509
Includes Count and currently Supported vGPU types on a device.
-
DCGM_FI_DEV_VGPU_TYPE_INFO 510
Includes Static info of vGPU types supported on a device.
-
DCGM_FI_DEV_VGPU_TYPE_NAME 511
Includes the name of a vGPU type supported on a device.
-
DCGM_FI_DEV_VGPU_TYPE_CLASS 512
Includes the class of a vGPU type supported on a device.
-
DCGM_FI_DEV_VGPU_TYPE_LICENSE 513
Includes the license info for a vGPU type supported on a device.
-
DCGM_FI_DEV_VGPU_VM_ID 520
VM ID of the vGPU instance.
-
DCGM_FI_DEV_VGPU_VM_NAME 521
VM name of the vGPU instance.
-
DCGM_FI_DEV_VGPU_TYPE 522
vGPU type of the vGPU instance
-
DCGM_FI_DEV_VGPU_UUID 523
UUID of the vGPU instance.
-
DCGM_FI_DEV_VGPU_DRIVER_VERSION 524
Driver version of the vGPU instance.
-
DCGM_FI_DEV_VGPU_MEMORY_USAGE 525
Memory usage of the vGPU instance.
-
DCGM_FI_DEV_VGPU_LICENSE_STATUS 526
License status of the vGPU.
0 = vgpu is not licensed
1 = vgpu is licensed
-
DCGM_FI_DEV_VGPU_FRAME_RATE_LIMIT 527
Frame rate limit of the vGPU instance.
-
DCGM_FI_DEV_VGPU_ENC_STATS 528
Current encoder statistics of the vGPU instance.
-
DCGM_FI_DEV_VGPU_ENC_SESSIONS_INFO 529
Information about all active encoder sessions on the vGPU instance.
-
DCGM_FI_DEV_VGPU_FBC_STATS 530
Statistics of current active frame buffer capture sessions on the vGPU instance.
-
DCGM_FI_DEV_VGPU_FBC_SESSIONS_INFO 531
Information about active frame buffer capture sessions on the vGPU instance.
-
DCGM_FI_DEV_VGPU_INSTANCE_LICENSE_STATE 532
License state information of the vGPU instance.
-
DCGM_FI_DEV_VGPU_PCI_ID 533
PCI Id of the vGPU instance.
-
DCGM_FI_DEV_VGPU_VM_GPU_INSTANCE_ID 534
GPU Instance ID for the given vGPU Instance.
-
DCGM_FI_FIRST_VGPU_FIELD_ID 520
Starting field ID of the vGPU instance.
-
DCGM_FI_LAST_VGPU_FIELD_ID 570
Last field ID of the vGPU instance.
-
DCGM_FI_MAX_VGPU_FIELDS DCGM_FI_LAST_VGPU_FIELD_ID - DCGM_FI_FIRST_VGPU_FIELD_ID
For now, the maximum number of vGPU field IDs is taken as the difference of DCGM_FI_LAST_VGPU_FIELD_ID and DCGM_FI_FIRST_VGPU_FIELD_ID, i.e. 50.
-
DCGM_FI_DEV_PLATFORM_INFINIBAND_GUID 571
Infiniband GUID string (e.g. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
-
DCGM_FI_DEV_PLATFORM_CHASSIS_SERIAL_NUMBER 572
Serial number of the chassis containing this GPU.
-
DCGM_FI_DEV_PLATFORM_CHASSIS_SLOT_NUMBER 573
Slot number in the rack containing the GPU (includes switches)
-
DCGM_FI_DEV_PLATFORM_TRAY_INDEX 574
Tray index within the compute slots in the chassis containing this GPU (does not include switches)
-
DCGM_FI_DEV_PLATFORM_HOST_ID 575
Index of the node within the slot containing the GPU.
-
DCGM_FI_DEV_PLATFORM_PEER_TYPE 576
Platform indicated NVLink-peer type (e.g. switch present or not).
-
DCGM_FI_DEV_PLATFORM_MODULE_ID 577
ID of the GPU within the node.
-
DCGM_FI_INTERNAL_FIELDS_0_START 600
Starting ID for all the internal fields.
-
DCGM_FI_INTERNAL_FIELDS_0_END 699
Last ID for all the internal fields.
NVSwitch entity field IDs start here.
NVSwitch latency bins for port 0
-
DCGM_FI_FIRST_NVSWITCH_FIELD_ID 700
Starting field ID of the NVSwitch instance.
-
DCGM_FI_DEV_NVSWITCH_VOLTAGE_MVOLT 701
NvSwitch voltage.
-
DCGM_FI_DEV_NVSWITCH_CURRENT_IDDQ 702
NvSwitch Current IDDQ.
-
DCGM_FI_DEV_NVSWITCH_CURRENT_IDDQ_REV 703
NvSwitch Current IDDQ Rev.
-
DCGM_FI_DEV_NVSWITCH_CURRENT_IDDQ_DVDD 704
NvSwitch Current IDDQ Rev DVDD.
-
DCGM_FI_DEV_NVSWITCH_POWER_VDD 705
NvSwitch Power VDD in watts.
-
DCGM_FI_DEV_NVSWITCH_POWER_DVDD 706
NvSwitch Power DVDD in watts.
-
DCGM_FI_DEV_NVSWITCH_POWER_HVDD 707
NvSwitch Power HVDD in watts.
-
DCGM_FI_DEV_NVSWITCH_LINK_THROUGHPUT_TX 780
NVSwitch Tx Throughput Counter for ports 0-17
-
DCGM_FI_DEV_NVSWITCH_LINK_THROUGHPUT_RX 781
NVSwitch Rx Throughput Counter for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_FATAL_ERRORS 782
NvSwitch fatal_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_NON_FATAL_ERRORS 783
NvSwitch non_fatal_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_REPLAY_ERRORS 784
NvSwitch replay_count_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_RECOVERY_ERRORS 785
NvSwitch recovery_count_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_FLIT_ERRORS 786
NvSwitch flit_err_count_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS 787
NvLink lane crc_err_count_aggregate_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS 788
NvLink lane ecc_err_count_aggregate_errors for ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC0 789
Nvlink lane latency low lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC1 790
Nvlink lane latency low lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC2 791
Nvlink lane latency low lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_LOW_VC3 792
Nvlink lane latency low lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC0 793
Nvlink lane latency medium lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC1 794
Nvlink lane latency medium lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC2 795
Nvlink lane latency medium lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_MEDIUM_VC3 796
Nvlink lane latency medium lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC0 797
Nvlink lane latency high lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC1 798
Nvlink lane latency high lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC2 799
Nvlink lane latency high lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_HIGH_VC3 800
Nvlink lane latency high lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC0 801
Nvlink lane latency panic lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC1 802
Nvlink lane latency panic lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC2 803
Nvlink lane latency panic lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_PANIC_VC3 804
Nvlink lane latency panic lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC0 805
Nvlink lane latency count lane0 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC1 806
Nvlink lane latency count lane1 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC2 807
Nvlink lane latency count lane2 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_LATENCY_COUNT_VC3 808
Nvlink lane latency count lane3 counter.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE0 809
NvLink lane crc_err_count for lane 0 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE1 810
NvLink lane crc_err_count for lane 1 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE2 811
NvLink lane crc_err_count for lane 2 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE3 812
NvLink lane crc_err_count for lane 3 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE0 813
NvLink lane ecc_err_count for lane 0 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE1 814
NvLink lane ecc_err_count for lane 1 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE2 815
NvLink lane ecc_err_count for lane 2 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE3 816
NvLink lane ecc_err_count for lane 3 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE4 817
NvLink lane crc_err_count for lane 4 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE5 818
NvLink lane crc_err_count for lane 5 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE6 819
NvLink lane crc_err_count for lane 6 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_CRC_ERRORS_LANE7 820
NvLink lane crc_err_count for lane 7 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE4 821
NvLink lane ecc_err_count for lane 4 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE5 822
NvLink lane ecc_err_count for lane 5 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE6 823
NvLink lane ecc_err_count for lane 6 on ports 0-17.
-
DCGM_FI_DEV_NVSWITCH_LINK_ECC_ERRORS_LANE7 824
NvLink lane ecc_err_count for lane 7 on ports 0-17.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L0 825
NV Link TX Bandwidth Counter for Lane 0.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L1 826
NV Link TX Bandwidth Counter for Lane 1.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L2 827
NV Link TX Bandwidth Counter for Lane 2.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L3 828
NV Link TX Bandwidth Counter for Lane 3.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L4 829
NV Link TX Bandwidth Counter for Lane 4.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L5 830
NV Link TX Bandwidth Counter for Lane 5.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L6 831
NV Link TX Bandwidth Counter for Lane 6.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L7 832
NV Link TX Bandwidth Counter for Lane 7.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L8 833
NV Link TX Bandwidth Counter for Lane 8.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L9 834
NV Link TX Bandwidth Counter for Lane 9.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L10 835
NV Link TX Bandwidth Counter for Lane 10.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L11 836
NV Link TX Bandwidth Counter for Lane 11.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L12 837
NV Link TX Bandwidth Counter for Lane 12.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L13 838
NV Link TX Bandwidth Counter for Lane 13.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L14 839
NV Link TX Bandwidth Counter for Lane 14.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L15 840
NV Link TX Bandwidth Counter for Lane 15.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L16 841
NV Link TX Bandwidth Counter for Lane 16.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_L17 842
NV Link TX Bandwidth Counter for Lane 17.
-
DCGM_FI_DEV_NVLINK_TX_BANDWIDTH_TOTAL 843
NV Link Bandwidth Counter total for all TX Lanes.
-
DCGM_FI_DEV_NVSWITCH_FATAL_ERRORS 856
NVSwitch fatal error information.
Note: value field indicates the specific SXid reported
-
DCGM_FI_DEV_NVSWITCH_NON_FATAL_ERRORS 857
NVSwitch non fatal error information.
Note: value field indicates the specific SXid reported
-
DCGM_FI_DEV_NVSWITCH_TEMPERATURE_CURRENT 858
NVSwitch current temperature.
-
DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SLOWDOWN 859
NVSwitch limit slowdown temperature.
-
DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SHUTDOWN 860
NVSwitch limit shutdown temperature.
-
DCGM_FI_DEV_NVSWITCH_THROUGHPUT_TX 861
NVSwitch throughput Tx.
-
DCGM_FI_DEV_NVSWITCH_THROUGHPUT_RX 862
NVSwitch throughput Rx.
-
DCGM_FI_DEV_NVSWITCH_PHYS_ID 863
-
DCGM_FI_DEV_NVSWITCH_RESET_REQUIRED 864
NVSwitch reset required.
-
DCGM_FI_DEV_NVSWITCH_LINK_ID 865
NvSwitch NvLink ID.
-
DCGM_FI_DEV_NVSWITCH_PCIE_DOMAIN 866
NvSwitch PCIE domain.
-
DCGM_FI_DEV_NVSWITCH_PCIE_BUS 867
NvSwitch PCIE bus.
-
DCGM_FI_DEV_NVSWITCH_PCIE_DEVICE 868
NvSwitch PCIE device.
-
DCGM_FI_DEV_NVSWITCH_PCIE_FUNCTION 869
NvSwitch PCIE function.
-
DCGM_FI_DEV_NVSWITCH_LINK_STATUS 870
NvLink status.
UNKNOWN:-1 OFF:0 SAFE:1 ACTIVE:2 ERROR:3
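When logging this field it can help to translate the numeric value back into the names listed above. The following sketch uses only the five values given in the description; the function name `nvswitch_link_status_name` and the "UNRECOGNIZED" fallback string are illustrative choices, not part of DCGM.

```c
#include <stdint.h>

/* Maps a DCGM_FI_DEV_NVSWITCH_LINK_STATUS value to the name given in
 * the field description. Any other value is reported as unrecognized. */
const char *nvswitch_link_status_name(int64_t status)
{
    switch (status) {
    case -1: return "UNKNOWN";
    case 0:  return "OFF";
    case 1:  return "SAFE";
    case 2:  return "ACTIVE";
    case 3:  return "ERROR";
    default: return "UNRECOGNIZED";
    }
}
```

Keeping an explicit fallback avoids treating values outside the documented set as one of the known states.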
-
DCGM_FI_DEV_NVSWITCH_LINK_TYPE 871
NvLink device type (GPU/Switch).
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_DOMAIN 872
NvLink device pcie domain.
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_BUS 873
NvLink device pcie bus.
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_DEVICE 874
NvLink device pcie device.
-
DCGM_FI_DEV_NVSWITCH_LINK_REMOTE_PCIE_FUNCTION 875
NvLink device pcie function.
-
DCGM_FI_DEV_NVSWITCH_LINK_DEVICE_LINK_ID 876
NvLink device link ID.
-
DCGM_FI_DEV_NVSWITCH_LINK_DEVICE_LINK_SID 877
NvLink device SID.
-
DCGM_FI_DEV_NVSWITCH_DEVICE_UUID 878
NvLink device switch/link uid.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L0 879
NV Link RX Bandwidth Counter for Lane 0.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L1 880
NV Link RX Bandwidth Counter for Lane 1.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L2 881
NV Link RX Bandwidth Counter for Lane 2.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L3 882
NV Link RX Bandwidth Counter for Lane 3.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L4 883
NV Link RX Bandwidth Counter for Lane 4.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L5 884
NV Link RX Bandwidth Counter for Lane 5.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L6 885
NV Link RX Bandwidth Counter for Lane 6.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L7 886
NV Link RX Bandwidth Counter for Lane 7.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L8 887
NV Link RX Bandwidth Counter for Lane 8.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L9 888
NV Link RX Bandwidth Counter for Lane 9.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L10 889
NV Link RX Bandwidth Counter for Lane 10.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L11 890
NV Link RX Bandwidth Counter for Lane 11.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L12 891
NV Link RX Bandwidth Counter for Lane 12.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L13 892
NV Link RX Bandwidth Counter for Lane 13.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L14 893
NV Link RX Bandwidth Counter for Lane 14.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L15 894
NV Link RX Bandwidth Counter for Lane 15.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L16 895
NV Link RX Bandwidth Counter for Lane 16.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L17 896
NV Link RX Bandwidth Counter for Lane 17.
-
DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_TOTAL 897
NV Link Bandwidth Counter total for all RX Lanes.
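
The per-lane RX field IDs above are contiguous (lane 0 is 879, lane 17 is 896), so a lane index can be mapped to its field ID by simple addition. A minimal sketch, with the two IDs redefined locally so it compiles without the DCGM headers; the helper name and the lane-count macro are hypothetical.

```c
/* IDs copied from the listing above. */
#define DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L0  879
#define DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L17 896

/* Hypothetical name: the listing defines lanes 0..17, i.e. 18 lanes. */
#define EXAMPLE_NVLINK_LANE_COUNT 18

/* Return the RX bandwidth field ID for a lane,
 * or 0 (DCGM_FI_UNKNOWN) if the lane index is out of range. */
static unsigned short example_rx_bandwidth_field(unsigned int lane)
{
    if (lane >= EXAMPLE_NVLINK_LANE_COUNT) {
        return 0;
    }
    return (unsigned short)(DCGM_FI_DEV_NVLINK_RX_BANDWIDTH_L0 + lane);
}
```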
-
DCGM_FI_LAST_NVSWITCH_FIELD_ID 899
Last field ID of the NVSwitch instance.
-
DCGM_FI_MAX_NVSWITCH_FIELDS DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1
Currently, the maximum number of NVSwitch field IDs is computed as DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1, i.e. 200.
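
The arithmetic can be checked with a short sketch. DCGM_FI_FIRST_NVSWITCH_FIELD_ID is not shown in this part of the listing; the value 700 used here is an assumption inferred from the stated result of 200. The slot helper is hypothetical and only illustrates sizing a table by the range.

```c
/* ASSUMPTION: 700 is inferred from 899 - FIRST + 1 == 200; it is not
 * shown in this part of the listing. */
#define DCGM_FI_FIRST_NVSWITCH_FIELD_ID 700
/* Copied from the listing above. */
#define DCGM_FI_LAST_NVSWITCH_FIELD_ID  899
/* Parenthesized here so the macro is safe inside larger expressions. */
#define DCGM_FI_MAX_NVSWITCH_FIELDS \
    (DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1)

/* Hypothetical helper: index of a field ID within a table of
 * DCGM_FI_MAX_NVSWITCH_FIELDS entries, or -1 if the ID lies outside
 * the NVSwitch range. */
static int example_nvswitch_slot(unsigned short fieldId)
{
    if (fieldId < DCGM_FI_FIRST_NVSWITCH_FIELD_ID ||
        fieldId > DCGM_FI_LAST_NVSWITCH_FIELD_ID) {
        return -1;
    }
    return (int)fieldId - DCGM_FI_FIRST_NVSWITCH_FIELD_ID;
}
```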
-
DCGM_FI_PROF_GR_ENGINE_ACTIVE 1001
Profiling Fields. These all start with DCGM_FI_PROF_*.
Ratio of time the graphics engine is active. The graphics engine is active if a graphics/compute context is bound and the graphics pipe or compute pipe is busy.
-
DCGM_FI_PROF_SM_ACTIVE 1002
The ratio of cycles an SM has at least 1 warp assigned (computed from the number of cycles and elapsed cycles)
-
DCGM_FI_PROF_SM_OCCUPANCY 1003
The ratio of warps resident on an SM (the number of resident warps as a ratio of the theoretical maximum number of warps per elapsed cycle).
-
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE 1004
The ratio of cycles any tensor pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_DRAM_ACTIVE 1005
The ratio of cycles the device memory interface is active sending or receiving data.
-
DCGM_FI_PROF_PIPE_FP64_ACTIVE 1006
Ratio of cycles the fp64 pipe is active.
-
DCGM_FI_PROF_PIPE_FP32_ACTIVE 1007
Ratio of cycles the fp32 pipe is active.
-
DCGM_FI_PROF_PIPE_FP16_ACTIVE 1008
Ratio of cycles the fp16 pipe is active.
This does not include HMMA.
-
DCGM_FI_PROF_PCIE_TX_BYTES 1009
The number of bytes of active PCIe tx (transmit) data including both header and payload.
Note that this is from the perspective of the GPU, so copying data from device to host (DtoH) would be reflected in this metric.
-
DCGM_FI_PROF_PCIE_RX_BYTES 1010
The number of bytes of active PCIe rx (read) data including both header and payload.
Note that this is from the perspective of the GPU, so copying data from host to device (HtoD) would be reflected in this metric.
-
DCGM_FI_PROF_NVLINK_TX_BYTES 1011
The total number of bytes of active NvLink tx (transmit) data including both header and payload.
Per-link fields are available below
-
DCGM_FI_PROF_NVLINK_RX_BYTES 1012
The total number of bytes of active NvLink rx (read) data including both header and payload.
Per-link fields are available below
-
DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE 1013
The ratio of cycles the tensor (IMMA) pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE 1014
The ratio of cycles the tensor (HMMA) pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_PIPE_TENSOR_DFMA_ACTIVE 1015
The ratio of cycles the tensor (DFMA) pipe is active (off the peak sustained elapsed cycles)
-
DCGM_FI_PROF_PIPE_INT_ACTIVE 1016
Ratio of cycles the integer pipe is active.
-
DCGM_FI_PROF_NVDEC0_ACTIVE 1017
Ratio of cycles each of the NVDEC engines is active.
-
DCGM_FI_PROF_NVDEC1_ACTIVE 1018
-
DCGM_FI_PROF_NVDEC2_ACTIVE 1019
-
DCGM_FI_PROF_NVDEC3_ACTIVE 1020
-
DCGM_FI_PROF_NVDEC4_ACTIVE 1021
-
DCGM_FI_PROF_NVDEC5_ACTIVE 1022
-
DCGM_FI_PROF_NVDEC6_ACTIVE 1023
-
DCGM_FI_PROF_NVDEC7_ACTIVE 1024
-
DCGM_FI_PROF_NVJPG0_ACTIVE 1025
Ratio of cycles each of the NVJPG engines is active.
-
DCGM_FI_PROF_NVJPG1_ACTIVE 1026
-
DCGM_FI_PROF_NVJPG2_ACTIVE 1027
-
DCGM_FI_PROF_NVJPG3_ACTIVE 1028
-
DCGM_FI_PROF_NVJPG4_ACTIVE 1029
-
DCGM_FI_PROF_NVJPG5_ACTIVE 1030
-
DCGM_FI_PROF_NVJPG6_ACTIVE 1031
-
DCGM_FI_PROF_NVJPG7_ACTIVE 1032
-
DCGM_FI_PROF_NVOFA0_ACTIVE 1033
Ratio of cycles each of the NVOFA engines is active.
-
DCGM_FI_PROF_NVOFA1_ACTIVE 1034
-
DCGM_FI_PROF_NVLINK_L0_TX_BYTES 1040
The per-link number of bytes of active NvLink TX (transmit) or RX (receive) data including both header and payload.
For example: DCGM_FI_PROF_NVLINK_L0_TX_BYTES -> L0 TX.
To get the bandwidth for a link, add the RX and TX values together, e.g. total = DCGM_FI_PROF_NVLINK_L0_TX_BYTES + DCGM_FI_PROF_NVLINK_L0_RX_BYTES
-
DCGM_FI_PROF_NVLINK_L0_RX_BYTES 1041
-
DCGM_FI_PROF_NVLINK_L1_TX_BYTES 1042
-
DCGM_FI_PROF_NVLINK_L1_RX_BYTES 1043
-
DCGM_FI_PROF_NVLINK_L2_TX_BYTES 1044
-
DCGM_FI_PROF_NVLINK_L2_RX_BYTES 1045
-
DCGM_FI_PROF_NVLINK_L3_TX_BYTES 1046
-
DCGM_FI_PROF_NVLINK_L3_RX_BYTES 1047
-
DCGM_FI_PROF_NVLINK_L4_TX_BYTES 1048
-
DCGM_FI_PROF_NVLINK_L4_RX_BYTES 1049
-
DCGM_FI_PROF_NVLINK_L5_TX_BYTES 1050
-
DCGM_FI_PROF_NVLINK_L5_RX_BYTES 1051
-
DCGM_FI_PROF_NVLINK_L6_TX_BYTES 1052
-
DCGM_FI_PROF_NVLINK_L6_RX_BYTES 1053
-
DCGM_FI_PROF_NVLINK_L7_TX_BYTES 1054
-
DCGM_FI_PROF_NVLINK_L7_RX_BYTES 1055
-
DCGM_FI_PROF_NVLINK_L8_TX_BYTES 1056
-
DCGM_FI_PROF_NVLINK_L8_RX_BYTES 1057
-
DCGM_FI_PROF_NVLINK_L9_TX_BYTES 1058
-
DCGM_FI_PROF_NVLINK_L9_RX_BYTES 1059
-
DCGM_FI_PROF_NVLINK_L10_TX_BYTES 1060
-
DCGM_FI_PROF_NVLINK_L10_RX_BYTES 1061
-
DCGM_FI_PROF_NVLINK_L11_TX_BYTES 1062
-
DCGM_FI_PROF_NVLINK_L11_RX_BYTES 1063
-
DCGM_FI_PROF_NVLINK_L12_TX_BYTES 1064
-
DCGM_FI_PROF_NVLINK_L12_RX_BYTES 1065
-
DCGM_FI_PROF_NVLINK_L13_TX_BYTES 1066
-
DCGM_FI_PROF_NVLINK_L13_RX_BYTES 1067
-
DCGM_FI_PROF_NVLINK_L14_TX_BYTES 1068
-
DCGM_FI_PROF_NVLINK_L14_RX_BYTES 1069
-
DCGM_FI_PROF_NVLINK_L15_TX_BYTES 1070
-
DCGM_FI_PROF_NVLINK_L15_RX_BYTES 1071
-
DCGM_FI_PROF_NVLINK_L16_TX_BYTES 1072
-
DCGM_FI_PROF_NVLINK_L16_RX_BYTES 1073
-
DCGM_FI_PROF_NVLINK_L17_TX_BYTES 1074
-
DCGM_FI_PROF_NVLINK_L17_RX_BYTES 1075
-
DCGM_FI_PROF_NVLINK_THROUGHPUT_FIRST DCGM_FI_PROF_NVLINK_L0_TX_BYTES
NVLink throughput First.
-
DCGM_FI_PROF_NVLINK_THROUGHPUT_LAST DCGM_FI_PROF_NVLINK_L17_RX_BYTES
NVLink throughput Last.
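
The per-link IDs above interleave TX and RX: link n uses 1040 + 2n for TX and 1041 + 2n for RX, up to link 17. A minimal sketch of that mapping and of the TX + RX sum described for DCGM_FI_PROF_NVLINK_L0_TX_BYTES follows. The constants are redefined locally from the listing, and the helper names and EXAMPLE_MAX_LINK are hypothetical.

```c
#include <stdint.h>

/* IDs copied from the listing above. */
#define DCGM_FI_PROF_NVLINK_L0_TX_BYTES      1040
#define DCGM_FI_PROF_NVLINK_L17_RX_BYTES     1075
#define DCGM_FI_PROF_NVLINK_THROUGHPUT_FIRST DCGM_FI_PROF_NVLINK_L0_TX_BYTES
#define DCGM_FI_PROF_NVLINK_THROUGHPUT_LAST  DCGM_FI_PROF_NVLINK_L17_RX_BYTES

/* Hypothetical name: highest link index covered by the range (17). */
#define EXAMPLE_MAX_LINK \
    ((DCGM_FI_PROF_NVLINK_THROUGHPUT_LAST - DCGM_FI_PROF_NVLINK_THROUGHPUT_FIRST) / 2)

/* TX field ID for a link, or 0 (DCGM_FI_UNKNOWN) if out of range. */
static unsigned short example_link_tx_field(unsigned int link)
{
    if (link > EXAMPLE_MAX_LINK) {
        return 0;
    }
    return (unsigned short)(DCGM_FI_PROF_NVLINK_THROUGHPUT_FIRST + 2u * link);
}

/* RX field ID for a link: always the TX ID plus one. */
static unsigned short example_link_rx_field(unsigned int link)
{
    unsigned short tx = example_link_tx_field(link);
    return tx ? (unsigned short)(tx + 1) : 0;
}

/* Combined traffic for one link: the TX and RX values added together. */
static uint64_t example_link_total_bytes(uint64_t txBytes, uint64_t rxBytes)
{
    return txBytes + rxBytes;
}
```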
-
DCGM_FI_PROF_C2C_TX_ALL_BYTES 1076
C2C (Chip-to-Chip) interface metrics.
-
DCGM_FI_PROF_C2C_TX_DATA_BYTES 1077
-
DCGM_FI_PROF_C2C_RX_ALL_BYTES 1078
-
DCGM_FI_PROF_C2C_RX_DATA_BYTES 1079
-
DCGM_FI_DEV_CPU_UTIL_TOTAL 1100
CPU Utilization, total.
-
DCGM_FI_DEV_CPU_UTIL_USER 1101
CPU Utilization, user.
-
DCGM_FI_DEV_CPU_UTIL_NICE 1102
CPU Utilization, nice.
-
DCGM_FI_DEV_CPU_UTIL_SYS 1103
CPU Utilization, system time.
-
DCGM_FI_DEV_CPU_UTIL_IRQ 1104
CPU Utilization, interrupt servicing.
-
DCGM_FI_DEV_CPU_TEMP_CURRENT 1110
CPU temperature.
-
DCGM_FI_DEV_CPU_TEMP_WARNING 1111
CPU Warning Temperature.
-
DCGM_FI_DEV_CPU_TEMP_CRITICAL 1112
CPU Critical Temperature.
-
DCGM_FI_DEV_CPU_CLOCK_CURRENT 1120
CPU instantaneous clock speed.
-
DCGM_FI_DEV_CPU_POWER_UTIL_CURRENT 1130
CPU power utilization.
-
DCGM_FI_DEV_CPU_POWER_LIMIT 1131
CPU power limit.
-
DCGM_FI_DEV_SYSIO_POWER_UTIL_CURRENT 1132
SoC power utilization.
-
DCGM_FI_DEV_MODULE_POWER_UTIL_CURRENT 1133
Module power utilization.
-
DCGM_FI_DEV_CPU_VENDOR 1140
CPU vendor name.
-
DCGM_FI_DEV_CPU_MODEL 1141
CPU model name.
-
DCGM_FI_DEV_NVLINK_COUNT_TX_PACKETS 1200
Total Tx packets on the link in NVLink5.
-
DCGM_FI_DEV_NVLINK_COUNT_TX_BYTES 1201
Total Tx bytes on the link in NVLink5.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_PACKETS 1202
Total Rx packets on the link in NVLink5.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_BYTES 1203
Total Rx bytes on the link in NVLink5.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_MALFORMED_PACKET_ERRORS 1204
Number of packets Rx on a link where packets are malformed.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_BUFFER_OVERRUN_ERRORS 1205
Number of packets that were discarded on Rx due to buffer overrun.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_ERRORS 1206
Total number of packets with errors Rx on a link.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_REMOTE_ERRORS 1207
Total number of packets Rx - stomp/EBP marker.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_GENERAL_ERRORS 1208
Total number of packets Rx with header mismatch.
-
DCGM_FI_DEV_NVLINK_COUNT_LOCAL_LINK_INTEGRITY_ERRORS 1209
Total number of times that the count of local errors exceeded a threshold.
-
DCGM_FI_DEV_NVLINK_COUNT_TX_DISCARDS 1210
Total number of tx error packets that were discarded.
-
DCGM_FI_DEV_NVLINK_COUNT_LINK_RECOVERY_SUCCESSFUL_EVENTS 1211
Number of times link went from Up to recovery, succeeded and link came back up.
-
DCGM_FI_DEV_NVLINK_COUNT_LINK_RECOVERY_FAILED_EVENTS 1212
Number of times link went from Up to recovery, failed and link was declared down.
-
DCGM_FI_DEV_NVLINK_COUNT_LINK_RECOVERY_EVENTS 1213
Number of times link went from Up to recovery, irrespective of the result.
-
DCGM_FI_DEV_NVLINK_COUNT_RX_SYMBOL_ERRORS 1214
Number of errors in rx symbols.
-
DCGM_FI_DEV_NVLINK_COUNT_SYMBOL_BER 1215
BER for symbol errors.
-
DCGM_FI_DEV_FIRST_CONNECTX_FIELD_ID 1300
First field id of ConnectX.
-
DCGM_FI_DEV_CONNECTX_HEALTH 1300
Health state of ConnectX.
-
DCGM_FI_DEV_CONNECTX_ACTIVE_PCIE_LINK_WIDTH 1301
Active PCIe link width.
-
DCGM_FI_DEV_CONNECTX_ACTIVE_PCIE_LINK_SPEED 1302
Active PCIe link speed.
-
DCGM_FI_DEV_CONNECTX_EXPECT_PCIE_LINK_WIDTH 1303
Expected PCIe link width.
-
DCGM_FI_DEV_CONNECTX_EXPECT_PCIE_LINK_SPEED 1304
Expected PCIe link speed.
-
DCGM_FI_DEV_CONNECTX_CORRECTABLE_ERR_STATUS 1305
Correctable error status.
-
DCGM_FI_DEV_CONNECTX_CORRECTABLE_ERR_MASK 1306
Correctable error mask.
-
DCGM_FI_DEV_CONNECTX_UNCORRECTABLE_ERR_STATUS 1307
Uncorrectable error status.
-
DCGM_FI_DEV_CONNECTX_UNCORRECTABLE_ERR_MASK 1308
Uncorrectable error mask.
-
DCGM_FI_DEV_CONNECTX_UNCORRECTABLE_ERR_SEVERITY 1309
Uncorrectable error severity.
-
DCGM_FI_DEV_CONNECTX_DEVICE_TEMPERATURE 1310
Device temperature.
-
DCGM_FI_DEV_LAST_CONNECTX_FIELD_ID 1399
The last field id of ConnectX.
-
DCGM_FI_MAX_FIELDS 1311
1 greater than the maximum field ID above.
This is 1 greater than the maximum field ID that could be allocated.
Functions
-
dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)
Get a pointer to the metadata for a field by its field ID.
See DCGM_FI_? for a list of field IDs.
- Parameters:
fieldId – IN: One of the field IDs (DCGM_FI_?)
- Returns:
0 On failure
>0 Pointer to field metadata structure if found
-
dcgm_field_meta_p DcgmFieldGetByTag(const char *tag)
Get a pointer to the metadata for a field by its field tag.
- Parameters:
tag – IN: Tag for the field of interest
- Returns:
0 On failure or not found
>0 Pointer to field metadata structure if found
-
int DcgmFieldsInit(void)
Initialize the DcgmFields module.
Call this once from inside your program
- Returns:
0 On success
<0 On error
-
int DcgmFieldsTerm(void)
Terminates the DcgmFields module.
Call this once from inside your program
- Returns:
0 On success
<0 On error
-
const char *DcgmFieldsGetEntityGroupString(dcgm_field_entity_group_t entityGroupId)
Get the string version of an entityGroupId.
- Returns:
Pointer to a string such as GPU, NvSwitch, etc.
NULL on error
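
The functions above suggest a simple calling pattern: initialize once, look up metadata, check each returned pointer, then terminate once. The sketch below illustrates only that pattern. To compile without the DCGM headers it supplies local stand-ins for DcgmFieldsInit, DcgmFieldsTerm and DcgmFieldGetById with the signatures listed above. The struct layout, the table contents and the helper example_count_known_fields are hypothetical and do not reflect the real library.

```c
#include <stddef.h>

/* ---- Stand-ins, NOT the DCGM implementation ------------------------- */
/* Hypothetical reduced layout; the real metadata structure may differ. */
typedef struct {
    unsigned short fieldId;
    const char *tag; /* placeholder tag text */
} dcgm_field_meta_t, *dcgm_field_meta_p;

static dcgm_field_meta_t example_table[] = {
    { 1, "example_tag_a" },
    { 4, "example_tag_b" },
};
static int example_ready = 0;

int DcgmFieldsInit(void) { example_ready = 1; return 0; }
int DcgmFieldsTerm(void) { example_ready = 0; return 0; }

dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)
{
    size_t i;
    if (!example_ready) {
        return NULL;
    }
    for (i = 0; i < sizeof(example_table) / sizeof(example_table[0]); i++) {
        if (example_table[i].fieldId == fieldId) {
            return &example_table[i];
        }
    }
    return NULL; /* 0 on failure, as documented above */
}

/* ---- Calling pattern ------------------------------------------------ */
/* Count how many of the given IDs have metadata.
 * Returns -1 if initialization or termination reports an error. */
static int example_count_known_fields(const unsigned short *ids, size_t n)
{
    size_t i;
    int found = 0;

    if (DcgmFieldsInit() < 0) {          /* <0 means error */
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (DcgmFieldGetById(ids[i]) != NULL) {
            found++;
        }
    }
    if (DcgmFieldsTerm() < 0) {
        return -1;
    }
    return found;
}
```

With the real library, the stand-in section would be dropped in favor of the DCGM header, and the init and terminate calls would sit at program start and exit rather than inside a helper.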
-
DCGM_FI_UNKNOWN 0