Field Identifiers

group dcgmFieldIdentifiers

Field Identifiers.

Defines

DCGM_FI_UNKNOWN 0

NULL field.

DCGM_FI_DRIVER_VERSION 1

Driver Version.

DCGM_FI_NVML_VERSION 2
DCGM_FI_PROCESS_NAME 3
DCGM_FI_DEV_COUNT 4

Number of Devices on the node.

DCGM_FI_CUDA_DRIVER_VERSION 5

CUDA driver version. Retrieves a number with the major value in the thousands place and the minor value in the hundreds place.

CUDA 11.1 = 11100
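The encoding described above can be unpacked with integer division. The following is a minimal sketch that follows the stated rule (major in the thousands place, minor in the hundreds place); the type and function names are hypothetical helpers, not part of DCGM.

```c
/* Hypothetical helper type for this sketch; not part of DCGM. */
typedef struct {
    int major;
    int minor;
} cuda_driver_version_t;

/* Decode a DCGM_FI_CUDA_DRIVER_VERSION value using the rule stated in the
 * field description: major in the thousands place, minor in the hundreds
 * place. */
cuda_driver_version_t decode_cuda_driver_version(long long value)
{
    cuda_driver_version_t v;
    v.major = (int)(value / 1000);          /* 11100 / 1000 gives 11 */
    v.minor = (int)((value % 1000) / 100);  /* (11100 % 1000) / 100 gives 1 */
    return v;
}
```

With the documented example value 11100, this yields major 11 and minor 1, matching CUDA 11.1.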

DCGM_FI_DEV_NAME 50

Name of the GPU device.

DCGM_FI_DEV_BRAND 51

Device Brand.

DCGM_FI_DEV_NVML_INDEX 52

NVML index of this GPU.

DCGM_FI_DEV_SERIAL 53

Device Serial Number.

DCGM_FI_DEV_UUID 54

UUID corresponding to the device.

DCGM_FI_DEV_MINOR_NUMBER 55

Device node minor number /dev/nvidia#.

DCGM_FI_DEV_OEM_INFOROM_VER 56

OEM inforom version.

DCGM_FI_DEV_PCI_BUSID 57

PCI attributes for the device.

DCGM_FI_DEV_PCI_COMBINED_ID 58

The combined 16-bit device id and 16-bit vendor id.

DCGM_FI_DEV_PCI_SUBSYS_ID 59

The 32-bit Sub System Device ID.

DCGM_FI_GPU_TOPOLOGY_PCI 60

Topology of all GPUs on the system via PCI (static)

Topology of all GPUs on the system via NVLINK (static)

DCGM_FI_GPU_TOPOLOGY_AFFINITY 62

Affinity of all GPUs on the system (static)

DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY 63

Cuda compute capability for the device.

The major version is the upper 32 bits and the minor version is the lower 32 bits.
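As an illustration of the packing described above, the two halves can be separated with a shift and a mask. This sketch assumes the field value has already been read into a 64-bit unsigned integer; the function names are hypothetical.

```c
#include <stdint.h>

/* Upper 32 bits of a DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY value. */
uint32_t compute_capability_major(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* Lower 32 bits of a DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY value. */
uint32_t compute_capability_minor(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);
}
```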

DCGM_FI_DEV_COMPUTE_MODE 65

Compute mode for the device.

DCGM_FI_DEV_PERSISTENCE_MODE 66

Persistence mode for the device. Boolean: 0 is disabled, 1 is enabled.

DCGM_FI_DEV_MIG_MODE 67

MIG mode for the device. Boolean: 0 is disabled, 1 is enabled.

DCGM_FI_DEV_CUDA_VISIBLE_DEVICES_STR 68

The string that CUDA_VISIBLE_DEVICES should be set to for this entity (including MIG)

DCGM_FI_DEV_MIG_MAX_SLICES 69

The maximum number of MIG slices supported by this GPU.

DCGM_FI_DEV_CPU_AFFINITY_0 70

Device CPU affinity.

part 0/8 = cpus 0 - 63

DCGM_FI_DEV_CPU_AFFINITY_1 71

Device CPU affinity.

part 1/8 = cpus 64 - 127

DCGM_FI_DEV_CPU_AFFINITY_2 72

Device CPU affinity.

part 2/8 = cpus 128 - 191

DCGM_FI_DEV_CPU_AFFINITY_3 73

Device CPU affinity.

part 3/8 = cpus 192 - 255
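The four DCGM_FI_DEV_CPU_AFFINITY_* fields each cover a block of 64 CPUs. The sketch below assumes each field value is a 64-bit mask in which bit i of field k stands for CPU 64*k + i; that bit layout is an assumption of this example and is not stated in the listing. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define AFFINITY_PARTS 4   /* DCGM_FI_DEV_CPU_AFFINITY_0 .. _3 */
#define CPUS_PER_PART  64  /* each field covers 64 CPUs */

/* parts[k] is assumed to hold the value read from DCGM_FI_DEV_CPU_AFFINITY_k.
 * Returns true if the given CPU number is set in the combined mask. */
bool cpu_in_affinity(const uint64_t parts[AFFINITY_PARTS], unsigned cpu)
{
    if (cpu >= AFFINITY_PARTS * CPUS_PER_PART)
        return false;  /* outside the 0 - 255 range covered by these fields */
    return (parts[cpu / CPUS_PER_PART] >> (cpu % CPUS_PER_PART)) & 1u;
}
```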

DCGM_FI_DEV_CC_MODE 74

ConfidentialCompute/AmpereProtectedMemory status for this system: 0 = disabled, 1 = enabled.

DCGM_FI_DEV_MIG_ATTRIBUTES 75

Attributes for the given MIG device handles.

DCGM_FI_DEV_MIG_GI_INFO 76

GPU instance profile information.

DCGM_FI_DEV_MIG_CI_INFO 77

Compute instance profile information.

DCGM_FI_DEV_ECC_INFOROM_VER 80

ECC inforom version.

DCGM_FI_DEV_POWER_INFOROM_VER 81

Power management object inforom version.

DCGM_FI_DEV_INFOROM_IMAGE_VER 82

Inforom image version.

DCGM_FI_DEV_INFOROM_CONFIG_CHECK 83

Inforom configuration checksum.

DCGM_FI_DEV_INFOROM_CONFIG_VALID 84

Reads the infoROM from the flash and verifies the checksums.

DCGM_FI_DEV_VBIOS_VERSION 85

VBIOS version of the device.

DCGM_FI_DEV_MEM_AFFINITY_0 86

Device Memory node affinity, 0-63.

DCGM_FI_DEV_MEM_AFFINITY_1 87

Device Memory node affinity, 64-127.

DCGM_FI_DEV_MEM_AFFINITY_2 88

Device Memory node affinity, 128-191.

DCGM_FI_DEV_MEM_AFFINITY_3 89

Device Memory node affinity, 192-255.

DCGM_FI_DEV_BAR1_TOTAL 90

Total BAR1 of the GPU in MB.

DCGM_FI_SYNC_BOOST 91

Deprecated - Sync boost settings on the node.

DCGM_FI_DEV_BAR1_USED 92

Used BAR1 of the GPU in MB.

DCGM_FI_DEV_BAR1_FREE 93

Free BAR1 of the GPU in MB.

DCGM_FI_DEV_SM_CLOCK 100

SM clock for the device.

DCGM_FI_DEV_MEM_CLOCK 101

Memory clock for the device.

DCGM_FI_DEV_VIDEO_CLOCK 102

Video encoder/decoder clock for the device.

DCGM_FI_DEV_APP_SM_CLOCK 110

SM Application clocks.

DCGM_FI_DEV_APP_MEM_CLOCK 111

Memory Application clocks.

DCGM_FI_DEV_CLOCK_THROTTLE_REASONS 112

Current clock throttle reasons (bitmask of DCGM_CLOCKS_THROTTLE_REASON_*)

DCGM_FI_DEV_MAX_SM_CLOCK 113

Maximum supported SM clock for the device.

DCGM_FI_DEV_MAX_MEM_CLOCK 114

Maximum supported Memory clock for the device.

DCGM_FI_DEV_MAX_VIDEO_CLOCK 115

Maximum supported Video encoder/decoder clock for the device.

DCGM_FI_DEV_AUTOBOOST 120

Auto-boost for the device (1 = enabled, 0 = disabled).

DCGM_FI_DEV_SUPPORTED_CLOCKS 130

Supported clocks for the device.

DCGM_FI_DEV_MEMORY_TEMP 140

Memory temperature for the device.

DCGM_FI_DEV_GPU_TEMP 150

Current temperature readings for the device, in degrees C.

DCGM_FI_DEV_MEM_MAX_OP_TEMP 151

Maximum operating temperature for the memory of this GPU.

DCGM_FI_DEV_GPU_MAX_OP_TEMP 152

Maximum operating temperature for this GPU.

DCGM_FI_DEV_POWER_USAGE 155

Power usage for the device in Watts.

DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION 156

Total energy consumption for the GPU in mJ since the driver was last reloaded.
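Because this field is a running total in mJ, average power over an interval can be derived from two samples. The following is a rough sketch; it assumes sample timestamps in microseconds, which is an assumption of this example, and the function name is hypothetical.

```c
#include <stdint.h>

/* Average power in watts between two DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION
 * samples. Energy values are in mJ as documented; timestamps are assumed to
 * be in microseconds. Returns a negative value if the interval is empty or
 * the counter went backwards (for example after a driver reload, since the
 * total is counted from the last reload). */
double average_power_watts(uint64_t energy0_mj, uint64_t energy1_mj,
                           uint64_t t0_usec, uint64_t t1_usec)
{
    double joules;
    double seconds;

    if (t1_usec <= t0_usec || energy1_mj < energy0_mj)
        return -1.0;

    joules = (double)(energy1_mj - energy0_mj) / 1000.0;  /* mJ to J */
    seconds = (double)(t1_usec - t0_usec) / 1e6;          /* usec to s */
    return joules / seconds;                              /* J/s = W */
}
```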

DCGM_FI_DEV_POWER_USAGE_INSTANT 157

Current instantaneous power usage of the device in Watts.

DCGM_FI_DEV_SLOWDOWN_TEMP 158

Slowdown temperature for the device.

DCGM_FI_DEV_SHUTDOWN_TEMP 159

Shutdown temperature for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT 160

Current Power limit for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN 161

Minimum power management limit for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX 162

Maximum power management limit for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT_DEF 163

Default power management limit for the device.

DCGM_FI_DEV_ENFORCED_POWER_LIMIT 164

Effective power limit that the driver enforces after taking into account all limiters.

DCGM_FI_DEV_PSTATE 190

Performance state (P-State) 0-15.

0=highest

DCGM_FI_DEV_FAN_SPEED 191

Fan speed for the device in percent 0-100.

DCGM_FI_DEV_PCIE_TX_THROUGHPUT 200

PCIe Tx utilization information.

Deprecated: Use DCGM_FI_PROF_PCIE_TX_BYTES instead.

DCGM_FI_DEV_PCIE_RX_THROUGHPUT 201

PCIe Rx utilization information.

Deprecated: Use DCGM_FI_PROF_PCIE_RX_BYTES instead.

DCGM_FI_DEV_PCIE_REPLAY_COUNTER 202

PCIe replay counter.

DCGM_FI_DEV_GPU_UTIL 203

GPU Utilization.

DCGM_FI_DEV_MEM_COPY_UTIL 204

Memory Utilization.

DCGM_FI_DEV_ACCOUNTING_DATA 205

Process accounting stats.

This field is only supported when the host engine is running as root unless you enable accounting ahead of time. Accounting mode can be enabled by running “nvidia-smi -am 1” as root on the same node the host engine is running on.

DCGM_FI_DEV_ENC_UTIL 206

Encoder Utilization.

DCGM_FI_DEV_DEC_UTIL 207

Decoder Utilization.

DCGM_FI_DEV_XID_ERRORS 230

XID errors.

The value is the specific XID error

PCIe Max Link Generation.

PCIe Max Link Width.

PCIe Current Link Generation.

PCIe Current Link Width.

DCGM_FI_DEV_POWER_VIOLATION 240

Power Violation time in usec.

DCGM_FI_DEV_THERMAL_VIOLATION 241

Thermal Violation time in usec.

DCGM_FI_DEV_SYNC_BOOST_VIOLATION 242

Sync Boost Violation time in usec.

DCGM_FI_DEV_BOARD_LIMIT_VIOLATION 243

Board violation limit.

DCGM_FI_DEV_LOW_UTIL_VIOLATION 244

Low utilisation violation limit.

DCGM_FI_DEV_RELIABILITY_VIOLATION 245

Reliability violation limit.

DCGM_FI_DEV_TOTAL_APP_CLOCKS_VIOLATION 246

App clock violation limit.

DCGM_FI_DEV_TOTAL_BASE_CLOCKS_VIOLATION 247

Base clock violation limit.

DCGM_FI_DEV_FB_TOTAL 250

Total Frame Buffer of the GPU in MB.

DCGM_FI_DEV_FB_FREE 251

Free Frame Buffer in MB.

DCGM_FI_DEV_FB_USED 252

Used Frame Buffer in MB.

DCGM_FI_DEV_FB_RESERVED 253

Reserved Frame Buffer in MB.

DCGM_FI_DEV_FB_USED_PERCENT 254

Percentage used of Frame Buffer: ‘Used/(Total - Reserved)’.

Range 0.0-1.0
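The formula above can be reproduced from the other frame buffer fields. This is a small sketch, assuming the three inputs are the values of DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_TOTAL and DCGM_FI_DEV_FB_RESERVED in MB; the function name is hypothetical.

```c
/* Used / (Total - Reserved), with all inputs in MB.
 * Returns a value in the range 0.0-1.0 for consistent inputs, or -1.0 if the
 * usable size (Total - Reserved) is not positive. */
double fb_used_ratio(double used_mb, double total_mb, double reserved_mb)
{
    double usable_mb = total_mb - reserved_mb;

    if (usable_mb <= 0.0)
        return -1.0;
    return used_mb / usable_mb;
}
```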

C2C Link Count.

C2C Link Status. A value of 0 means the link is INACTIVE.

A value of 1 means the link is ACTIVE.

DCGM_FI_DEV_C2C_MAX_BANDWIDTH 287

C2C Max Bandwidth. The value indicates the link speed in MB/s.

DCGM_FI_DEV_ECC_CURRENT 300

Current ECC mode for the device.

DCGM_FI_DEV_ECC_PENDING 301

Pending ECC mode for the device.

DCGM_FI_DEV_ECC_SBE_VOL_TOTAL 310

Total single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_TOTAL 311

Total double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_AGG_TOTAL 312

Total single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_TOTAL 313

Total double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_VOL_L1 314

L1 cache single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_L1 315

L1 cache double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_L2 316

L2 cache single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_L2 317

L2 cache double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_DEV 318

Device memory single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_DEV 319

Device memory double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_REG 320

Register file single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_REG 321

Register file double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_TEX 322

Texture memory single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_TEX 323

Texture memory double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_AGG_L1 324

L1 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_L1 325

L1 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_L2 326

L2 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_L2 327

L2 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_DEV 328

Device memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_DEV 329

Device memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_REG 330

Register File single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_REG 331

Register File double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_TEX 332

Texture memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_TEX 333

Texture memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_MAX 385

Historical max available spare memory rows per memory bank.

DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_HIGH 386

Historical high mark of available spare memory rows per memory bank.

DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_PARTIAL 387

Historical mark of partial available spare memory rows per memory bank.

DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_LOW 388

Historical low mark of available spare memory rows per memory bank.

DCGM_FI_DEV_BANKS_REMAP_ROWS_AVAIL_NONE 389

Historical marker of memory banks with no available spare memory rows.

DCGM_FI_DEV_RETIRED_SBE 390

Number of retired pages because of single bit errors. Note: monotonically increasing.

DCGM_FI_DEV_RETIRED_DBE 391

Number of retired pages because of double bit errors. Note: monotonically increasing.

DCGM_FI_DEV_RETIRED_PENDING 392

Number of pages pending retirement.

DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS 393

Number of remapped rows for uncorrectable errors.

DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS 394

Number of remapped rows for correctable errors.

DCGM_FI_DEV_ROW_REMAP_FAILURE 395

Whether remapping of rows has failed.

DCGM_FI_DEV_ROW_REMAP_PENDING 396

Whether remapping of rows is pending.

DCGM_FI_DEV_VIRTUAL_MODE 500

Virtualization Mode corresponding to the GPU.

One of DCGM_GPU_VIRTUALIZATION_MODE_* constants.

DCGM_FI_DEV_SUPPORTED_TYPE_INFO 501

Includes Count and Static info of vGPU types supported on a device.

DCGM_FI_DEV_CREATABLE_VGPU_TYPE_IDS 502

Includes Count and currently Creatable vGPU types on a device.

DCGM_FI_DEV_VGPU_INSTANCE_IDS 503

Includes Count and currently Active vGPU Instances on a device.

DCGM_FI_DEV_VGPU_UTILIZATIONS 504

Utilization values for vGPUs running on the device.

DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION 505

Utilization values for processes running within vGPU VMs using the device.

DCGM_FI_DEV_ENC_STATS 506

Current encoder statistics for a given device.

DCGM_FI_DEV_FBC_STATS 507

Statistics of current active frame buffer capture sessions on a given device.

DCGM_FI_DEV_FBC_SESSIONS_INFO 508

Information about active frame buffer capture sessions on a target device.

DCGM_FI_DEV_SUPPORTED_VGPU_TYPE_IDS 509

Includes Count and currently Supported vGPU types on a device.

DCGM_FI_DEV_VGPU_TYPE_INFO 510

Includes Static info of vGPU types supported on a device.

DCGM_FI_DEV_VGPU_TYPE_NAME 511

Includes the name of a vGPU type supported on a device.

DCGM_FI_DEV_VGPU_TYPE_CLASS 512

Includes the class of a vGPU type supported on a device.

DCGM_FI_DEV_VGPU_TYPE_LICENSE 513

Includes the license info for a vGPU type supported on a device.

DCGM_FI_DEV_VGPU_VM_ID 520

VM ID of the vGPU instance.

DCGM_FI_DEV_VGPU_VM_NAME 521

VM name of the vGPU instance.

DCGM_FI_DEV_VGPU_TYPE 522

vGPU type of the vGPU instance

DCGM_FI_DEV_VGPU_UUID 523

UUID of the vGPU instance.

DCGM_FI_DEV_VGPU_DRIVER_VERSION 524

Driver version of the vGPU instance.

DCGM_FI_DEV_VGPU_MEMORY_USAGE 525

Memory usage of the vGPU instance.

DCGM_FI_DEV_VGPU_LICENSE_STATUS 526

License status of the vGPU.

DCGM_FI_DEV_VGPU_FRAME_RATE_LIMIT 527

Frame rate limit of the vGPU instance.

DCGM_FI_DEV_VGPU_ENC_STATS 528

Current encoder statistics of the vGPU instance.

DCGM_FI_DEV_VGPU_ENC_SESSIONS_INFO 529

Information about all active encoder sessions on the vGPU instance.

DCGM_FI_DEV_VGPU_FBC_STATS 530

Statistics of current active frame buffer capture sessions on the vGPU instance.

DCGM_FI_DEV_VGPU_FBC_SESSIONS_INFO 531

Information about active frame buffer capture sessions on the vGPU instance.

DCGM_FI_DEV_VGPU_INSTANCE_LICENSE_STATE 532

License state information of the vGPU instance.

DCGM_FI_DEV_VGPU_PCI_ID 533

PCI Id of the vGPU instance.

DCGM_FI_DEV_VGPU_VM_GPU_INSTANCE_ID 534

GPU Instance ID for the given vGPU Instance.

DCGM_FI_FIRST_VGPU_FIELD_ID 520

Starting field ID of the vGPU instance.

DCGM_FI_LAST_VGPU_FIELD_ID 570

Last field ID of the vGPU instance.

DCGM_FI_MAX_VGPU_FIELDS DCGM_FI_LAST_VGPU_FIELD_ID - DCGM_FI_FIRST_VGPU_FIELD_ID

For now, the maximum number of vGPU field IDs is taken as the difference of DCGM_FI_LAST_VGPU_FIELD_ID and DCGM_FI_FIRST_VGPU_FIELD_ID, i.e. 50.

DCGM_FI_INTERNAL_FIELDS_0_START 600

Starting ID for all the internal fields.

DCGM_FI_INTERNAL_FIELDS_0_END 699

Last ID for all the internal fields.

NVSwitch entity field IDs start here.

NVSwitch latency bins for port 0

DCGM_FI_FIRST_NVSWITCH_FIELD_ID 700

Starting field ID of the NVSwitch instance.

DCGM_FI_DEV_NVSWITCH_VOLTAGE_MVOLT 701

NvSwitch voltage.

DCGM_FI_DEV_NVSWITCH_CURRENT_IDDQ 702

NvSwitch Current IDDQ.

DCGM_FI_DEV_NVSWITCH_CURRENT_IDDQ_REV 703

NvSwitch Current IDDQ Rev.

DCGM_FI_DEV_NVSWITCH_CURRENT_IDDQ_DVDD 704

NvSwitch Current IDDQ Rev DVDD.

DCGM_FI_DEV_NVSWITCH_POWER_VDD 705

NvSwitch Power VDD in watts.

DCGM_FI_DEV_NVSWITCH_POWER_DVDD 706

NvSwitch Power DVDD in watts.

DCGM_FI_DEV_NVSWITCH_POWER_HVDD 707

NvSwitch Power HVDD in watts.

NVSwitch Tx Throughput Counter for ports 0-17

NVSwitch Rx Throughput Counter for ports 0-17.

NvSwitch fatal_errors for ports 0-17.

NvSwitch non_fatal_errors for ports 0-17.

NvSwitch replay_count_errors for ports 0-17.

NvSwitch recovery_count_errors for ports 0-17.

NvSwitch filt_err_count_errors for ports 0-17.

NvLink lane_crc_err_count_aggregate_errors for ports 0-17.

NvLink lane ecc_err_count_aggregate_errors for ports 0-17.

Nvlink lane latency low lane0 counter.

Nvlink lane latency low lane1 counter.

Nvlink lane latency low lane2 counter.

Nvlink lane latency low lane3 counter.

Nvlink lane latency medium lane0 counter.

Nvlink lane latency medium lane1 counter.

Nvlink lane latency medium lane2 counter.

Nvlink lane latency medium lane3 counter.

Nvlink lane latency high lane0 counter.

Nvlink lane latency high lane1 counter.

Nvlink lane latency high lane2 counter.

Nvlink lane latency high lane3 counter.

Nvlink lane latency panic lane0 counter.

Nvlink lane latency panic lane1 counter.

Nvlink lane latency panic lane2 counter.

Nvlink lane latency panic lane3 counter.

Nvlink lane latency count lane0 counter.

Nvlink lane latency count lane1 counter.

Nvlink lane latency count lane2 counter.

Nvlink lane latency count lane3 counter.

NvLink lane crc_err_count for lane 0 on ports 0-17.

NvLink lane crc_err_count for lane 1 on ports 0-17.

NvLink lane crc_err_count for lane 2 on ports 0-17.

NvLink lane crc_err_count for lane 3 on ports 0-17.

NvLink lane ecc_err_count for lane 0 on ports 0-17.

NvLink lane ecc_err_count for lane 1 on ports 0-17.

NvLink lane ecc_err_count for lane 2 on ports 0-17.

NvLink lane ecc_err_count for lane 3 on ports 0-17.

DCGM_FI_DEV_NVSWITCH_FATAL_ERRORS 856

NVSwitch fatal error information.

Note: value field indicates the specific SXid reported

DCGM_FI_DEV_NVSWITCH_NON_FATAL_ERRORS 857

NVSwitch non fatal error information.

Note: value field indicates the specific SXid reported

DCGM_FI_DEV_NVSWITCH_TEMPERATURE_CURRENT 858

NVSwitch current temperature.

DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SLOWDOWN 859

NVSwitch limit slowdown temperature.

DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SHUTDOWN 860

NVSwitch limit shutdown temperature.

DCGM_FI_DEV_NVSWITCH_THROUGHPUT_TX 861

NVSwitch throughput Tx.

DCGM_FI_DEV_NVSWITCH_THROUGHPUT_RX 862

NVSwitch throughput Rx.

DCGM_FI_DEV_NVSWITCH_PHYS_ID 863
DCGM_FI_DEV_NVSWITCH_RESET_REQUIRED 864

NVSwitch reset required.

NvSwitch NvLink ID.

DCGM_FI_DEV_NVSWITCH_PCIE_DOMAIN 866

NvSwitch PCIE domain.

DCGM_FI_DEV_NVSWITCH_PCIE_BUS 867

NvSwitch PCIE bus.

DCGM_FI_DEV_NVSWITCH_PCIE_DEVICE 868

NvSwitch PCIE device.

DCGM_FI_DEV_NVSWITCH_PCIE_FUNCTION 869

NvSwitch PCIE function.

NvLink status.

UNKNOWN:-1 OFF:0 SAFE:1 ACTIVE:2 ERROR:3
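The status values listed above map naturally onto an enumeration. The sketch below mirrors only the numeric values given in the listing; the enum and function names are hypothetical and are not DCGM identifiers.

```c
/* Numeric values taken from the listing above; the names are hypothetical. */
typedef enum {
    NVLINK_STATUS_UNKNOWN = -1,
    NVLINK_STATUS_OFF     = 0,
    NVLINK_STATUS_SAFE    = 1,
    NVLINK_STATUS_ACTIVE  = 2,
    NVLINK_STATUS_ERROR   = 3
} nvlink_status_t;

/* Translate a raw NvLink status value into a printable name.
 * Values outside the documented set are reported as "INVALID". */
const char *nvlink_status_name(int value)
{
    switch (value) {
    case NVLINK_STATUS_UNKNOWN: return "UNKNOWN";
    case NVLINK_STATUS_OFF:     return "OFF";
    case NVLINK_STATUS_SAFE:    return "SAFE";
    case NVLINK_STATUS_ACTIVE:  return "ACTIVE";
    case NVLINK_STATUS_ERROR:   return "ERROR";
    default:                    return "INVALID";
    }
}
```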

NvLink device type (GPU/Switch).

NvLink device pcie domain.

NvLink device pcie bus.

NvLink device pcie device.

NvLink device pcie function.

NvLink device link ID.

NvLink device SID.

NvLink device link uid.

DCGM_FI_LAST_NVSWITCH_FIELD_ID 899

Last field ID of the NVSwitch instance.

DCGM_FI_MAX_NVSWITCH_FIELDS DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1

For now, the maximum number of NVSwitch field IDs is taken as the difference of DCGM_FI_LAST_NVSWITCH_FIELD_ID and DCGM_FI_FIRST_NVSWITCH_FIELD_ID, plus 1, i.e. 200.
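The FIRST/LAST markers in this listing can be used to decide which block a field ID belongs to. The following sketch copies the documented values so that it compiles without the DCGM headers; the enum and the classify_field_id function are hypothetical helpers.

```c
/* Values copied from the listing so the sketch compiles on its own. */
#define DCGM_FI_FIRST_VGPU_FIELD_ID      520
#define DCGM_FI_LAST_VGPU_FIELD_ID       570
#define DCGM_FI_INTERNAL_FIELDS_0_START  600
#define DCGM_FI_INTERNAL_FIELDS_0_END    699
#define DCGM_FI_FIRST_NVSWITCH_FIELD_ID  700
#define DCGM_FI_LAST_NVSWITCH_FIELD_ID   899

#define DCGM_FI_MAX_VGPU_FIELDS \
    (DCGM_FI_LAST_VGPU_FIELD_ID - DCGM_FI_FIRST_VGPU_FIELD_ID)
#define DCGM_FI_MAX_NVSWITCH_FIELDS \
    (DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1)

/* Hypothetical classification used only in this sketch. */
typedef enum {
    FIELD_RANGE_OTHER,
    FIELD_RANGE_VGPU,
    FIELD_RANGE_INTERNAL,
    FIELD_RANGE_NVSWITCH
} field_range_t;

/* Report which documented ID block (if any) contains fieldId. */
field_range_t classify_field_id(unsigned short fieldId)
{
    if (fieldId >= DCGM_FI_FIRST_VGPU_FIELD_ID &&
        fieldId <= DCGM_FI_LAST_VGPU_FIELD_ID)
        return FIELD_RANGE_VGPU;
    if (fieldId >= DCGM_FI_INTERNAL_FIELDS_0_START &&
        fieldId <= DCGM_FI_INTERNAL_FIELDS_0_END)
        return FIELD_RANGE_INTERNAL;
    if (fieldId >= DCGM_FI_FIRST_NVSWITCH_FIELD_ID &&
        fieldId <= DCGM_FI_LAST_NVSWITCH_FIELD_ID)
        return FIELD_RANGE_NVSWITCH;
    return FIELD_RANGE_OTHER;
}
```

Note that the two MAX macros are computed differently: the vGPU count is LAST - FIRST (50), while the NVSwitch count is LAST - FIRST + 1 (200).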

DCGM_FI_PROF_GR_ENGINE_ACTIVE 1001

Profiling Fields.

These all start with DCGM_FI_PROF_*. Ratio of time the graphics engine is active. The graphics engine is active if a graphics/compute context is bound and the graphics pipe or compute pipe is busy.

DCGM_FI_PROF_SM_ACTIVE 1002

The ratio of cycles an SM has at least 1 warp assigned (computed from the number of cycles and elapsed cycles)

DCGM_FI_PROF_SM_OCCUPANCY 1003

The ratio of the number of warps resident on an SM (number of resident warps as a ratio of the theoretical maximum number of warps per elapsed cycle).

DCGM_FI_PROF_PIPE_TENSOR_ACTIVE 1004

The ratio of cycles any tensor pipe is active (off the peak sustained elapsed cycles).

DCGM_FI_PROF_DRAM_ACTIVE 1005

The ratio of cycles the device memory interface is active sending or receiving data.

DCGM_FI_PROF_PIPE_FP64_ACTIVE 1006

Ratio of cycles the fp64 pipe is active.

DCGM_FI_PROF_PIPE_FP32_ACTIVE 1007

Ratio of cycles the fp32 pipe is active.

DCGM_FI_PROF_PIPE_FP16_ACTIVE 1008

Ratio of cycles the fp16 pipe is active.

This does not include HMMA.

DCGM_FI_PROF_PCIE_TX_BYTES 1009

The number of bytes of active PCIe tx (transmit) data including both header and payload.

Note that this is from the perspective of the GPU, so copying data from device to host (DtoH) would be reflected in this metric.

DCGM_FI_PROF_PCIE_RX_BYTES 1010

The number of bytes of active PCIe rx (receive) data including both header and payload.

Note that this is from the perspective of the GPU, so copying data from host to device (HtoD) would be reflected in this metric.

The total number of bytes of active NvLink tx (transmit) data including both header and payload.

Per-link fields are available below

The total number of bytes of active NvLink rx (receive) data including both header and payload.

Per-link fields are available below

DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE 1013

The ratio of cycles the tensor (IMMA) pipe is active (off the peak sustained elapsed cycles)

DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE 1014

The ratio of cycles the tensor (HMMA) pipe is active (off the peak sustained elapsed cycles)

DCGM_FI_PROF_PIPE_TENSOR_DFMA_ACTIVE 1015

The ratio of cycles the tensor (DFMA) pipe is active (off the peak sustained elapsed cycles)

DCGM_FI_PROF_PIPE_INT_ACTIVE 1016

Ratio of cycles the integer pipe is active.

DCGM_FI_PROF_NVDEC0_ACTIVE 1017

Ratio of cycles each of the NVDEC engines are active.

DCGM_FI_PROF_NVDEC1_ACTIVE 1018
DCGM_FI_PROF_NVDEC2_ACTIVE 1019
DCGM_FI_PROF_NVDEC3_ACTIVE 1020
DCGM_FI_PROF_NVDEC4_ACTIVE 1021
DCGM_FI_PROF_NVDEC5_ACTIVE 1022
DCGM_FI_PROF_NVDEC6_ACTIVE 1023
DCGM_FI_PROF_NVDEC7_ACTIVE 1024
DCGM_FI_PROF_NVJPG0_ACTIVE 1025

Ratio of cycles each of the NVJPG engines are active.

DCGM_FI_PROF_NVJPG1_ACTIVE 1026
DCGM_FI_PROF_NVJPG2_ACTIVE 1027
DCGM_FI_PROF_NVJPG3_ACTIVE 1028
DCGM_FI_PROF_NVJPG4_ACTIVE 1029
DCGM_FI_PROF_NVJPG5_ACTIVE 1030
DCGM_FI_PROF_NVJPG6_ACTIVE 1031
DCGM_FI_PROF_NVJPG7_ACTIVE 1032
DCGM_FI_PROF_NVOFA0_ACTIVE 1033

Ratio of cycles each of the NVOFA engines are active.

The per-link number of bytes of active NvLink TX (transmit) or RX (receive) data including both header and payload.

For example: DCGM_FI_PROF_NVLINK_L0_TX_BYTES -> L0 TX. To get the bandwidth for a link, add the RX and TX values together: total = DCGM_FI_PROF_NVLINK_L0_TX_BYTES + DCGM_FI_PROF_NVLINK_L0_RX_BYTES
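The per-link sum described above extends directly to several links. This is a minimal sketch that assumes the TX and RX values for each link have already been read into unsigned 64-bit integers; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One TX/RX pair for a single link, for example the values of
 * DCGM_FI_PROF_NVLINK_L0_TX_BYTES and DCGM_FI_PROF_NVLINK_L0_RX_BYTES.
 * Hypothetical type for this sketch. */
typedef struct {
    uint64_t tx_bytes;
    uint64_t rx_bytes;
} nvlink_sample_t;

/* Bandwidth for one link: TX plus RX. */
uint64_t nvlink_link_total(const nvlink_sample_t *sample)
{
    return sample->tx_bytes + sample->rx_bytes;
}

/* Sum of the per-link totals over count links. */
uint64_t nvlink_all_links_total(const nvlink_sample_t *links, size_t count)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += nvlink_link_total(&links[i]);
    return total;
}
```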

NVLink throughput First.

NVLink throughput Last.

DCGM_FI_DEV_CPU_UTIL_TOTAL 1100

CPU Utilization, total.

DCGM_FI_DEV_CPU_UTIL_USER 1101

CPU Utilization, user.

DCGM_FI_DEV_CPU_UTIL_NICE 1102

CPU Utilization, nice.

DCGM_FI_DEV_CPU_UTIL_SYS 1103

CPU Utilization, system time.

DCGM_FI_DEV_CPU_UTIL_IRQ 1104

CPU Utilization, interrupt servicing.

DCGM_FI_DEV_CPU_TEMP_CURRENT 1110

CPU temperature.

DCGM_FI_DEV_CPU_TEMP_WARNING 1111

CPU Warning Temperature.

DCGM_FI_DEV_CPU_TEMP_CRITICAL 1112

CPU Critical Temperature.

DCGM_FI_DEV_CPU_CLOCK_CURRENT 1120

CPU instantaneous clock speed.

DCGM_FI_DEV_CPU_POWER_UTIL_CURRENT 1130

CPU power utilization.

DCGM_FI_DEV_CPU_POWER_LIMIT 1131

CPU power limit.

DCGM_FI_DEV_CPU_VENDOR 1140

CPU vendor name.

DCGM_FI_DEV_CPU_MODEL 1141

CPU model name.

DCGM_FI_MAX_FIELDS 1142

1 greater than the maximum field ID above.

This is 1 greater than the maximum field ID that could be allocated.

Functions

dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)

Get a pointer to the metadata for a field by its field ID.

See DCGM_FI_? for a list of field IDs.

Parameters:

fieldId – IN: One of the field IDs (DCGM_FI_?)

Returns:

  • 0 on failure

  • >0: pointer to the field metadata structure if found

dcgm_field_meta_p DcgmFieldGetByTag(const char *tag)

Get a pointer to the metadata for a field by its field tag.

Parameters:

tag – IN: Tag for the field of interest

Returns:

  • 0 on failure or if not found

  • >0: pointer to the field metadata structure if found

int DcgmFieldsInit(void)

Initialize the DcgmFields module.

Call this once from inside your program

Returns:

  • 0 on success

  • <0 on error

int DcgmFieldsTerm(void)

Terminates the DcgmFields module.

Call this once from inside your program

Returns:

  • 0 on success

  • <0 on error

const char *DcgmFieldsGetEntityGroupString(dcgm_field_entity_group_t entityGroupId)

Get the string version of an entityGroupId.

Returns:

  • Pointer to a string such as GPU, NvSwitch, etc.

  • Null on error
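The functions in this group are typically used in the order initialize, look up, terminate. The sketch below illustrates that order and the documented return conventions. So that it compiles without DCGM installed, it supplies small stand-in definitions of dcgm_field_meta_t, DcgmFieldsInit, DcgmFieldsTerm and DcgmFieldGetById; the stand-in struct layout and tag text are placeholders, and field_tag_demo is a hypothetical helper. In a real program the stand-in section would be removed in favor of the DCGM header and library.

```c
#include <stddef.h>
#include <stdio.h>

/* ---- Stand-ins so the sketch compiles without DCGM. The struct layout and
 * the tag text are placeholders, not DCGM's actual definitions. ---- */
typedef struct {
    unsigned short fieldId;
    char tag[48];
} dcgm_field_meta_t, *dcgm_field_meta_p;

static dcgm_field_meta_t g_stubTable[] = { { 150, "example_gpu_temp_tag" } };
static int g_stubReady = 0;

int DcgmFieldsInit(void) { g_stubReady = 1; return 0; }
int DcgmFieldsTerm(void) { g_stubReady = 0; return 0; }

dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)
{
    size_t i;

    if (!g_stubReady)
        return NULL;
    for (i = 0; i < sizeof(g_stubTable) / sizeof(g_stubTable[0]); i++)
        if (g_stubTable[i].fieldId == fieldId)
            return &g_stubTable[i];
    return NULL;
}
/* ---- End of stand-ins. ---- */

/* Stands for a whole program run: initialize once, look up one field,
 * terminate once. Copies the field's tag into out.
 * Returns 0 on success and a negative value on any error. */
int field_tag_demo(unsigned short fieldId, char *out, size_t outSize)
{
    dcgm_field_meta_p meta;
    int rc = -1;

    if (out == NULL || outSize == 0)
        return -1;
    if (DcgmFieldsInit() != 0)          /* 0 on success, <0 on error */
        return -1;

    meta = DcgmFieldGetById(fieldId);   /* 0 (NULL) on failure */
    if (meta != NULL) {
        /* Copy before terminating; this sketch does not assume the pointer
         * remains valid after DcgmFieldsTerm(). */
        snprintf(out, outSize, "%s", meta->tag);
        rc = 0;
    }

    if (DcgmFieldsTerm() != 0)
        rc = -1;
    return rc;
}
```

A longer-lived program would call DcgmFieldsInit once at startup and DcgmFieldsTerm once at shutdown, with all lookups in between, rather than pairing them around each lookup.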