Field Identifiers

group dcgmFieldIdentifiers

Field Identifiers.

Defines

DCGM_FI_UNKNOWN

NULL field.

DCGM_FI_DRIVER_VERSION

Driver Version.

DCGM_FI_NVML_VERSION

NVML version.

DCGM_FI_PROCESS_NAME

Process name.

DCGM_FI_DEV_COUNT

Number of Devices on the node.

DCGM_FI_CUDA_DRIVER_VERSION

CUDA driver version. Retrieves a number with the major version in the thousands place and the minor version in the hundreds place.

CUDA 11.1 = 11100
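A minimal sketch of decoding this value in C, assuming the encoding described above (major × 1000 + minor × 100). The helper names `cuda_driver_major` and `cuda_driver_minor` are hypothetical and not part of the DCGM API.

```c
/* Hypothetical helpers (not part of the DCGM API): split the integer reported
 * by DCGM_FI_CUDA_DRIVER_VERSION, assuming the documented encoding where the
 * major version sits in the thousands place and the minor version in the
 * hundreds place (CUDA 11.1 = 11100). */
int cuda_driver_major(long long encoded)
{
    return (int)(encoded / 1000);
}

int cuda_driver_minor(long long encoded)
{
    /* Drop the thousands, then keep the hundreds digit. */
    return (int)((encoded % 1000) / 100);
}
```

For example, `cuda_driver_major(11100)` returns 11 and `cuda_driver_minor(11100)` returns 1.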

DCGM_FI_DEV_NAME

Name of the GPU device.

DCGM_FI_DEV_BRAND

Device Brand.

DCGM_FI_DEV_NVML_INDEX

NVML index of this GPU.

DCGM_FI_DEV_SERIAL

Device Serial Number.

DCGM_FI_DEV_UUID

UUID corresponding to the device.

DCGM_FI_DEV_MINOR_NUMBER

Device node minor number /dev/nvidia#.

DCGM_FI_DEV_OEM_INFOROM_VER

OEM inforom version.

DCGM_FI_DEV_PCI_BUSID

PCI attributes for the device.

DCGM_FI_DEV_PCI_COMBINED_ID

The combined 16-bit device id and 16-bit vendor id.
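The description does not say which half holds which ID. The sketch below assumes the layout NVML uses for its `pciDeviceId` value (device ID in the upper 16 bits, vendor ID in the lower 16 bits); `pci_device_id` and `pci_vendor_id` are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the DCGM API). Assumption: the value of
 * DCGM_FI_DEV_PCI_COMBINED_ID follows NVML's pciDeviceId layout, with the
 * 16-bit device ID in the upper half and the 16-bit vendor ID in the lower half. */
uint16_t pci_device_id(uint32_t combined)
{
    return (uint16_t)(combined >> 16);
}

uint16_t pci_vendor_id(uint32_t combined)
{
    return (uint16_t)(combined & 0xFFFFu);
}
```

Under that assumption, 0x20B010DE splits into device ID 0x20B0 and vendor ID 0x10DE (NVIDIA's PCI vendor ID).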

DCGM_FI_DEV_PCI_SUBSYS_ID

The 32-bit Sub System Device ID.

DCGM_FI_GPU_TOPOLOGY_PCI

Topology of all GPUs on the system via PCI (static)

Topology of all GPUs on the system via NVLINK (static)

DCGM_FI_GPU_TOPOLOGY_AFFINITY

Affinity of all GPUs on the system (static)

DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY

CUDA compute capability for the device.

The major version is the upper 32 bits and the minor version is the lower 32 bits.
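A short C sketch that unpacks the two halves as described above. `cc_major` and `cc_minor` are hypothetical helpers; the value is assumed to arrive as a signed 64-bit integer, so it is cast to unsigned before shifting.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the DCGM API) for the 64-bit value of
 * DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY: major version in the upper 32 bits,
 * minor version in the lower 32 bits. */
uint32_t cc_major(int64_t packed)
{
    return (uint32_t)((uint64_t)packed >> 32);
}

uint32_t cc_minor(int64_t packed)
{
    return (uint32_t)((uint64_t)packed & 0xFFFFFFFFu);
}
```

A packed value of `((int64_t)8 << 32) | 6` decodes to compute capability 8.6.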

DCGM_FI_DEV_COMPUTE_MODE

Compute mode for the device.

DCGM_FI_DEV_PERSISTENCE_MODE

Persistence mode for the device. Boolean: 0 is disabled, 1 is enabled.

DCGM_FI_DEV_MIG_MODE

MIG mode for the device. Boolean: 0 is disabled, 1 is enabled.

DCGM_FI_DEV_CUDA_VISIBLE_DEVICES_STR

The string that CUDA_VISIBLE_DEVICES should be set to for this entity (including MIG)

DCGM_FI_DEV_MIG_MAX_SLICES

The maximum number of MIG slices supported by this GPU.

DCGM_FI_DEV_CPU_AFFINITY_0

Device CPU affinity.

part 1/8 = cpus 0 - 63

DCGM_FI_DEV_CPU_AFFINITY_1

Device CPU affinity.

part 2/8 = cpus 64 - 127

DCGM_FI_DEV_CPU_AFFINITY_2

Device CPU affinity.

part 3/8 = cpus 128 - 191

DCGM_FI_DEV_CPU_AFFINITY_3

Device CPU affinity.

part 4/8 = cpus 192 - 255
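To check whether a given CPU is in the device's affinity set, the four values can be treated as consecutive 64-bit words of one bitmask. This is a sketch under stated assumptions: it assumes bit i of the k-th field corresponds to CPU 64 × k + i (the layout NVML uses for its CPU affinity bitmask), and the helper `gpu_has_cpu_affinity` is hypothetical. It only covers CPUs 0-255, the range spanned by the four fields listed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the DCGM API). Assumption: words[0..3]
 * hold DCGM_FI_DEV_CPU_AFFINITY_0..3, covering CPUs 0-63, 64-127, 128-191
 * and 192-255, with bit i of word k set when CPU 64*k + i is in the
 * device's affinity set. */
bool gpu_has_cpu_affinity(const uint64_t words[4], unsigned cpu)
{
    if (cpu >= 256)
        return false; /* beyond the range covered by the four fields */
    return (words[cpu / 64] >> (cpu % 64)) & 1u;
}
```

CPUs outside that range simply report false here; a caller on a larger system would need the remaining parts of the mask.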

DCGM_FI_DEV_CC_MODE

Confidential Compute / Ampere Protected Memory status for this system: 0 = disabled, 1 = enabled.

DCGM_FI_DEV_MIG_ATTRIBUTES

Attributes for the given MIG device handles.

DCGM_FI_DEV_MIG_GI_INFO

GPU instance profile information.

DCGM_FI_DEV_MIG_CI_INFO

Compute instance profile information.

DCGM_FI_DEV_ECC_INFOROM_VER

ECC inforom version.

DCGM_FI_DEV_POWER_INFOROM_VER

Power management object inforom version.

DCGM_FI_DEV_INFOROM_IMAGE_VER

Inforom image version.

DCGM_FI_DEV_INFOROM_CONFIG_CHECK

Inforom configuration checksum.

DCGM_FI_DEV_INFOROM_CONFIG_VALID

Reads the infoROM from the flash and verifies the checksums.

DCGM_FI_DEV_VBIOS_VERSION

VBIOS version of the device.

DCGM_FI_DEV_BAR1_TOTAL

Total BAR1 of the GPU in MB.

DCGM_FI_SYNC_BOOST

Deprecated: sync boost settings on the node.

DCGM_FI_DEV_BAR1_USED

Used BAR1 of the GPU in MB.

DCGM_FI_DEV_BAR1_FREE

Free BAR1 of the GPU in MB.

DCGM_FI_DEV_SM_CLOCK

SM clock for the device.

DCGM_FI_DEV_MEM_CLOCK

Memory clock for the device.

DCGM_FI_DEV_VIDEO_CLOCK

Video encoder/decoder clock for the device.

DCGM_FI_DEV_APP_SM_CLOCK

SM Application clocks.

DCGM_FI_DEV_APP_MEM_CLOCK

Memory Application clocks.

DCGM_FI_DEV_CLOCK_THROTTLE_REASONS

Current clock throttle reasons (bitmask of DCGM_CLOCKS_THROTTLE_REASON_*)

DCGM_FI_DEV_MAX_SM_CLOCK

Maximum supported SM clock for the device.

DCGM_FI_DEV_MAX_MEM_CLOCK

Maximum supported Memory clock for the device.

DCGM_FI_DEV_MAX_VIDEO_CLOCK

Maximum supported Video encoder/decoder clock for the device.

DCGM_FI_DEV_AUTOBOOST

Auto-boost for the device (1 = enabled, 0 = disabled).

DCGM_FI_DEV_SUPPORTED_CLOCKS

Supported clocks for the device.

DCGM_FI_DEV_MEMORY_TEMP

Memory temperature for the device.

DCGM_FI_DEV_GPU_TEMP

Current temperature readings for the device, in degrees C.

DCGM_FI_DEV_MEM_MAX_OP_TEMP

Maximum operating temperature for the memory of this GPU.

DCGM_FI_DEV_GPU_MAX_OP_TEMP

Maximum operating temperature for this GPU.

DCGM_FI_DEV_POWER_USAGE

Power usage for the device in Watts.

DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION

Total energy consumption for the GPU in mJ since the driver was last reloaded.
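Because this counter accumulates millijoules, the average power over an interval can be derived from two samples: P = (E2 - E1) / 1000 / Δt watts, with Δt in seconds. A minimal sketch; `average_power_watts` is a hypothetical helper, and treating a decreasing counter as invalid (for example after a driver reload) is an assumption of this sketch.

```c
/* Hypothetical helper (not part of the DCGM API): average power in watts
 * between two samples of DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION, which is
 * reported in millijoules since the driver was last reloaded. */
double average_power_watts(unsigned long long energy_start_mj,
                           unsigned long long energy_end_mj,
                           double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0 || energy_end_mj < energy_start_mj)
        return -1.0; /* invalid interval or counter restarted */
    /* mJ -> J, then J / s = W */
    return (double)(energy_end_mj - energy_start_mj) / 1000.0 / elapsed_seconds;
}
```

For example, 1,000,000 mJ consumed over 10 seconds averages 100 W.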

DCGM_FI_DEV_SLOWDOWN_TEMP

Slowdown temperature for the device.

DCGM_FI_DEV_SHUTDOWN_TEMP

Shutdown temperature for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT

Current Power limit for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN

Minimum power management limit for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX

Maximum power management limit for the device.

DCGM_FI_DEV_POWER_MGMT_LIMIT_DEF

Default power management limit for the device.

DCGM_FI_DEV_ENFORCED_POWER_LIMIT

Effective power limit that the driver enforces after taking into account all limiters.

DCGM_FI_DEV_PSTATE

Performance state (P-State), 0-15, where 0 is the highest.

DCGM_FI_DEV_FAN_SPEED

Fan speed for the device in percent 0-100.

DCGM_FI_DEV_PCIE_TX_THROUGHPUT

PCIe Tx utilization information.

Deprecated: Use DCGM_FI_PROF_PCIE_TX_BYTES instead.

DCGM_FI_DEV_PCIE_RX_THROUGHPUT

PCIe Rx utilization information.

Deprecated: Use DCGM_FI_PROF_PCIE_RX_BYTES instead.

DCGM_FI_DEV_PCIE_REPLAY_COUNTER

PCIe replay counter.

DCGM_FI_DEV_GPU_UTIL

GPU Utilization.

DCGM_FI_DEV_MEM_COPY_UTIL

Memory Utilization.

DCGM_FI_DEV_ACCOUNTING_DATA

Process accounting stats.

This field is only supported when the host engine is running as root unless you enable accounting ahead of time. Accounting mode can be enabled by running “nvidia-smi -am 1” as root on the same node the host engine is running on.

DCGM_FI_DEV_ENC_UTIL

Encoder Utilization.

DCGM_FI_DEV_DEC_UTIL

Decoder Utilization.

DCGM_FI_DEV_XID_ERRORS

XID errors.

The value is the specific XID error

PCIe Max Link Generation.

PCIe Max Link Width.

PCIe Current Link Generation.

PCIe Current Link Width.

DCGM_FI_DEV_POWER_VIOLATION

Power Violation time in usec.

DCGM_FI_DEV_THERMAL_VIOLATION

Thermal Violation time in usec.

DCGM_FI_DEV_SYNC_BOOST_VIOLATION

Sync Boost Violation time in usec.

DCGM_FI_DEV_BOARD_LIMIT_VIOLATION

Board violation limit.

DCGM_FI_DEV_LOW_UTIL_VIOLATION

Low utilization violation limit.

DCGM_FI_DEV_RELIABILITY_VIOLATION

Reliability violation limit.

DCGM_FI_DEV_TOTAL_APP_CLOCKS_VIOLATION

App clock violation limit.

DCGM_FI_DEV_TOTAL_BASE_CLOCKS_VIOLATION

Base clock violation limit.

DCGM_FI_DEV_FB_TOTAL

Total Frame Buffer of the GPU in MB.

DCGM_FI_DEV_FB_FREE

Free Frame Buffer in MB.

DCGM_FI_DEV_FB_USED

Used Frame Buffer in MB.

DCGM_FI_DEV_FB_RESERVED

Reserved Frame Buffer in MB.

DCGM_FI_DEV_FB_USED_PERCENT

Fraction of the frame buffer in use, computed as Used / (Total - Reserved).

The value ranges from 0.0 to 1.0 rather than 0 to 100.
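The documented formula can be reproduced from the raw frame buffer fields, for example to sanity-check a reported value. A minimal sketch in C; `fb_used_fraction` is a hypothetical helper, and treating Total <= Reserved as invalid input is an assumption of this sketch.

```c
/* Hypothetical helper (not part of the DCGM API) reproducing the formula
 * documented for DCGM_FI_DEV_FB_USED_PERCENT from the DCGM_FI_DEV_FB_USED,
 * DCGM_FI_DEV_FB_TOTAL and DCGM_FI_DEV_FB_RESERVED values (all in MB).
 * Returns a fraction in the range 0.0-1.0, or -1.0 if the inputs are invalid. */
double fb_used_fraction(unsigned long long used_mb,
                        unsigned long long total_mb,
                        unsigned long long reserved_mb)
{
    if (total_mb <= reserved_mb)
        return -1.0; /* no usable frame buffer */
    return (double)used_mb / (double)(total_mb - reserved_mb);
}
```

For a GPU reporting 40960 MB total, 960 MB reserved and 20000 MB used, the helper returns 0.5.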

DCGM_FI_DEV_ECC_CURRENT

Current ECC mode for the device.

DCGM_FI_DEV_ECC_PENDING

Pending ECC mode for the device.

DCGM_FI_DEV_ECC_SBE_VOL_TOTAL

Total single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_TOTAL

Total double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_AGG_TOTAL

Total single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_TOTAL

Total double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_VOL_L1

L1 cache single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_L1

L1 cache double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_L2

L2 cache single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_L2

L2 cache double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_DEV

Device memory single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_DEV

Device memory double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_REG

Register file single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_REG

Register file double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_VOL_TEX

Texture memory single bit volatile ECC errors.

DCGM_FI_DEV_ECC_DBE_VOL_TEX

Texture memory double bit volatile ECC errors.

DCGM_FI_DEV_ECC_SBE_AGG_L1

L1 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_L1

L1 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_L2

L2 cache single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_L2

L2 cache double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_DEV

Device memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_DEV

Device memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_REG

Register file single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_REG

Register file double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_SBE_AGG_TEX

Texture memory single bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_ECC_DBE_AGG_TEX

Texture memory double bit aggregate (persistent) ECC errors. Note: monotonically increasing.

DCGM_FI_DEV_RETIRED_SBE

Number of retired pages because of single bit errors. Note: monotonically increasing.

DCGM_FI_DEV_RETIRED_DBE

Number of retired pages because of double bit errors. Note: monotonically increasing.

DCGM_FI_DEV_RETIRED_PENDING

Number of pages pending retirement.

DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS

Number of remapped rows for uncorrectable errors.

DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS

Number of remapped rows for correctable errors.

DCGM_FI_DEV_ROW_REMAP_FAILURE

Whether remapping of rows has failed.

DCGM_FI_DEV_ROW_REMAP_PENDING

Whether remapping of rows is pending.

DCGM_FI_DEV_VIRTUAL_MODE

Virtualization Mode corresponding to the GPU.

One of DCGM_GPU_VIRTUALIZATION_MODE_* constants.

DCGM_FI_DEV_SUPPORTED_TYPE_INFO

Includes count and static info of vGPU types supported on a device.

DCGM_FI_DEV_CREATABLE_VGPU_TYPE_IDS

Includes count and currently creatable vGPU types on a device.

DCGM_FI_DEV_VGPU_INSTANCE_IDS

Includes count and currently active vGPU instances on a device.

DCGM_FI_DEV_VGPU_UTILIZATIONS

Utilization values for vGPUs running on the device.

DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION

Utilization values for processes running within vGPU VMs using the device.

DCGM_FI_DEV_ENC_STATS

Current encoder statistics for a given device.

DCGM_FI_DEV_FBC_STATS

Statistics of current active frame buffer capture sessions on a given device.

DCGM_FI_DEV_FBC_SESSIONS_INFO

Information about active frame buffer capture sessions on a target device.

DCGM_FI_DEV_SUPPORTED_VGPU_TYPE_IDS

Includes count and currently supported vGPU types on a device.

DCGM_FI_DEV_VGPU_TYPE_INFO

Includes Static info of vGPU types supported on a device.

DCGM_FI_DEV_VGPU_TYPE_NAME

Includes the name of a vGPU type supported on a device.

DCGM_FI_DEV_VGPU_TYPE_CLASS

Includes the class of a vGPU type supported on a device.

DCGM_FI_DEV_VGPU_TYPE_LICENSE

Includes the license info for a vGPU type supported on a device.

DCGM_FI_DEV_VGPU_VM_ID

VM ID of the vGPU instance.

DCGM_FI_DEV_VGPU_VM_NAME

VM name of the vGPU instance.

DCGM_FI_DEV_VGPU_TYPE

vGPU type of the vGPU instance

DCGM_FI_DEV_VGPU_UUID

UUID of the vGPU instance.

DCGM_FI_DEV_VGPU_DRIVER_VERSION

Driver version of the vGPU instance.

DCGM_FI_DEV_VGPU_MEMORY_USAGE

Memory usage of the vGPU instance.

DCGM_FI_DEV_VGPU_LICENSE_STATUS

License status of the vGPU.

DCGM_FI_DEV_VGPU_FRAME_RATE_LIMIT

Frame rate limit of the vGPU instance.

DCGM_FI_DEV_VGPU_ENC_STATS

Current encoder statistics of the vGPU instance.

DCGM_FI_DEV_VGPU_ENC_SESSIONS_INFO

Information about all active encoder sessions on the vGPU instance.

DCGM_FI_DEV_VGPU_FBC_STATS

Statistics of current active frame buffer capture sessions on the vGPU instance.

DCGM_FI_DEV_VGPU_FBC_SESSIONS_INFO

Information about active frame buffer capture sessions on the vGPU instance.

DCGM_FI_DEV_VGPU_INSTANCE_LICENSE_STATE

License state information of the vGPU instance.

DCGM_FI_DEV_VGPU_PCI_ID

PCI ID of the vGPU instance.

DCGM_FI_DEV_VGPU_VM_GPU_INSTANCE_ID

GPU Instance ID for the given vGPU Instance.

DCGM_FI_FIRST_VGPU_FIELD_ID

First field ID of the vGPU instance fields.

DCGM_FI_LAST_VGPU_FIELD_ID

Last field ID of the vGPU instance fields.

DCGM_FI_MAX_VGPU_FIELDS

Maximum number of vGPU field IDs, currently taken as the difference between DCGM_FI_LAST_VGPU_FIELD_ID and DCGM_FI_FIRST_VGPU_FIELD_ID, i.e. 50.

DCGM_FI_INTERNAL_FIELDS_0_START

Starting ID for all the internal fields.

DCGM_FI_INTERNAL_FIELDS_0_END

Last ID for all the internal fields.

NVSwitch entity field IDs start here.

NVSwitch latency bins for port 0

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P00

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P00

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P00

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P00

Max latency bin.

NVSwitch latency bins for port 1

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P01

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P01

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P01

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P01

Max latency bin.

NVSwitch latency bins for port 2

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P02

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P02

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P02

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P02

Max latency bin.

NVSwitch latency bins for port 3

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P03

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P03

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P03

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P03

Max latency bin.

NVSwitch latency bins for port 4

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P04

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P04

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P04

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P04

Max latency bin.

NVSwitch latency bins for port 5

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P05

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P05

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P05

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P05

Max latency bin.

NVSwitch latency bins for port 6

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P06

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P06

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P06

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P06

Max latency bin.

NVSwitch latency bins for port 7

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P07

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P07

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P07

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P07

Max latency bin.

NVSwitch latency bins for port 8

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P08

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P08

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P08

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P08

Max latency bin.

NVSwitch latency bins for port 9

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P09

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P09

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P09

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P09

Max latency bin.

NVSwitch latency bins for port 10

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P10

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P10

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P10

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P10

Max latency bin.

NVSwitch latency bins for port 11

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P11

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P11

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P11

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P11

Max latency bin.

NVSwitch latency bins for port 12

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P12

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P12

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P12

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P12

Max latency bin.

NVSwitch latency bins for port 13

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P13

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P13

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P13

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P13

Max latency bin.

NVSwitch latency bins for port 14

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P14

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P14

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P14

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P14

Max latency bin.

NVSwitch latency bins for port 15

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P15

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P15

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P15

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P15

Max latency bin.

NVSwitch latency bins for port 16

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P16

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P16

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P16

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P16

Max latency bin.

NVSwitch latency bins for port 17

DCGM_FI_DEV_NVSWITCH_LATENCY_LOW_P17

Low latency bin

DCGM_FI_DEV_NVSWITCH_LATENCY_MED_P17

Medium latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_HIGH_P17

High latency bin.

DCGM_FI_DEV_NVSWITCH_LATENCY_MAX_P17

Max latency bin

NVSwitch Tx and Rx Counter 0 for each port

By default, Counter 0 counts bytes.

NVSwitch Tx Throughput Counter for ports 0-17.

NVSwitch Rx Throughput Counter for ports 0-17.

NVSwitch fatal_errors for ports 0-17.

NVSwitch non_fatal_errors for ports 0-17.

NVSwitch replay_count_errors for ports 0-17.

NVSwitch recovery_count_errors for ports 0-17.

NVSwitch filt_err_count_errors for ports 0-17.

NVLink lane_crc_err_count_aggregate_errors for ports 0-17.

NVLink lane ecc_err_count_aggregate_errors for ports 0-17.

NVLink lane latency low lane0 counter.

NVLink lane latency low lane1 counter.

NVLink lane latency low lane2 counter.

NVLink lane latency low lane3 counter.

NVLink lane latency medium lane0 counter.

NVLink lane latency medium lane1 counter.

NVLink lane latency medium lane2 counter.

NVLink lane latency medium lane3 counter.

NVLink lane latency high lane0 counter.

NVLink lane latency high lane1 counter.

NVLink lane latency high lane2 counter.

NVLink lane latency high lane3 counter.

NVLink lane latency panic lane0 counter.

NVLink lane latency panic lane1 counter.

NVLink lane latency panic lane2 counter.

NVLink lane latency panic lane3 counter.

NVLink lane latency count lane0 counter.

NVLink lane latency count lane1 counter.

NVLink lane latency count lane2 counter.

NVLink lane latency count lane3 counter.

NVLink lane crc_err_count for lane 0 on ports 0-17.

NVLink lane crc_err_count for lane 1 on ports 0-17.

NVLink lane crc_err_count for lane 2 on ports 0-17.

NVLink lane crc_err_count for lane 3 on ports 0-17.

NVLink lane ecc_err_count for lane 0 on ports 0-17.

NVLink lane ecc_err_count for lane 1 on ports 0-17.

NVLink lane ecc_err_count for lane 2 on ports 0-17.

NVLink lane ecc_err_count for lane 3 on ports 0-17.

DCGM_FI_DEV_NVSWITCH_FATAL_ERRORS

NVSwitch fatal error information.

Note: the value field indicates the specific SXid reported.

DCGM_FI_DEV_NVSWITCH_NON_FATAL_ERRORS

NVSwitch non fatal error information.

Note: the value field indicates the specific SXid reported.

DCGM_FI_DEV_NVSWITCH_TEMPERATURE_CURRENT

NVSwitch current temperature.

DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SLOWDOWN

NVSwitch limit slowdown temperature.

DCGM_FI_DEV_NVSWITCH_TEMPERATURE_LIMIT_SHUTDOWN

NVSwitch limit shutdown temperature.

DCGM_FI_DEV_NVSWITCH_THROUGHPUT_TX

NVSwitch throughput Tx.

DCGM_FI_DEV_NVSWITCH_THROUGHPUT_RX

NVSwitch throughput Rx.

DCGM_FI_FIRST_NVSWITCH_FIELD_ID

First field ID of the NVSwitch fields.

DCGM_FI_LAST_NVSWITCH_FIELD_ID

Last field ID of the NVSwitch fields.

DCGM_FI_MAX_NVSWITCH_FIELDS

Maximum number of NVSwitch field IDs, currently taken as DCGM_FI_LAST_NVSWITCH_FIELD_ID - DCGM_FI_FIRST_NVSWITCH_FIELD_ID + 1, i.e. 200.

Profiling Fields. These all start with DCGM_FI_PROF_*.

DCGM_FI_PROF_GR_ENGINE_ACTIVE

Ratio of time the graphics engine is active. The graphics engine is active if a graphics/compute context is bound and the graphics pipe or compute pipe is busy.

DCGM_FI_PROF_SM_ACTIVE

The ratio of cycles an SM has at least 1 warp assigned (computed from the number of cycles and elapsed cycles)

DCGM_FI_PROF_SM_OCCUPANCY

The ratio of the number of warps resident on an SM, i.e. the number of resident warps as a ratio of the theoretical maximum number of warps per elapsed cycle.

DCGM_FI_PROF_PIPE_TENSOR_ACTIVE

The ratio of cycles any tensor pipe is active (off the peak sustained elapsed cycles).

DCGM_FI_PROF_DRAM_ACTIVE

The ratio of cycles the device memory interface is active sending or receiving data.

DCGM_FI_PROF_PIPE_FP64_ACTIVE

Ratio of cycles the fp64 pipe is active.

DCGM_FI_PROF_PIPE_FP32_ACTIVE

Ratio of cycles the fp32 pipe is active.

DCGM_FI_PROF_PIPE_FP16_ACTIVE

Ratio of cycles the fp16 pipe is active.

This does not include HMMA.

DCGM_FI_PROF_PCIE_TX_BYTES

The number of bytes of active PCIe tx (transmit) data including both header and payload.

Note that this is from the perspective of the GPU, so copying data from device to host (DtoH) would be reflected in this metric.

DCGM_FI_PROF_PCIE_RX_BYTES

The number of bytes of active PCIe rx (receive) data including both header and payload.

Note that this is from the perspective of the GPU, so copying data from host to device (HtoD) would be reflected in this metric.

The total number of bytes of active NVLink tx (transmit) data including both header and payload.

Per-link fields are available below.

The total number of bytes of active NVLink rx (receive) data including both header and payload.

Per-link fields are available below.

DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE

The ratio of cycles the tensor (IMMA) pipe is active (off the peak sustained elapsed cycles)

DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE

The ratio of cycles the tensor (HMMA) pipe is active (off the peak sustained elapsed cycles)

DCGM_FI_PROF_PIPE_TENSOR_DFMA_ACTIVE

The ratio of cycles the tensor (DFMA) pipe is active (off the peak sustained elapsed cycles)

DCGM_FI_PROF_PIPE_INT_ACTIVE

Ratio of cycles the integer pipe is active.

DCGM_FI_PROF_NVDEC0_ACTIVE

Ratio of cycles each of the NVDEC engines is active.

DCGM_FI_PROF_NVDEC1_ACTIVE
DCGM_FI_PROF_NVDEC2_ACTIVE
DCGM_FI_PROF_NVDEC3_ACTIVE
DCGM_FI_PROF_NVDEC4_ACTIVE
DCGM_FI_PROF_NVDEC5_ACTIVE
DCGM_FI_PROF_NVDEC6_ACTIVE
DCGM_FI_PROF_NVDEC7_ACTIVE
DCGM_FI_PROF_NVJPG0_ACTIVE

Ratio of cycles each of the NVJPG engines is active.

DCGM_FI_PROF_NVJPG1_ACTIVE
DCGM_FI_PROF_NVJPG2_ACTIVE
DCGM_FI_PROF_NVJPG3_ACTIVE
DCGM_FI_PROF_NVJPG4_ACTIVE
DCGM_FI_PROF_NVJPG5_ACTIVE
DCGM_FI_PROF_NVJPG6_ACTIVE
DCGM_FI_PROF_NVJPG7_ACTIVE
DCGM_FI_PROF_NVOFA0_ACTIVE

Ratio of cycles each of the NVOFA engines is active.

The per-link number of bytes of active NVLink TX (transmit) or RX (receive) data, including both header and payload.

For example, DCGM_FI_PROF_NVLINK_L0_TX_BYTES is the TX value for link L0. To get the bandwidth for a link, add its RX and TX values together: total = DCGM_FI_PROF_NVLINK_L0_TX_BYTES + DCGM_FI_PROF_NVLINK_L0_RX_BYTES.
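The per-link sum described above extends naturally to all links. A minimal sketch, assuming the caller has already read the TX and RX values for each link into two arrays; only the L0 field names appear in the text, so the L<n> naming for other links and the helper names `nvlink_link_total` and `nvlink_all_links_total` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not part of the DCGM API). tx[n] and rx[n] are
 * assumed to hold DCGM_FI_PROF_NVLINK_L<n>_TX_BYTES and
 * DCGM_FI_PROF_NVLINK_L<n>_RX_BYTES for link n. */
uint64_t nvlink_link_total(uint64_t tx_bytes, uint64_t rx_bytes)
{
    /* Per-link total = TX + RX. */
    return tx_bytes + rx_bytes;
}

uint64_t nvlink_all_links_total(const uint64_t *tx, const uint64_t *rx,
                                size_t num_links)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < num_links; i++)
        sum += nvlink_link_total(tx[i], rx[i]);
    return sum;
}
```

The result has the same unit as the inputs, so it can be compared directly with the aggregate NVLink TX and RX totals.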

First NVLink throughput field.

Last NVLink throughput field.

DCGM_FI_MAX_FIELDS

One greater than the maximum field ID that could be allocated.

Functions

dcgm_field_meta_p DcgmFieldGetById(unsigned short fieldId)

Get a pointer to the metadata for a field by its field ID.

See DCGM_FI_* for a list of field IDs.

Parameters

fieldId – IN: One of the field IDs (DCGM_FI_*)

Returns

  • 0 on failure

  • Pointer (>0) to the field metadata structure if found

dcgm_field_meta_p DcgmFieldGetByTag(const char *tag)

Get a pointer to the metadata for a field by its field tag.

Parameters

tag – IN: Tag for the field of interest

Returns

  • 0 on failure or if not found

  • Pointer (>0) to the field metadata structure if found

int DcgmFieldsInit(void)

Initialize the DcgmFields module.

Call this once from inside your program.

Returns

  • 0 on success

  • <0 on error

int DcgmFieldsTerm(void)

Terminates the DcgmFields module.

Call this once from inside your program.

Returns

  • 0 on success

  • <0 on error

const char *DcgmFieldsGetEntityGroupString(dcgm_field_entity_group_t entityGroupId)

Get the string version of an entityGroupId.

Returns

  • Pointer to a string such as GPU or NvSwitch

  • NULL on error