The performance modules in NVIDIA® BlueField® are present in several hardware blocks and each block has a certain set of supported events. 

The mlx_pmc driver provides access to all of these performance modules through a sysfs interface. The driver creates a directory under /sys/class/hwmon, in which each of the blocks listed below has a subdirectory. Note that all directories under /sys/class/hwmon are named "hwmon<N>", where N is the hwmon device number. Linux assigns this number, and it can change as more devices are added to the hwmon class. Each hwmon directory has a "name" node that identifies the device. For this driver, reading the "name" file should return "bfperf".
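Because the hwmon number is not stable, a script can locate the directory by its "name" node instead of hard-coding it. The following is a minimal sketch of that lookup; the function name find_bfperf_dir and its optional root argument are illustrative and not part of the driver.

```shell
# Print the hwmon directory whose "name" file reads "bfperf".
# $1 is the hwmon class root; it defaults to /sys/class/hwmon.
# (find_bfperf_dir is a hypothetical helper, not provided by the driver.)
find_bfperf_dir() (
    root=${1:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "bfperf" ]; then
            printf '%s\n' "$dir"
            exit 0
        fi
    done
    echo "no hwmon device named bfperf under $root" >&2
    exit 1
)
```

A script could then capture the result once, for example with BFPERF_DIR=$(find_bfperf_dir), and reuse it for all later accesses.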

The hardware blocks that include performance modules are:

  • Tile (block containing 2 cores and a shared L2 cache) has 2 sets of counters, one each for HNF and HNF_NET events. These are present as "tile" and "tilenet" directories in the sysfs interface of the driver.
  • TRIO (PCIe root complex) has 3 sets of counters, one each for TRIO, SMGEN and PCIE TLR events. The sysfs directories for these are called "trio", "triogen" and "pcie" respectively.
  • MSS (memory sub-system containing the memory controller and L3 cache)
  • GIC and SMMU with one set of counters each for the SMGEN events. These are simply labelled "gic" and "smmu" respectively.

The number of Tile, TRIO and MSS blocks depends on the system. BlueField has a maximum of 8 Tile, 3 TRIO and 2 MSS blocks, and the block index is added as a suffix to the sysfs directory names. For example, this is a list of directories present in a BlueField-2 system:

ubuntu@dpu:/$ ls /sys/class/hwmon/hwmon0/
device l3cachehalf0 pcie0 smmu0 tile1 tilenet0 tilenet3 triogen0
ecc l3cachehalf1 pcie1 subsystem tile2 tilenet1 trio0 triogen1
gic0 name power  tile0 tile3 tilenet2 trio1 uevent

The PCIe TLR statistics for each TRIO are under the "pcie" block.
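Since the number of instances of each block varies between systems, a script may prefer to enumerate them rather than assume a fixed count. This is a small sketch under the naming scheme shown above (block type followed by a numeric suffix); list_blocks is a hypothetical helper name.

```shell
# Print the instances of one block type, e.g. "tile" -> tile0 tile1 ...
# $1: driver directory, $2: block type prefix (tile, tilenet, trio, ...).
# The numeric-suffix glob keeps "tile" from also matching "tilenet0".
list_blocks() (
    dir=$1 prefix=$2
    for path in "$dir/$prefix"[0-9]*; do
        [ -d "$path" ] && basename "$path"
    done
    exit 0
)
```

For example, list_blocks <BFPERF_DIR> tile would print one line per Tile block present on the system.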

Performance Data Collection Mechanisms

The performance data of the BlueField hardware is collected using two mechanisms:

  1. Programming hardware counters to monitor specific events
  2. Reading registers that hold performance/event statistics

All blocks except "ecc" and "pcie" use mechanism 1.

Using Hardware Counters

For blocks that use hardware counters to collect data, each counter present in the block is represented by "event<N>" and "counter<N>" sysfs files.

For example: 

ubuntu@dpu:/$ ls /sys/class/hwmon/hwmon0/tile0/
counter0 counter1 counter2 counter3 event0 event1 event2 event3 event_list

An event<N> and counter<N> pair can be used to program and monitor events. The "event_list" sysfs file displays the list of events supported by that block along with the hexadecimal value corresponding to each event.
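To translate an event name into its hexadecimal value from a script, the "event_list" file can be searched. The sketch below assumes each line of event_list has the form "0x45: HNF_REQUESTS"; that layout is an assumption and should be checked against the actual file on the system. The helper name event_number is illustrative.

```shell
# Print the hex value listed for an event name in a block's event_list.
# Assumes each line looks like "0x45: HNF_REQUESTS" (an assumption; check
# the actual file contents on your system).
# $1: block directory, $2: event name. Fails if the name is not listed.
event_number() (
    block_dir=$1 name=$2
    awk -v want="$name" '
        $2 == want { sub(/:$/, "", $1); print $1; found = 1; exit }
        END { exit !found }
    ' "$block_dir/event_list"
)
```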

Use the echo command to write the event number to the event<N> file, and use the cat command to read the counter value from the corresponding counter (counter<N>).

The counters are enabled individually once the event number is written to the corresponding event file. However, the L3 cache performance counters cannot be enabled or disabled individually and can only be triggered or stopped all at the same time.

For the L3 cache blocks, therefore, all of the event files may first be programmed with the required event numbers, and the "enable" file is then used to start the counters. Writing 1 to the enable file starts the counters, and writing 0 stops them.

Reading Registers

For the "ecc" and "pcie" blocks, the counters cannot be started or stopped by the user. Instead, the statistics are collected automatically by hardware and stored in registers. The register names are exposed as files within the block directory and can be read by the user at any time.
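Because each statistic is an ordinary readable file, all of them can be dumped in one pass. A minimal sketch, assuming only that the block directory contains one file per register; dump_stats is a hypothetical helper name.

```shell
# Print "name value" for every readable regular file in a statistics block.
# $1: block directory, e.g. <BFPERF_DIR>/pcie0 or <BFPERF_DIR>/ecc.
dump_stats() (
    for f in "$1"/*; do
        # Skip subdirectories and anything that cannot be read.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
)
```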

List of Supported Events

SMGEN Performance Module

Hex Value  Name         Description
0x0        AW_REQ       Reserved for internal use
0x1        AW_BEATS     Reserved for internal use
0x2        AW_TRANS     Reserved for internal use
0x3        AW_RESP      Reserved for internal use
0x4        AW_STL       Reserved for internal use
0x5        AW_LAT       Reserved for internal use
0x6        AW_REQ_TBU   Reserved for internal use
0x8        AR_REQ       Reserved for internal use
0x9        AR_BEATS     Reserved for internal use
0xa        AR_TRANS     Reserved for internal use
0xb        AR_STL       Reserved for internal use
0xc        AR_LAT       Reserved for internal use
0xd        AR_REQ_TBU   Reserved for internal use
0xe        TBU_MISS     The number of TBU misses
0xf        TX_DAT_AF    Mesh data channel write FIFO almost full. This is from the TRIO toward the Arm memory.
0x10       RX_DAT_AF    Mesh data channel read FIFO almost full. This is from the Arm memory toward the TRIO.
0x11       RETRYQ_CRED  Reserved for internal use

Tile HNF Performance Module

Hex Value  Name                 Description
0x45       HNF_REQUESTS         Number of REQs that were processed in HNF
0x46       HNF_REJECTS          Reserved for internal use
0x47       ALL_BUSY             Reserved for internal use
0x48       MAF_BUSY             Reserved for internal use
0x49       MAF_REQUESTS         Reserved for internal use
0x4a       RNF_REQUESTS         Number of REQs sent by the RN-F selected by HNF_PERF_CTL register RNF_SEL field
0x4b       REQUEST_TYPE         Reserved for internal use
0x4c       MEMORY_READS         Number of reads to MSS
0x4d       MEMORY_WRITES        Number of writes to MSS
0x4e       VICTIM_WRITE         Number of victim lines written to memory
0x4f       POC_FULL             Reserved for internal use
0x50       POC_FAIL             Number of times that the POC Monitor sent RespErr Okay status to an Exclusive WriteNoSnp or CleanUnique REQ
0x51       POC_SUCCESS          Number of times that the POC Monitor sent RespErr ExOkay status to an Exclusive WriteNoSnp or CleanUnique REQ
0x52       POC_WRITES           Number of Exclusive WriteNoSnp or CleanUnique REQs processed by POC Monitor
0x53       POC_READS            Number of Exclusive ReadClean/ReadShared REQs processed by POC Monitor
0x54       FORWARD              Reserved for internal use
0x55       RXREQ_HNF            Reserved for internal use
0x56       RXRSP_HNF            Reserved for internal use
0x57       RXDAT_HNF            Reserved for internal use
0x58       TXREQ_HNF            Reserved for internal use
0x59       TXRSP_HNF            Reserved for internal use
0x5a       TXDAT_HNF            Reserved for internal use
0x5b       TXSNP_HNF            Reserved for internal use
0x5c       INDEX_MATCH          Reserved for internal use
0x5d       A72_ACCESS           Access requests (Reads, Writes, CopyBack, CMO, DVM) from A72 clusters
0x5e       IO_ACCESS            Access requests (Reads, Writes) from DMA IO devices
0x5f       TSO_WRITE            Total Store Order write Requests from DMA IO devices
0x60       TSO_CONFLICT         Reserved for internal use
0x61       DIR_HIT              Requests that hit in directory
0x62       HNF_ACCEPTS          Reserved for internal use
0x63       REQ_BUF_EMPTY        Number of cycles when request buffer is empty
0x64       REQ_BUF_IDLE_MAF     Reserved for internal use
0x65       TSO_NOARB            Reserved for internal use
0x66       TSO_NOARB_CYCLES     Reserved for internal use
0x67       MSS_NO_CREDIT        Number of cycles that a Request could not be sent to MSS due to lack of credits
0x68       TXDAT_NO_LCRD        Reserved for internal use
0x69       TXSNP_NO_LCRD        Reserved for internal use
0x6a       TXRSP_NO_LCRD        Reserved for internal use
0x6b       TXREQ_NO_LCRD        Reserved for internal use
0x6c       TSO_CL_MATCH         Reserved for internal use
0x6d       MEMORY_READS_BYPASS  Number of reads to MSS that bypass Home Node
0x6e       TSO_NOARB_TIMEOUT    Reserved for internal use
0x6f       ALLOCATE             Number of times that Directory entry was allocated
0x70       VICTIM               Number of times that Directory entry allocation did not find an Invalid way in the set
0x71       A72_WRITE            Write requests from A72 clusters
0x72       A72_Read             Read requests from A72 clusters
0x73       IO_WRITE             Write requests from DMA IO devices
0x74       IO_Reads             Read requests from DMA IO devices
0x75       TSO_Reject           Reserved for internal use
0x80       TXREQ_RN             Reserved for internal use
0x81       TXRSP_RN             Reserved for internal use
0x82       TXDAT_RN             Reserved for internal use
0x83       RXSNP_RN             Reserved for internal use
0x84       RXRSP_RN             Reserved for internal use
0x85       RXDAT_RN             Reserved for internal use

TRIO Performance Module

Hex Value  Name                     Description
0xa0       TPIO_DATA_BEAT           Data beats from Arm PIO to TRIO
0xa1       TDMA_DATA_BEAT           Data beats from Arm memory to PCI completion
0xa2       MAP_DATA_BEAT            Reserved for internal use
0xa3       TXMSG_DATA_BEAT          Reserved for internal use
0xa4       TPIO_DATA_PACKET         Data packets from Arm PIO to TRIO
0xa5       TDMA_DATA_PACKET         Data packets from Arm memory to PCI completion
0xa6       MAP_DATA_PACKET          Reserved for internal use
0xa7       TXMSG_DATA_PACKET        Reserved for internal use
0xa8       TDMA_RT_AF               The in-flight PCI DMA READ request queue is almost full
0xa9       TDMA_PBUF_MAC_AF         Indicates that the buffer of Arm memory reads awaiting PCIe access is too full
0xaa       TRIO_MAP_WRQ_BUF_EMPTY   PCIe write transaction buffer is empty
0xab       TRIO_MAP_CPL_BUF_EMPTY   Arm PIO request completion queue is empty
0xac       TRIO_MAP_RDQ0_BUF_EMPTY  The buffer of MAC0's read transaction is empty
0xad       TRIO_MAP_RDQ1_BUF_EMPTY  The buffer of MAC1's read transaction is empty
0xae       TRIO_MAP_RDQ2_BUF_EMPTY  The buffer of MAC2's read transaction is empty
0xaf       TRIO_MAP_RDQ3_BUF_EMPTY  The buffer of MAC3's read transaction is empty
0xb0       TRIO_MAP_RDQ4_BUF_EMPTY  The buffer of MAC4's read transaction is empty
0xb1       TRIO_MAP_RDQ5_BUF_EMPTY  The buffer of MAC5's read transaction is empty
0xb2       TRIO_MAP_RDQ6_BUF_EMPTY  The buffer of MAC6's read transaction is empty
0xb3       TRIO_MAP_RDQ7_BUF_EMPTY  The buffer of MAC7's read transaction is empty

L3 Cache Performance Module

The L3 cache interfaces with the Arm cores via the SkyMesh. The CDN is used for control data. The NDN is used for responses. The DDN is for the actual data transfer.

Hex Value  Name                         Description
0x00       DISABLE                      Reserved for internal use
0x01       CYCLES                       Timestamp counter
0x02       TOTAL_RD_REQ_IN              Read Transaction control request from the CDN of the SkyMesh
0x03       TOTAL_WR_REQ_IN              Write transaction control request from the CDN of the SkyMesh
0x04       TOTAL_WR_DBID_ACK            Write transaction control responses from the NDN of the SkyMesh
0x05       TOTAL_WR_DATA_IN             Write transaction data from the DDN of the SkyMesh
0x06       TOTAL_WR_COMP                Write completion response from the NDN of the SkyMesh
0x07       TOTAL_RD_DATA_OUT            Read transaction data from the DDN
0x08       TOTAL_CDN_REQ_IN_BANK0       CHI CDN Transactions Bank 0
0x09       TOTAL_CDN_REQ_IN_BANK1       CHI CDN Transactions Bank 1
0x0a       TOTAL_DDN_REQ_IN_BANK0       CHI DDN Transactions Bank 0
0x0b       TOTAL_DDN_REQ_IN_BANK1       CHI DDN Transactions Bank 1
0x0c       TOTAL_EMEM_RD_RES_IN_BANK0   Total EMEM Read Response Bank 0
0x0d       TOTAL_EMEM_RD_RES_IN_BANK1   Total EMEM Read Response Bank 1
0x0e       TOTAL_CACHE_RD_RES_IN_BANK0  Total Cache Read Response Bank 0
0x0f       TOTAL_CACHE_RD_RES_IN_BANK1  Total Cache Read Response Bank 1
0x10       TOTAL_EMEM_RD_REQ_BANK0      Total EMEM Read Request Bank 0
0x11       TOTAL_EMEM_RD_REQ_BANK1      Total EMEM Read Request Bank 1
0x12       TOTAL_EMEM_WR_REQ_BANK0      Total EMEM Write Request Bank 0
0x13       TOTAL_EMEM_WR_REQ_BANK1      Total EMEM Write Request Bank 1
0x14       TOTAL_RD_REQ_OUT             EMEM Read Transactions Out
0x15       TOTAL_WR_REQ_OUT             EMEM Write Transactions Out
0x16       TOTAL_RD_RES_IN              EMEM Read Transactions In
0x17       HITS_BANK0                   Number of Hits Bank 0
0x18       HITS_BANK1                   Number of Hits Bank 1
0x19       MISSES_BANK0                 Number of Misses Bank 0
0x1a       MISSES_BANK1                 Number of Misses Bank 1
0x1b       ALLOCATIONS_BANK0            Number of Allocations Bank 0
0x1c       ALLOCATIONS_BANK1            Number of Allocations Bank 1
0x1d       EVICTIONS_BANK0              Number of Evictions Bank 0
0x1e       EVICTIONS_BANK1              Number of Evictions Bank 1
0x1f       DBID_REJECT                  Reserved for internal use
0x20       WRDB_REJECT_BANK0            Reserved for internal use
0x21       WRDB_REJECT_BANK1            Reserved for internal use
0x22       CMDQ_REJECT_BANK0            Reserved for internal use
0x23       CMDQ_REJECT_BANK1            Reserved for internal use
0x24       COB_REJECT_BANK0             Reserved for internal use
0x25       COB_REJECT_BANK1             Reserved for internal use
0x26       TRB_REJECT_BANK0             Reserved for internal use
0x27       TRB_REJECT_BANK1             Reserved for internal use
0x28       TAG_REJECT_BANK0             Reserved for internal use
0x29       TAG_REJECT_BANK1             Reserved for internal use
0x2a       ANY_REJECT_BANK0             Reserved for internal use
0x2b       ANY_REJECT_BANK1             Reserved for internal use

PCIe TLR Statistics

Hex Value  Name                      Description
0x0        PCIE_TLR_IN_P_PKT_CNT     Incoming posted packets
0x10       PCIE_TLR_IN_NP_PKT_CNT    Incoming non-posted packets
0x18       PCIE_TLR_IN_C_PKT_CNT     Incoming completion packets
0x20       PCIE_TLR_OUT_P_PKT_CNT    Outgoing posted packets
0x28       PCIE_TLR_OUT_NP_PKT_CNT   Outgoing non-posted packets
0x30       PCIE_TLR_OUT_C_PKT_CNT    Outgoing completion packets
0x38       PCIE_TLR_IN_P_BYTE_CNT    Incoming posted bytes
0x40       PCIE_TLR_IN_NP_BYTE_CNT   Incoming non-posted bytes
0x48       PCIE_TLR_IN_C_BYTE_CNT    Incoming completion bytes
0x50       PCIE_TLR_OUT_P_BYTE_CNT   Outgoing posted bytes
0x58       PCIE_TLR_OUT_NP_BYTE_CNT  Outgoing non-posted bytes
0x60       PCIE_TLR_OUT_C_BYTE_CNT   Outgoing completion bytes

Tile HNFNET Performance Module

Hex Value  Name                    Description
0x12       CDN_REQ                 The number of CDN requests
0x13       DDN_REQ                 The number of DDN requests
0x14       NDN_REQ                 The number of NDN requests
0x15       CDN_DIAG_N_OUT_OF_CRED  Number of cycles that north input port FIFO runs out of credits in the CDN network
0x16       CDN_DIAG_S_OUT_OF_CRED  Number of cycles that south input port FIFO runs out of credits in the CDN network
0x17       CDN_DIAG_E_OUT_OF_CRED  Number of cycles that east input port FIFO runs out of credits in the CDN network
0x18       CDN_DIAG_W_OUT_OF_CRED  Number of cycles that west input port FIFO runs out of credits in the CDN network
0x19       CDN_DIAG_C_OUT_OF_CRED  Number of cycles that core input port FIFO runs out of credits in the CDN network
0x1a       CDN_DIAG_N_EGRESS       Packets sent out from north port in the CDN network
0x1b       CDN_DIAG_S_EGRESS       Packets sent out from south port in the CDN network
0x1c       CDN_DIAG_E_EGRESS       Packets sent out from east port in the CDN network
0x1d       CDN_DIAG_W_EGRESS       Packets sent out from west port in the CDN network
0x1e       CDN_DIAG_C_EGRESS       Packets sent out from core port in the CDN network
0x1f       CDN_DIAG_N_INGRESS      Packets received by north port in the CDN network
0x20       CDN_DIAG_S_INGRESS      Packets received by south port in the CDN network
0x21       CDN_DIAG_E_INGRESS      Packets received by east port in the CDN network
0x22       CDN_DIAG_W_INGRESS      Packets received by west port in the CDN network
0x23       CDN_DIAG_C_INGRESS      Packets received by core port in the CDN network
0x24       CDN_DIAG_CORE_SENT      Packets completed from core port in the CDN network
0x25       DDN_DIAG_N_OUT_OF_CRED  Number of cycles that north input port FIFO runs out of credits in the DDN network
0x26       DDN_DIAG_S_OUT_OF_CRED  Number of cycles that south input port FIFO runs out of credits in the DDN network
0x27       DDN_DIAG_E_OUT_OF_CRED  Number of cycles that east input port FIFO runs out of credits in the DDN network
0x28       DDN_DIAG_W_OUT_OF_CRED  Number of cycles that west input port FIFO runs out of credits in the DDN network
0x29       DDN_DIAG_C_OUT_OF_CRED  Number of cycles that core input port FIFO runs out of credits in the DDN network
0x2a       DDN_DIAG_N_EGRESS       Packets sent out from north port in the DDN network
0x2b       DDN_DIAG_S_EGRESS       Packets sent out from south port in the DDN network
0x2c       DDN_DIAG_E_EGRESS       Packets sent out from east port in the DDN network
0x2d       DDN_DIAG_W_EGRESS       Packets sent out from west port in the DDN network
0x2e       DDN_DIAG_C_EGRESS       Packets sent out from core port in the DDN network
0x2f       DDN_DIAG_N_INGRESS      Packets received by north port in the DDN network
0x30       DDN_DIAG_S_INGRESS      Packets received by south port in the DDN network
0x31       DDN_DIAG_E_INGRESS      Packets received by east port in the DDN network
0x32       DDN_DIAG_W_INGRESS      Packets received by west port in the DDN network
0x33       DDN_DIAG_C_INGRESS      Packets received by core port in the DDN network
0x34       DDN_DIAG_CORE_SENT      Packets completed from core port in the DDN network
0x35       NDN_DIAG_N_OUT_OF_CRED  Number of cycles that north input port FIFO runs out of credits in the NDN network
0x36       NDN_DIAG_S_OUT_OF_CRED  Number of cycles that south input port FIFO runs out of credits in the NDN network
0x37       NDN_DIAG_E_OUT_OF_CRED  Number of cycles that east input port FIFO runs out of credits in the NDN network
0x38       NDN_DIAG_W_OUT_OF_CRED  Number of cycles that west input port FIFO runs out of credits in the NDN network
0x39       NDN_DIAG_C_OUT_OF_CRED  Number of cycles that core input port FIFO runs out of credits in the NDN network
0x3a       NDN_DIAG_N_EGRESS       Packets sent out from north port in the NDN network
0x3b       NDN_DIAG_S_EGRESS       Packets sent out from south port in the NDN network
0x3c       NDN_DIAG_E_EGRESS       Packets sent out from east port in the NDN network
0x3d       NDN_DIAG_W_EGRESS       Packets sent out from west port in the NDN network
0x3e       NDN_DIAG_C_EGRESS       Packets sent out from core port in the NDN network
0x3f       NDN_DIAG_N_INGRESS      Packets received by north port in the NDN network
0x40       NDN_DIAG_S_INGRESS      Packets received by south port in the NDN network
0x41       NDN_DIAG_E_INGRESS      Packets received by east port in the NDN network
0x42       NDN_DIAG_W_INGRESS      Packets received by west port in the NDN network
0x43       NDN_DIAG_C_INGRESS      Packets received by core port in the NDN network
0x44       NDN_DIAG_CORE_SENT      Packets completed from core port in the NDN network

Programming Counter to Monitor Events

To program a counter to monitor one of the events from the event list, the event name or number needs to be written to the corresponding event file.

In the commands below, BFPERF_DIR refers to the /sys/class/hwmon/hwmon<N> directory that corresponds to this driver.

For example, to monitor the event HNF_REQUESTS (0x45) on tile2 using counter 3:

$ echo 0x45 > <BFPERF_DIR>/tile2/event3

Or:

$ echo HNF_REQUESTS > <BFPERF_DIR>/tile2/event3

Once this is done, counter3 is reset and starts counting HNF_REQUESTS events.

To read the counter value, run:

$ cat <BFPERF_DIR>/tile2/counter3

To see what event is currently being monitored by a counter, just read the corresponding event file to get the event name and number.

$ cat <BFPERF_DIR>/tile2/event3

In this case, reading the event3 file returns "0x45: HNF_REQUESTS".

To clear the counter, write 0 to the counter file.

$ echo 0 > <BFPERF_DIR>/tile2/counter3

This resets the accumulator and the counter continues monitoring the same event that has previously been programmed, but starts the count from 0 again. Writing non-zero values to the counter files is not allowed.

To stop monitoring an event, write 0xff to the corresponding event file.
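The program, clear, read, and stop steps described above can be combined into one measurement over a fixed interval. The sketch below is one possible arrangement; count_event and its argument order are hypothetical, and the explicit write of 0 to the counter is redundant with the reset that programming the event performs, but is kept to make the starting point explicit.

```shell
# Count occurrences of one event over an interval on a given counter.
# $1: block directory (e.g. <BFPERF_DIR>/tile2), $2: counter index,
# $3: event name or hex value, $4: interval in seconds.
# Prints the counter value read at the end of the interval.
count_event() (
    block_dir=$1 idx=$2 event=$3 seconds=$4
    echo "$event" > "$block_dir/event$idx" || exit 1   # program the counter
    echo 0 > "$block_dir/counter$idx" || exit 1        # start the count from 0
    sleep "$seconds"
    cat "$block_dir/counter$idx" || exit 1             # read the result
    echo 0xff > "$block_dir/event$idx"                 # stop monitoring
)
```

For example, count_event <BFPERF_DIR>/tile2 3 HNF_REQUESTS 5 would report the HNF_REQUESTS count seen by counter3 of tile2 over roughly five seconds.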

The l3cache blocks behave slightly differently because all of their counters can only be enabled, disabled, or reset together. After the events are written to the event files, the counters must be enabled by writing "1" to the "enable" file before they start monitoring their respective events. Writing "0" to this file stops all the counters. The most reliable way to obtain accurate counter values is to disable the counters after the desired time period and then read them.

Programming a counter to monitor a new event automatically stops all the counters. Also, enabling the counters resets the counters to 0 first.
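For the l3cache blocks, the sequence of programming every event first, enabling, waiting, disabling, and only then reading can be sketched as follows. The helper name l3_sample and its calling convention are illustrative; it assumes the events are assigned to counters in order starting from counter0.

```shell
# Sample L3 cache counters over an interval.
# $1: l3cache block directory (e.g. <BFPERF_DIR>/l3cachehalf0),
# $2: interval in seconds,
# remaining args: one event (name or hex value) per counter, from counter0 up.
# Prints "event value" per line after the counters have been stopped.
l3_sample() (
    block_dir=$1 seconds=$2
    shift 2
    echo 0 > "$block_dir/enable" || exit 1    # make sure the counters are stopped
    idx=0
    for event in "$@"; do
        echo "$event" > "$block_dir/event$idx" || exit 1
        idx=$((idx + 1))
    done
    echo 1 > "$block_dir/enable" || exit 1    # resets the counters, then starts them
    sleep "$seconds"
    echo 0 > "$block_dir/enable" || exit 1    # stop before reading for stable values
    idx=0
    for event in "$@"; do
        printf '%s %s\n' "$event" "$(cat "$block_dir/counter$idx")"
        idx=$((idx + 1))
    done
)
```

Programming all events before the single write of 1 matters here, since programming a new event stops every counter in the block.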

For blocks that have performance statistics registers (mechanism 2), all of these statistics are directly made available to be read or reset.

For example, to read the number of incoming posted packets to TRIO2:

$ cat <BFPERF_DIR>/pcie2/IN_P_PKT_CNT

The count can be reset to 0 by writing 0 to the same file. Again, non-zero writes to these files are not allowed.
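When resetting a statistic is undesirable, for instance because another tool is reading the same file, the growth over an interval can be computed from two reads instead. This is a sketch that assumes the file holds a single integer, either decimal or 0x-prefixed hexadecimal, that fits in the shell's arithmetic range; counter_diff and stat_delta are hypothetical helper names.

```shell
# Difference between two readings ($2 - $1).
# Accepts decimal or 0x-prefixed hexadecimal values.
counter_diff() {
    echo $(( $2 - $1 ))
}

# Print how much a statistics register grew over an interval.
# $1: path to the register file, $2: interval in seconds.
stat_delta() (
    file=$1 seconds=$2
    before=$(cat "$file") || exit 1
    sleep "$seconds"
    after=$(cat "$file") || exit 1
    counter_diff "$before" "$after"
)
```

For example, stat_delta <BFPERF_DIR>/pcie2/IN_P_PKT_CNT 1 would print the number of incoming posted packets observed over about one second.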