Performance Monitoring
Jetson devices provide hardware support for performance monitoring. This topic describes the performance monitoring features that are available.
Uncore: Performance Monitoring Unit
Several functional units are outside the cores. These units are collectively referred to as the uncore. Some of them provide their own performance events and event counters, which are not counted by the core's Performance Monitor Unit (PMU).
The NVIDIA Uncore Perfmon Extension to the ARM Performance Monitor Extension (also known as “Uncore Perfmon”) allows ARM software to access the uncore performance counters. The Uncore Perfmon Extension is designed to resemble the standard ARM Performance Monitor Extension as much as possible.
You can download ARM PMU documentation from the Linux Kernel Archives.
The `perf` tool comes with the Linux kernel and can be used to monitor uncore PMU events.
Jetson AGX Orin has one uncore PMU, which is the SCF PMU.
List of all SCF uncore PMU events:
./perf list pmu | grep scf
scf_pmu/bus_access/ [Kernel PMU event]
scf_pmu/bus_access_normal/ [Kernel PMU event]
scf_pmu/bus_access_not_shared/ [Kernel PMU event]
scf_pmu/bus_access_periph/ [Kernel PMU event]
scf_pmu/bus_access_rd/ [Kernel PMU event]
scf_pmu/bus_access_shared/ [Kernel PMU event]
scf_pmu/bus_access_wr/ [Kernel PMU event]
scf_pmu/bus_cycles/ [Kernel PMU event]
scf_pmu/scf_cache/ [Kernel PMU event]
scf_pmu/scf_cache_allocate/ [Kernel PMU event]
scf_pmu/scf_cache_refill/ [Kernel PMU event]
scf_pmu/scf_cache_wb/ [Kernel PMU event]
Example of using perf to measure SCF events:
./perf stat -a -e scf_pmu/bus_access/,scf_pmu/bus_cycles/,scf_pmu/bus_access_wr/,scf_pmu/bus_access_rd/ memtest
Performance counter stats for 'system wide':
1000923 scf_pmu/bus_access/
1417764472 scf_pmu/bus_cycles/
654228 scf_pmu/bus_access_wr/
339139 scf_pmu/bus_access_rd/
3.083200000 seconds time elapsed
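Raw counts are often easier to compare across runs after they are normalized. The following is a minimal sketch, not part of the Jetson tooling, that reads plain perf stat text like the sample above and prints SCF bus accesses per 1000 bus cycles. The function name `scf_access_per_kcycle` is hypothetical, and the sketch assumes the count is the first column and the event name is the second, as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: print SCF bus accesses per 1000 bus cycles.
# Reads plain `perf stat` text on stdin (count in column 1, event in column 2).
scf_access_per_kcycle() {
    awk '
        # Some locales print thousands separators; strip them from the count.
        { gsub(/,/, "", $1) }
        $2 == "scf_pmu/bus_access/" { access = $1 + 0 }
        $2 == "scf_pmu/bus_cycles/" { cycles = $1 + 0 }
        END {
            # Fail if no bus_cycles count was seen, to avoid dividing by zero.
            if (cycles > 0) printf "%.3f\n", access * 1000 / cycles
            else exit 1
        }
    '
}
```

Because perf stat writes its summary to standard error, one way to use the helper is `./perf stat -a -e scf_pmu/bus_access/,scf_pmu/bus_cycles/ memtest 2>&1 | scf_access_per_kcycle`. For the counts shown above, this works out to roughly 0.706 accesses per 1000 cycles.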
Jetson Thor has the following uncore PMUs:
Unified Coherency Fabric (UCF) PMU: Monitors system-level cache events and DRAM traffic that flows through UCF.
Vision PMU: Monitors memory traffic from the multimedia IPs (VI, ISP, VIC, and PVA) in the SOC.
Display PMU: Monitors memory traffic from the display IP in the SOC.
High-speed I/O PMU: Monitors memory traffic from the high-speed I/O devices (PCIE, XUSB, MGBE, EQOS, and UFS) in the SOC.
UCF-GPU PMU: Monitors integrated GPU physical address traffic flowing through UCF.
The Jetson Thor uncore PMU driver is built as an external kernel module and can be loaded using modprobe:
sudo modprobe arm_cspmu_module
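After the module is loaded, the kernel exposes each registered PMU as an entry under /sys/bus/event_source/devices, which is the standard Linux sysfs location for perf event sources. The following sketch lists the entries whose names start with "nvidia_". The function name `list_nvidia_pmus` is hypothetical, and the "nvidia_" prefix is an assumption based on the `nvidia_ucf_pmu_0` name used in this topic; the names of the other Jetson Thor PMUs might differ.

```shell
#!/bin/sh
# Hypothetical helper: list perf event sources whose names start with "nvidia_".
# The optional first argument overrides the sysfs directory (useful for testing).
list_nvidia_pmus() {
    dir="${1:-/sys/bus/event_source/devices}"
    for dev in "$dir"/nvidia_*; do
        # When nothing matches, the glob stays literal, so check that it exists.
        [ -e "$dev" ] && basename "$dev"
    done
    return 0
}
```

Running `list_nvidia_pmus` with no arguments on the target prints one PMU name per line. Empty output suggests that the driver module is not loaded or that the PMUs use a different naming scheme.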
List of some of the UCF uncore PMU events:
./perf list pmu | grep ucf
[..trim..]
nvidia_ucf_pmu_0/slc_allocate/ [Kernel PMU event]
nvidia_ucf_pmu_0/slc_refill/ [Kernel PMU event]
nvidia_ucf_pmu_0/slc_access/ [Kernel PMU event]
nvidia_ucf_pmu_0/slc_wb/ [Kernel PMU event]
[..trim..]
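Several events from the same PMU can be passed to perf as one comma-separated list in the pmu/event/ form shown above. As a convenience, the following sketch builds that list from a PMU name and a set of event names. The function name `pmu_events` is hypothetical, and the sketch does not check that the events exist on the device.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated perf event list for one PMU.
# Usage: pmu_events PMU EVENT...
pmu_events() {
    pmu="$1"
    shift
    list=""
    for ev in "$@"; do
        # Add a comma only when the list already has an entry.
        list="${list:+$list,}$pmu/$ev/"
    done
    printf '%s\n' "$list"
}
```

For example, `perf stat -a -e "$(pmu_events nvidia_ucf_pmu_0 slc_access slc_refill)" -- sleep 1` would count two of the SLC events listed above system wide for one second, assuming that those events are available on the device.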
Example of using perf to measure the UCF cycles event:
# perf stat -a -e duration_time,nvidia_ucf_pmu_0/cycles/ dd if=/dev/zero of=/dev/shm/node0 bs=1M count=4
4+0 records in
4+0 records out
4194304 bytes (4.0MB) copied, 0.014309 seconds, 279.5MB/s
Performance counter stats for 'system wide':
2011246 nvidia_ucf_pmu_0/cycles/
0.019172990 seconds time elapsed
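Dividing the cycle count by the elapsed time gives the average number of UCF cycles counted per second. The following sketch extracts both values from perf stat text like the output above and prints the rate in MHz. The function name `ucf_cycle_rate_mhz` is hypothetical, and the result is only a rough estimate because the elapsed time also includes the startup overhead of the measured command.

```shell
#!/bin/sh
# Hypothetical helper: average UCF cycles counted per second, printed in MHz.
# Reads plain `perf stat` text on stdin.
ucf_cycle_rate_mhz() {
    awk '
        # Some locales print thousands separators; strip them from the count.
        { gsub(/,/, "", $1) }
        $2 == "nvidia_ucf_pmu_0/cycles/" { cycles = $1 + 0 }
        $2 == "seconds" && $3 == "time" && $4 == "elapsed" { secs = $1 + 0 }
        END {
            # Fail if no elapsed time was seen, to avoid dividing by zero.
            if (secs > 0) printf "%.1f\n", cycles / secs / 1000000
            else exit 1
        }
    '
}
```

For the sample above, 2011246 cycles over 0.019172990 seconds works out to approximately 104.9 MHz. As with other perf stat summaries, redirect standard error into the pipe (`2>&1`) when feeding live output to the helper.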