Operator Performance Benchmarks
Algorithm Performance Tables
The PVA operator implementation pages include performance tables that report how each operator performs on the NVIDIA® Jetson AGX Orin™ platform. These tables list several Key Performance Indicators (KPIs), such as mean execution time and mean submit latency. You can use these metrics to compare how an operator performs across different input sizes and configurations.
The KPI metrics are defined as follows:
- Execution Time
The average time required to execute the operator on a single VPU core. Note that each PVA contains two VPU cores, which can operate in parallel to either:
  - process two streams simultaneously, or
  - reduce execution time by approximately half by splitting the workload between the two cores.
- Submit Latency
The average time taken to submit the operator to the PVA. This submission operation is performed by the CPU.
- Total Power
The average total module power consumption while the PVA executes the operator. This measurement is taken while both VPU cores simultaneously run identical workloads. Power is measured with the tegrastats utility [1] and computed as the sum of the VDD_GPU_SOC, VDD_CPU_CV, and VIN_SYS_5V0 power rails. Note that tegrastats reports pre-regulator power, which includes all regulator efficiency losses and the power consumed by the other module components. Idle power, measured when the PVA is not processing any data, is approximately 7 W when the settings in the following section are applied.
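The KPI definitions above come down to simple arithmetic. The following minimal Python sketch sums the three power rails from tegrastats samples and derives idealized two-core figures from a single-core execution time. It assumes that each rail appears in tegrastats output as `<RAIL> <current>mW/<average>mW`. That layout may differ between JetPack releases. The function names are hypothetical and are not part of any PVA API. The two-core figures assume perfect scaling and ignore any cost of splitting the workload.

```python
import re
from statistics import mean

# Rails summed for the "Total Power" KPI, as listed in the definition above.
POWER_RAILS = ("VDD_GPU_SOC", "VDD_CPU_CV", "VIN_SYS_5V0")

# Assumption: tegrastats prints each rail as "<RAIL> <current>mW/<average>mW".
_RAIL_RE = re.compile(r"(\w+) (\d+)mW/(\d+)mW")


def total_power_mw(tegrastats_line: str) -> int:
    """Sum the instantaneous readings of the three rails in one tegrastats sample."""
    readings = {name: int(cur) for name, cur, _avg in _RAIL_RE.findall(tegrastats_line)}
    missing = [rail for rail in POWER_RAILS if rail not in readings]
    if missing:
        raise ValueError(f"rails not found in sample: {missing}")
    return sum(readings[rail] for rail in POWER_RAILS)


def mean_total_power_w(samples: list[str]) -> float:
    """Average total power, in watts, over tegrastats samples taken during execution."""
    return mean(total_power_mw(line) for line in samples) / 1000.0


def dual_core_estimates(exec_time_ms: float) -> dict[str, float]:
    """Idealized figures for one PVA (two VPU cores) from a single-core execution time."""
    single_core_fps = 1000.0 / exec_time_ms
    return {
        # Two independent streams, one per VPU core: aggregate throughput doubles.
        "two_stream_fps": 2 * single_core_fps,
        # One workload split across both cores: time roughly halves (overhead ignored).
        "split_exec_time_ms": exec_time_ms / 2,
    }
```

The sketch averages the instantaneous reading from each sample rather than the second figure after the slash. This ties the result to the samples you collected during the operator's execution.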
Clock Frequency and Power Settings
To ensure consistent measurements across runs, the following device frequency and power settings are applied before benchmarking.
NVIDIA® Jetson AGX Orin™ Settings:
- PVA/VPS frequency: 1.370 GHz
- PVA/AXI frequency: 985.600 MHz
- CPU: 12x ARMv8 Processor rev 1 (v8l) running at 2.202 GHz
- EMC frequency: 3.1990 GHz
- Power mode: MAXN
- Fan speed: MAX
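To keep runs comparable, it can help to record the settings above in machine-readable form and check a device against them before benchmarking. The following sketch assumes you have already read the current clocks (in MHz) and the power-mode and fan state by some other means, for example with `jetson_clocks --show`. The dictionary keys and the `out_of_spec` helper are hypothetical names chosen for illustration. The 1 MHz tolerance is an arbitrary choice.

```python
# Hypothetical machine-readable copy of the Jetson AGX Orin benchmark settings above.
EXPECTED_CLOCKS_MHZ = {
    "pva_vps": 1370.0,  # PVA/VPS: 1.370 GHz
    "pva_axi": 985.6,   # PVA/AXI: 985.600 MHz
    "cpu": 2202.0,      # CPU: 2.202 GHz
    "emc": 3199.0,      # EMC: 3.1990 GHz
}
EXPECTED_MODES = {"power_mode": "MAXN", "fan_speed": "MAX"}


def out_of_spec(
    observed_clocks_mhz: dict[str, float],
    observed_modes: dict[str, str],
    tol_mhz: float = 1.0,
) -> list[str]:
    """List settings that are missing or differ from the expected benchmark values."""
    problems = []
    for name, expected in EXPECTED_CLOCKS_MHZ.items():
        value = observed_clocks_mhz.get(name)
        # A missing reading counts as out of spec, as does a clock outside the tolerance.
        if value is None or abs(value - expected) > tol_mhz:
            problems.append(name)
    for name, expected in EXPECTED_MODES.items():
        if observed_modes.get(name) != expected:
            problems.append(name)
    return problems
```

An empty list means the observed state matches the benchmark settings. Any returned name is a setting to fix before collecting numbers.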