NVIDIA System Profiler is a statistical sampling profiler with tracing features. It is designed to work with devices and devkits based on the NVIDIA Tegra SoC (system-on-chip).
Throughout this document, we refer to the Tegra-based device on which profiling happens as the target, and to the computer on which the user works and controls the profiling session as the host.
Furthermore, three different activities are distinguished as follows:
Profiling: The process of collecting any performance data. A profiling session in NVIDIA System Profiler typically includes sampling and tracing.
Sampling: The process of periodically stopping the profilee (the application under investigation during the profiling session). It typically involves collecting backtraces (call stacks of active threads), which lets you understand statistically how much time is spent in each function. Hardware counters can also be sampled. This process is inherently imprecise when only a small number of samples has been collected.
Tracing: The process of collecting precise information about various activities happening in the profilee or in the system. For example, a call from the profilee to a certain library function may be traced; in that case, the exact duration and timestamps of the call are recorded.
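To illustrate why sampling is imprecise at low sample counts, the following minimal sketch estimates the share of time spent in a function from the number of samples that landed in it, together with a binomial standard error. This is a generic statistical illustration, not how NVIDIA System Profiler computes its results; the function name `estimate_share` and the sample counts are hypothetical.

```python
import math


def estimate_share(hits, total_samples):
    """Estimate the fraction of time spent in a function.

    hits          -- number of samples in which the function was executing
    total_samples -- total number of samples collected
    Returns (share, standard_error), treating each sample as an
    independent yes/no observation (an assumption of this sketch).
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    share = hits / total_samples
    # Binomial standard error: shrinks in proportion to 1/sqrt(total_samples).
    std_err = math.sqrt(share * (1.0 - share) / total_samples)
    return share, std_err


# Same observed share (30%), very different confidence.
few = estimate_share(3, 10)
many = estimate_share(3000, 10000)
print("10 samples:    share=%.2f, std_err=%.3f" % few)
print("10000 samples: share=%.2f, std_err=%.3f" % many)
```

With only a handful of samples the uncertainty is a large part of the estimate itself, which is why longer sessions or higher sampling rates give more trustworthy per-function numbers.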
Since NVIDIA System Profiler supports multiple generations of Tegra, as well as various target operating systems, this documentation only describes the features available in the build of NVIDIA System Profiler it ships with.
Common features that are supported by NVIDIA System Profiler on most platforms include the following:
Sampling of the profilee, getting backtraces using multiple algorithms (such as frame pointers or DWARF data). Building top-down, bottom-up, and flat views. This information helps identify performance bottlenecks in CPU-intensive code.
Support for ARMv7 and ARMv8 processes.
Sampling or tracing power information, such as CPU frequency.
Sampling counters from the ARM PMU (Performance Monitoring Unit). Information such as cache misses is statistically correlated with function execution.
Support for multiple windows. Users with multiple monitors can see multiple reports simultaneously, or have multiple views into the same report file.
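As a rough illustration of how sampled backtraces can be turned into the flat, top-down, and bottom-up views mentioned in the list above, here is a minimal sketch. It follows common profiler conventions for these views, which is an assumption; it does not describe the internal implementation of NVIDIA System Profiler. The sample data and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical samples: each tuple is one backtrace, outermost caller first.
samples = [
    ("main", "render", "draw_mesh"),
    ("main", "render", "draw_mesh"),
    ("main", "render", "upload_texture"),
    ("main", "update_physics"),
]


def flat_view(samples):
    """Self samples per function: count only the innermost frame."""
    return Counter(stack[-1] for stack in samples)


def inclusive_view(samples):
    """Inclusive samples: count a function once per sample it appears in."""
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))
    return counts


def top_down_view(samples):
    """Samples per call path, keyed from the outermost caller inward."""
    counts = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            counts[stack[:depth]] += 1
    return counts


def bottom_up_view(samples):
    """Samples per call path, keyed from the innermost frame outward."""
    counts = Counter()
    for stack in samples:
        callee_first = stack[::-1]
        for depth in range(1, len(callee_first) + 1):
            counts[callee_first[:depth]] += 1
    return counts


for func, count in flat_view(samples).most_common():
    print("%-16s %d self samples" % (func, count))
```

In this sketch, the top-down view answers "which call paths consume the most samples", while the flat and bottom-up views answer "which individual functions consume the most samples, and who calls them".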
With NVIDIA System Profiler, a user can:
Identify call paths that monopolize the CPU.
Identify individual functions that monopolize the CPU (across different call paths).
Identify functions that have poor cache utilization.
See a visual representation of CUDA Runtime and Driver API calls, as well as the CUDA GPU workload.
See a visual representation of NVTX annotations: ranges, markers, and thread names.
NVIDIA System Profiler 3.9
For older Tegra System Profiler documentation, see below:
Tegra System Profiler 3.7
Tegra System Profiler 3.6
Tegra System Profiler 3.1
Tegra System Profiler 2.8
Tegra System Profiler 2.7
Tegra System Profiler 2.6
Tegra System Profiler 2.5
Tegra System Profiler 2.4
Tegra System Profiler 2.3
Tegra System Profiler 2.2
NVIDIA® GameWorks™ Documentation Rev. 1.0.200601 ©2014-2020. NVIDIA Corporation. All Rights Reserved.