Debugging on Jetson Platforms
NVIDIA® Jetson™ devices support debugging tools that allow Jetson application developers to place the processor into known states and trace its behavior while running. The Jetson architecture’s debugging support provides:
• Reduced power leakage
• Enhanced security
• Availability of standard interfaces
This topic describes the debuggable blocks and their debugging strategies. Use this information to help determine why something may not be working in the software you have developed using NVIDIA Jetson Board Support Package (BSP).
This topic also describes the software implementation of the hardware and software debugging features documented in the Technical Reference Manuals (TRMs) for the Jetson processor family. Use the appropriate TRM as your primary source of debugging information.
Debugging Improvements
The following table describes improvements in debugging features for Jetson devices.
Hardware feature and benefit | Jetson Nano devices & Jetson TX1 | Jetson TX2 series | Jetson Xavier NX & Jetson AGX Xavier series |
---|
Hardware interface to debugger: | | | |
JTAG (4-pin connector) | X | X | X |
SWD (2-pin connector) | — | — | X |
Debug interface connected to CPU via Debug Communication Channel with Memory Access Mode in v8. Debugger downloads & uploads code faster. | X | X | X |
Debug connection to AXI-AP (via JTAG or SWD). System access when CPUs are unavailable (powered down, dead, in reset, etc.). | X | X | X |
AXI-AP 34-bit address can access MMIO & DRAM without requiring SMMU. | X | X | X |
Connection to SNIC allows access to entire system. | | X | X |
Debugger accesses to memory are coherent. | | | X |
CoreSight support via JTAG or SWD: | | | |
Connection to APE. | X | X | X |
Connection to BPMP, SPE, & SCE. | — | X | X |
Connection to RCE, PVA0, & PVA1. | — | — | X |
Trace storage circular buffer. Larger buffer yields a longer duration trace. Buffer is preserved through WDT resets. | 16 KiB | 32 KiB | 32 KiB |
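The relationship between buffer size and trace duration in the last table row can be illustrated with a small calculation. This is a minimal sketch, not an NVIDIA tool: the function name `trace_window_seconds` and the 100 Mbps trace rate are assumptions chosen only for illustration, and real trace rates depend on the workload and on the trace sources that are enabled.

```python
KIB = 1024

# Hypothetical sustained trace data rate, chosen only for illustration.
ASSUMED_TRACE_RATE_BPS = 100e6


def trace_window_seconds(buffer_bytes: int, trace_rate_bps: float) -> float:
    """Return how long a circular buffer holds trace data before it wraps."""
    if trace_rate_bps <= 0:
        raise ValueError("trace_rate_bps must be positive")
    return buffer_bytes * 8 / trace_rate_bps


# Buffer sizes taken from the table above.
for label, size_kib in (("16 KiB buffer", 16), ("32 KiB buffer", 32)):
    window = trace_window_seconds(size_kib * KIB, ASSUMED_TRACE_RATE_BPS)
    print(f"{label}: about {window * 1e3:.3f} ms of trace")
```

At any fixed trace rate, the 32 KiB buffer retains twice as much history as the 16 KiB buffer before older data is overwritten.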
CoreSight Trace Sinks ETF and ETR
The following table describes the characteristics of the Arm® CoreSight™ trace sinks on Jetson devices: the Embedded Trace FIFO (ETF), the Embedded Trace Router (ETR), the Trace Port Interface Unit (TPIU), and USB.
Characteristic | ETF (32 KiB) | DDR via ETR DMA | TPIU (Jetson Xavier NX & Jetson AGX Xavier only) | USB |
---|
Throughput | 41.58 Gbps @ 408 MHz, 128‑bit * | 41.58 Gbps | 800 Mbps | Real time processor tracing requires reduction of CPU frequency. |
Intrusive | No | Yes | No | Yes |
Available on commercial devices | Yes | Yes | Yes | Yes |
Use cases | Collect trace for watchdog reset; code optimization for the CCPLEX. | Collect trace for watchdog reset; code optimization for the CCPLEX. † | Collect trace for watchdog reset; code optimization for the CCPLEX. Tracing is limited by the TPIU bandwidth. | Single-CPU trace at low frequency, or APE-only trace to avoid DRAM bandwidth saturation. Tracing is limited to USB speeds. |
* Contact NVIDIA for higher frequency requirements.
† Note the high bandwidth requirement at DDR = 25%.
CoreSight AMBA Trace ID (ATID) Mapping
The following tables describe mapping for CoreSight AMBA® Trace ID (ATID). When collecting trace from multiple sources, the trace sinks (ETF and ETR) use ATIDs to segregate trace data.
Jetson Nano Devices, Jetson TX2 Series, and Jetson TX1
BCCPLEX (also called Fast Cluster or Big Cluster) using A57 processors:
ATID | Processor | Protocol |
---|
0x40 | CPU0 | ETMv4 |
0x41 | CPU1 | ETMv4 |
0x42 | CPU2 | ETMv4 |
0x43 | CPU3 | ETMv4 |
APE, Cortex® A9:
ATID | Processor | Protocol |
---|
0x20 | CPU0 | PFT1.0 |
STM:
ATID | Processor | Protocol |
---|
0x10 | N/A | MIPI STP |
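As an illustration of how ATIDs segregate trace data, the sketch below encodes the assignments listed above for Jetson Nano, Jetson TX1, and Jetson TX2 series as a lookup table. The names `ATID_SOURCES`, `describe_atid`, and `group_by_atid` are hypothetical, and the input is assumed to be a list of `(atid, payload)` pairs that some other tool has already extracted from the trace sink; decoding the CoreSight formatter frames themselves is not shown.

```python
# ATID assignments copied from the tables above (Jetson Nano, TX1, TX2 series).
ATID_SOURCES = {
    0x40: ("BCCPLEX A57 CPU0", "ETMv4"),
    0x41: ("BCCPLEX A57 CPU1", "ETMv4"),
    0x42: ("BCCPLEX A57 CPU2", "ETMv4"),
    0x43: ("BCCPLEX A57 CPU3", "ETMv4"),
    0x20: ("APE Cortex-A9 CPU0", "PFT1.0"),
    0x10: ("STM", "MIPI STP"),
}


def describe_atid(atid: int) -> str:
    """Return a readable label for an ATID, or mark it as unknown."""
    source = ATID_SOURCES.get(atid)
    if source is None:
        return f"0x{atid:02X}: unknown source"
    name, protocol = source
    return f"0x{atid:02X}: {name} ({protocol})"


def group_by_atid(records):
    """Group (atid, payload) pairs by ATID, preserving arrival order."""
    grouped = {}
    for atid, payload in records:
        grouped.setdefault(atid, []).append(payload)
    return grouped


# Hypothetical, already-deframed records used only to exercise the helpers.
sample = [(0x40, b"\x01"), (0x10, b"\x02"), (0x40, b"\x03")]
for atid, payloads in group_by_atid(sample).items():
    print(describe_atid(atid), "->", len(payloads), "chunk(s)")
```

Once grouped, each per-source stream can be handed to a decoder for the protocol named in the table (ETMv4, PFT1.0, or MIPI STP).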
Jetson Xavier NX Series and Jetson AGX Xavier Series
CCPLEX using NVIDIA processors:
ATID | Processor | Protocol |
---|
N/A | CPU0−CPU7 | N/A |
Cortex R5:
ATID | Processor | Protocol |
---|
Configurable | BPMP | ETMv3 |
Configurable | SPE | ETMv3 |
Configurable | SCE | ETMv3 |
Configurable | RCE | ETMv3 |
Configurable | PVA0 and PVA1 | ETMv3 |
APE, Cortex A9:
ATID | Processor | Protocol |
---|
0x20 | CPU0 | PFT1.0 |
STM:
ATID | Processor | Protocol |
---|
0x10 | N/A | MIPI STP |
Uncore: Performance Monitor Unit
Applies to: T186 processors (Jetson TX2 series) and T194 processors (Jetson AGX Xavier series and Jetson Xavier NX series)
Several functional units on the T194 CCPLEX (e.g., the SCF and the L2) are outside the cores. These units are collectively referred to as the uncore. Some of them provide their own performance events and event counters; these events are not counted by the performance counters of the core’s Performance Monitor Unit (PMU).
The NVIDIA Uncore Perfmon Extension to the Arm® Performance Monitor Extension (also called “uncore perfmon”) allows Arm software to access the uncore performance counters. The uncore perfmon extension is designed to resemble the standard Arm Performance Monitor Extension as much as possible.
Arm PMU documentation may be downloaded from the Linux Kernel Archives.
Device-Specific Features and Limitations
This section describes features and limitations of uncore perfmon on specific NVIDIA Jetson and NVIDIA® Tegra® systems on chip (SoCs).
• T186 (used in NVIDIA Jetson TX2 series modules):
  • Uncore perfmon events are not supported for Denver cores and the Denver cluster.
• T194 (used in NVIDIA Jetson Xavier™ series modules):
  • Uncore perfmon events are supported for all cores and clusters.
  • For more information about using the counters, see the kernel documentation within the source code at:
<kernel-source-path>/Documentation/devicetree/bindings/platform/tegra/nvidia,carmel-pmu.txt
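Before using the counters, it can help to check which PMUs the running kernel has registered with the Linux perf subsystem. The sketch below lists the entries under the standard sysfs directory `/sys/bus/event_source/devices`. Whether an uncore PMU appears there, and under what name, depends on the kernel build and device tree and is not stated in this topic, so treat that as an assumption to verify on your device. The function name `list_pmu_devices` is hypothetical.

```python
import os


def list_pmu_devices(root="/sys/bus/event_source/devices"):
    """Return the sorted names of PMUs registered with the perf subsystem.

    Returns an empty list if the directory is missing or unreadable,
    for example on a host without perf event support.
    """
    try:
        return sorted(os.listdir(root))
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        return []


if __name__ == "__main__":
    # Print one PMU name per line; the set of names is device-specific.
    for name in list_pmu_devices():
        print(name)
```

Run the script on the target device. If the list does not include an uncore PMU, consult the device tree binding document referenced above for the expected configuration.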