Debugging on Jetson Platforms

NVIDIA® Jetson™ devices support debugging tools that allow Jetson application developers to place the processor into known states and trace its behavior while running. The Jetson architecture’s debugging support provides:
Reduced power leakage
Enhanced security
Availability of standard interfaces
This topic describes the debuggable blocks and their debugging strategies. Use this information to help determine why something may not be working in the software you have developed using NVIDIA Jetson Board Support Package (BSP).
This topic also describes how software implements the hardware and software features documented in the Technical Reference Manuals (TRMs) for the Jetson processor family. Use the appropriate TRM as your primary source for information and debugging:
For Jetson Nano devices and Jetson TX1: Tegra X1 (SoC) Technical Reference Manual
For Jetson Xavier™ NX series and Jetson AGX Xavier™ series processors: Xavier Series (SoC) Technical Reference Manual
For Jetson TX2 series processors: Tegra X2 (Parker Series SoC) Technical Reference Manual

Debugging Improvements

The following table describes improvements in debugging features for Jetson devices.
| Hardware Feature | Benefit | Jetson Nano devices & Jetson TX1 | Jetson TX2 series | Jetson Xavier NX & Jetson AGX Xavier series |
|---|---|---|---|---|
| Hardware interface to debugger: | | | | |
| JTAG (4-pin connector) | | X | X | X |
| SWD (2-pin connector) | | | | X |
| Debug interface connected to CPU via Debug Communication Channel with Memory Access Mode in v8. | Debugger downloads & uploads code faster. | X | X | X |
| Debug connection to AXI-AP via JTAG or SWD. | System access when CPUs are unavailable (powered down, dead, in reset, etc.). | X | X | X |
| AXI-AP 34-bit address can access MMIO & DRAM without requiring SMMU. | | X | X | X |
| Connection to SNIC allows access to entire system. | | | X | X |
| Debugger accesses to memory are coherent. | | | | X |
| CoreSight support via JTAG or SWD: | | | | |
| Connection to APE. | | X | X | X |
| Connection to BPMP, SPE, & SCE. | | | X | X |
| Connection to RCE, PVA0, & PVA1. | | | | X |
| Trace Storage circular buffer. | Larger buffer yields a longer duration trace. Buffer is preserved through WDT resets. | 16 KiB | 32 KiB | 32 KiB |

CoreSight Trace Sinks ETF and ETR

The following table describes Arm® CoreSight™ trace sink characteristics for Jetson. These characteristics include the limits of the Embedded Trace FIFO (ETF), the Embedded Trace Router (ETR), the Trace Port Interface Unit (TPIU), and USB.
| Characteristic | ETF (32 KiB) | DDR via ETR DMA | TPIU (Jetson Xavier NX & Jetson AGX Xavier only) | USB |
|---|---|---|---|---|
| Throughput | 41.58 Gbps @ 408 MHz, 128-bit * | 41.58 Gbps | 800 Mbps | Real-time processor tracing requires reduction of CPU frequency. |
| Intrusive | No | Yes | No | Yes |
| Available on commercial devices | Yes | Yes | Yes | Yes |
| Use Cases | Collect trace for watchdog reset; code optimization for the CCPLEX. | Collect trace for watchdog reset; code optimization for the CCPLEX. † | Collect trace for watchdog reset; code optimization for the CCPLEX. Tracing is limited by the available bandwidth. | Single-CPU trace at low frequency, or APE-only trace to avoid DRAM bandwidth saturation. Tracing is limited to USB speeds. |

* Contact NVIDIA for higher frequency requirements.
† Note the high bandwidth requirement at DDR = 25%.
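To get a feel for how buffer size and trace rate interact, the following minimal Python sketch estimates how long a circular trace buffer lasts before it wraps. It uses the 32 KiB ETF size and the 41.58 Gbps peak figure from the table. The helper name `buffer_fill_time_s`, the decimal interpretation of Gbps, and the 100 Mbps example rate are assumptions made for illustration. Real trace rates depend on the workload and on trace configuration.

```python
def buffer_fill_time_s(buffer_bytes: int, rate_bits_per_s: float) -> float:
    """Return the seconds a circular buffer holds before it wraps
    at a sustained trace rate (hypothetical helper)."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate_bits_per_s must be positive")
    return buffer_bytes * 8 / rate_bits_per_s


ETF_BYTES = 32 * 1024      # ETF size from the table (32 KiB)
ETF_PEAK_BPS = 41.58e9     # peak throughput from the table; 1 Gbps = 1e9 bit/s assumed
EXAMPLE_BPS = 100e6        # hypothetical sustained trace rate, not from the source

if __name__ == "__main__":
    for label, rate in (("peak", ETF_PEAK_BPS), ("example", EXAMPLE_BPS)):
        micros = buffer_fill_time_s(ETF_BYTES, rate) * 1e6
        print(f"{label}: buffer wraps after about {micros:.1f} microseconds")
```

At the peak rate the 32 KiB buffer covers only a few microseconds of trace, which is why filtering trace sources or lowering the trace rate matters when the goal is to capture the lead-up to a watchdog reset.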

CoreSight AMBA Trace ID (ATID) Mapping

The following tables describe mapping for CoreSight AMBA® Trace ID (ATID). When collecting trace from multiple sources, the trace sinks (ETF and ETR) use ATIDs to segregate trace data.

Jetson Nano Devices, Jetson TX2 Series, and Jetson TX1

 
BCCPLEX (also called Fast Cluster or Big Cluster) using A57 processors

| ATID | Processor | Protocol |
|---|---|---|
| 0x40 | CPU0 | ETMv4 |
| 0x41 | CPU1 | ETMv4 |
| 0x42 | CPU2 | ETMv4 |
| 0x43 | CPU3 | ETMv4 |

APE, Cortex® A9

| ATID | Processor | Protocol |
|---|---|---|
| 0x20 | CPU0 | PFT1.0 |

STM

| ATID | Processor | Protocol |
|---|---|---|
| 0x10 | N/A | MIPI STP |
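The ATID assignments listed above for Jetson Nano devices, Jetson TX2 series, and Jetson TX1 can be captured in a lookup table when post-processing a capture that mixes several sources. The following is a minimal sketch, not an NVIDIA tool. The names `ATID_SOURCES` and `split_by_source` are hypothetical, and the input is assumed to be a stream of (ATID, payload) pairs that has already been unpacked from the CoreSight formatter frames, a step this sketch does not perform.

```python
from collections import defaultdict

# ATID assignments copied from the tables above
# (Jetson Nano devices, Jetson TX2 series, and Jetson TX1).
ATID_SOURCES = {
    0x40: ("BCCPLEX CPU0", "ETMv4"),
    0x41: ("BCCPLEX CPU1", "ETMv4"),
    0x42: ("BCCPLEX CPU2", "ETMv4"),
    0x43: ("BCCPLEX CPU3", "ETMv4"),
    0x20: ("APE CPU0", "PFT1.0"),
    0x10: ("STM", "MIPI STP"),
}


def split_by_source(records):
    """Group (atid, payload) records into one byte stream per trace source.

    `records` is a hypothetical, already de-framed sequence of
    (int, bytes) pairs. Unknown ATIDs are kept under a placeholder name
    instead of being dropped.
    """
    streams = defaultdict(bytearray)
    for atid, payload in records:
        name, _protocol = ATID_SOURCES.get(atid, (f"unknown(0x{atid:02x})", None))
        streams[name] += payload
    return {name: bytes(data) for name, data in streams.items()}


if __name__ == "__main__":
    sample = [(0x40, b"\x01\x02"), (0x20, b"\xaa"), (0x40, b"\x03")]
    for source, data in split_by_source(sample).items():
        print(source, data.hex())
```

Each resulting stream can then be handed to a decoder for the protocol named in the table (ETMv4, PFT1.0, or MIPI STP).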

Jetson Xavier NX Series and Jetson AGX Xavier Series

 
CCPLEX using NVIDIA processors

| ATID | Processor | Protocol |
|---|---|---|
| N/A | CPU0-CPU7 | N/A |

Cortex R5

| ATID | Processor | Protocol |
|---|---|---|
| Configurable | BPMP | ETMv3 |
| Configurable | SPE | ETMv3 |
| Configurable | SCE | ETMv3 |
| Configurable | RCE | ETMv3 |
| Configurable | PVA0 and PVA1 | ETMv3 |

APE, Cortex A9

| ATID | Processor | Protocol |
|---|---|---|
| 0x20 | CPU0 | PFT1.0 |

STM

| ATID | Processor | Protocol |
|---|---|---|
| 0x10 | N/A | MIPI STP |

Uncore: Performance Monitor Unit

Applies to: T186 processors (Jetson TX2 series) and T194 processors (Jetson AGX Xavier series and Jetson Xavier NX series)
Several functional units on the T194 CCPLEX (e.g., the SCF and the L2) are outside the cores. These units are collectively referred to as the uncore. Some of them provide uncore performance events and event counters; these events are not counted by the performance counters in each core's Performance Monitor Unit (PMU).
The NVIDIA Uncore Perfmon Extension to the Arm® Performance Monitor Extension (also called "uncore perfmon") allows Arm software to access the uncore performance counters. The uncore perfmon extension is designed to resemble the standard Arm Performance Monitor Extension as much as possible.
ARM PMU documentation may be downloaded from the Linux Kernel Archives.

Device-Specific Features and Limitations

This section describes features and limitations of uncore perfmon on specific NVIDIA Jetson and NVIDIA® Tegra® systems on chip (SoCs).
T186 (used in NVIDIA Jetson TX2 series modules):
Uncore perfmon events are not supported for Denver cores and the Denver cluster.
T194 (used in NVIDIA Jetson Xavier™ series modules):
Uncore perfmon events are supported for all cores and clusters.
For more information about using the counters, see the kernel documentation within source code at:
<kernel-source-path>/Documentation/devicetree/bindings/platform/tegra/nvidia,carmel-pmu.txt
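Before consulting the binding document, it can help to check which PMUs the running kernel has actually registered. The following minimal Python sketch lists the entries under the standard Linux perf sysfs directory `/sys/bus/event_source/devices` and any named events each one exports. It assumes that the uncore PMU driver registers itself as a perf event source; the name under which it appears is not given here and must be confirmed on the target. The function names are hypothetical.

```python
import os

SYSFS_PMU_ROOT = "/sys/bus/event_source/devices"


def list_event_sources(root=SYSFS_PMU_ROOT):
    """Return the sorted names of PMUs registered with the perf subsystem.

    Returns an empty list if the directory does not exist
    (for example, on a kernel built without perf events).
    """
    try:
        return sorted(os.listdir(root))
    except FileNotFoundError:
        return []


def list_pmu_events(pmu, root=SYSFS_PMU_ROOT):
    """Return the sorted named events a PMU exports, or [] if it exports none."""
    try:
        return sorted(os.listdir(os.path.join(root, pmu, "events")))
    except (FileNotFoundError, NotADirectoryError):
        return []


if __name__ == "__main__":
    for pmu in list_event_sources():
        events = list_pmu_events(pmu)
        print(f"{pmu}: {len(events)} named event(s)")
```

A PMU that exports no named events can still be usable through raw event codes, so an empty list here does not by itself mean that the counters are unavailable.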