Debugging on Jetson Platforms#
Jetson devices support debugging tools that allow Jetson application developers to put the processor into known states and trace its behavior while it runs. Use these tools to debug software that you have developed with the NVIDIA Jetson Board Support Package (BSP).
The Jetson architecture’s debugging support provides:
Reduced power leakage
Enhanced security
Availability of standard interfaces
This topic describes the debuggable blocks, their debugging strategies, and the software implementation of the hardware and software features described in the Technical Reference Manuals (TRMs) for Jetson processors. Use the appropriate TRM as your primary source of debugging information.
Debugging capabilities are based on the Arm CoreSight Architecture Specification v3.0.
CoreSight Trace Support#
The AMBA Trace Bus (ATB) is the primary trace backbone in the system. The following trace sources can send data over ATB:
Arm Cortex-R5 (such as BPMP, RCE, DCE0/DCE1, PVA0-R5, FSI-R5, and ISC) ETMs
Cadence Xtensa (such as APE0 HiFi 5, APE1 HiFi 5, and AOC-F1) TRAX
CoreSight System Trace Macrocell (STM); implemented for software instrumentation purposes
CCPLEX (CPU-UCF) ETMs/ETEs and ELA600s
Available trace sinks:
ETF (Embedded Trace FIFO)
512 KiB capacity on Jetson Thor and 128 KiB on Jetson AGX Orin.
Can be configured as a buffer for on-chip trace storage.
Typically used when:
Connection to off-chip TPA is unavailable or has insufficient bandwidth.
Traffic to DRAM needs to remain uninterrupted.
Trace can fit into buffer.
ETF can also absorb bandwidth spikes while trace is being streamed to DRAM or to an off-chip TPA.
DRAM
Capacity is determined by platform.
ETR converts ATB traffic to AXI traffic, which is then streamed via DBB to a GSC carveout. ETR has a dedicated GSC carveout in which to store its trace in memory.
Typically used when:
Connection to off-chip TPA is unavailable or has insufficient bandwidth.
Traffic to DRAM can accommodate trace bandwidth.
Trace does not fit in ETF.
TPA (Trace Port Analyzer)
Acts as an off-chip device that stores the trace sent over an interface.
Arm’s DSTREAM, which has a 4 GiB storage capacity, is an example of a TPA.
Typically used when:
Traffic to DRAM needs to remain uninterrupted.
ETF capacity is insufficient to store trace.
Off-chip trace streaming bandwidth is acceptable.
HSSTP is used to send traffic to TPA. For details, see High Speed Serial Trace Port (HSSTP).
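The sink criteria listed above can be summarized as a small decision helper. The following is a minimal sketch, not an NVIDIA tool: the `TraceSession` fields and the `choose_sink` function are hypothetical names introduced for illustration, and the ordering is a simplification (ETF whenever the trace fits, then TPA when one is usable, then DRAM). Only the ETF capacities come from this section.

```python
from dataclasses import dataclass

KIB = 1024

# ETF capacities quoted in this section.
ETF_CAPACITY_BYTES = {
    "Jetson Thor": 512 * KIB,
    "Jetson AGX Orin": 128 * KIB,
}


@dataclass
class TraceSession:
    """Hypothetical description of a planned trace capture."""
    platform: str
    expected_trace_bytes: int    # estimated total trace size
    tpa_usable: bool             # off-chip TPA attached with acceptable bandwidth
    dram_can_absorb_trace: bool  # DRAM traffic can accommodate trace bandwidth


def choose_sink(session: TraceSession) -> str:
    """Return "ETF", "TPA", or "DRAM" using the simplified ordering above."""
    if session.expected_trace_bytes <= ETF_CAPACITY_BYTES[session.platform]:
        return "ETF"   # trace fits in the on-chip buffer
    if session.tpa_usable:
        return "TPA"   # leaves DRAM traffic uninterrupted
    if session.dram_can_absorb_trace:
        return "DRAM"  # ETR streams trace to its carveout
    raise ValueError("no sink satisfies the stated constraints")


# A 256 KiB trace fits the ETF on Jetson Thor but not on Jetson AGX Orin.
print(choose_sink(TraceSession("Jetson Thor", 256 * KIB, False, False)))
print(choose_sink(TraceSession("Jetson AGX Orin", 256 * KIB, False, True)))
```

A real capture plan might also use ETF as a smoothing buffer in front of DRAM or the TPA, which this sketch does not model.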
CoreSight ETE and TRBE#
CoreSight Embedded Trace Extension (ETE) is the trace architecture for Armv9-A PEs. ETE has many similarities with the Arm ETMv4 architecture. On Jetson Thor, the trace from the CoreSight ETE module in each CCPLEX core is routed to system memory via the Trace Buffer Extension (TRBE).
The Linux kernel tool perf, when built with the OpenCSD library, can be used to decode CoreSight trace packets.
The following are examples of perf commands to record and decode CoreSight trace packets:
perf record -e cs_etm/contextid=0,timestamp=0/u ls
perf report --stdio --dump -i ./perf.data 2>&1 | tee dump.log
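When these commands are driven from a script, it can help to build the argument lists programmatically. The sketch below only assembles the same two command lines shown above and does not run them. The helper names (`cs_etm_event`, `record_cmd`, `dump_cmd`) are hypothetical, and the shell redirection (`2>&1 | tee dump.log`) is left to the caller.

```python
import shlex


def cs_etm_event(contextid: int = 0, timestamp: int = 0, modifier: str = "u") -> str:
    """Compose the cs_etm event string used in the record example above."""
    return f"cs_etm/contextid={contextid},timestamp={timestamp}/{modifier}"


def record_cmd(workload: list[str]) -> list[str]:
    """Argument list equivalent to the perf record example."""
    return ["perf", "record", "-e", cs_etm_event(), *workload]


def dump_cmd(data_file: str = "./perf.data") -> list[str]:
    """Argument list equivalent to the perf report example."""
    return ["perf", "report", "--stdio", "--dump", "-i", data_file]


print(shlex.join(record_cmd(["ls"])))
print(shlex.join(dump_cmd()))
```

On a target where perf is installed, each list could be passed to `subprocess.run(..., check=True)`.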
System Trace Macrocell (STM)#
System Trace Macrocell (STM) is a trace source integrated into a CoreSight system, designed primarily for high-bandwidth tracing of instrumentation embedded into software. This instrumentation consists of memory-mapped writes to the STM Advanced eXtensible Interface (AXI) slave; these writes carry information about the behavior of the software.
STM uses the AXI master ID and STM channel ID to distinguish the traces. STM also supports timestamping that can be used to get a coherent profile across the system.
STM can be used as an alternative to printk calls to debug kernel code. STM provides high-bandwidth trace data without the overhead of printk. Unlike the output of most runtime debugging methods, STM traces persist in the ETF buffer even if a hardware reset occurs.
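To illustrate how the AXI master ID, the STM channel ID, and the timestamp work together, the following host-side sketch models already-decoded STM records. It is not an STM packet decoder: the `StmRecord` type, the ID values, and the payloads are hypothetical, chosen only to show the grouping and ordering idea.

```python
from collections import defaultdict
from typing import NamedTuple


class StmRecord(NamedTuple):
    """Hypothetical decoded STM record (not the on-wire packet format)."""
    timestamp: int
    master_id: int   # identifies the AXI master that issued the write
    channel_id: int  # identifies the STM channel that was written
    payload: bytes


def split_by_source(records):
    """Group records by (master ID, channel ID), the pair that distinguishes traces."""
    streams = defaultdict(list)
    for rec in records:
        streams[(rec.master_id, rec.channel_id)].append(rec)
    return dict(streams)


def merged_timeline(records):
    """Order records from all sources by timestamp for a system-wide view."""
    return sorted(records, key=lambda rec: rec.timestamp)


# Made-up records from two masters, captured out of order.
records = [
    StmRecord(30, master_id=1, channel_id=0, payload=b"irq exit"),
    StmRecord(10, master_id=1, channel_id=0, payload=b"irq enter"),
    StmRecord(20, master_id=2, channel_id=5, payload=b"dma start"),
]

streams = split_by_source(records)
timeline = merged_timeline(records)
print(sorted(streams))
print([rec.payload for rec in timeline])
```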
High Speed Serial Trace Port (HSSTP)#
A High Speed Serial Trace Port (HSSTP) probe is an Arm probe that converts the trace data output from HSSTP to a parallel format consumed by DSTREAM. HSSTP provides the means to extract trace from various on-chip sources, such as execution/instruction traces (from CCPLEX ETMs and Cortex-R5 ETMs) and instrumentation traces (from STM), in real time to an off-chip device at high speed. The maximum trace rate is 10 Gbps per lane. Up to two lanes are supported, for a combined bandwidth of 20 Gbps.
For more information, see the Arm HSSTP Architecture Specification.
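As a rough worked example of the bandwidth figures above, the sketch below estimates how long a TPA with 4 GiB of storage (the DSTREAM capacity quoted earlier) could capture at the full line rate. It assumes 1 Gbps means 10^9 bits per second and ignores any line-encoding or protocol overhead, so real capture times would differ. The function names are hypothetical.

```python
GBPS = 10**9   # assumption: 1 Gbps = 10^9 bits per second
GIB = 2**30

LANE_RATE_BPS = 10 * GBPS  # maximum trace rate per lane
MAX_LANES = 2              # up to two lanes are supported


def hsstp_bandwidth_bps(lanes: int) -> int:
    """Raw combined line rate for the given number of lanes."""
    if not 1 <= lanes <= MAX_LANES:
        raise ValueError(f"lanes must be between 1 and {MAX_LANES}")
    return lanes * LANE_RATE_BPS


def seconds_to_fill(capacity_bytes: int, lanes: int) -> float:
    """Time to fill a trace store at the raw line rate, with no overhead modeled."""
    return capacity_bytes * 8 / hsstp_bandwidth_bps(lanes)


print(hsstp_bandwidth_bps(2) // GBPS)             # combined Gbps with two lanes
print(round(seconds_to_fill(4 * GIB, lanes=2), 2))  # seconds to fill 4 GiB
```

Under these assumptions, a sustained two-lane stream fills 4 GiB in under two seconds, which is one reason to restrict the enabled trace sources to what the investigation needs.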