Kernel Debugging Tools
NVIDIA® Jetson™ Linux lets you generate a kernel crash dump, which is a portion of the system’s volatile memory (RAM) saved to disk when the execution of the kernel is disrupted. The following events can cause such a disruption:
Kernel panic
Non-maskable interrupts (NMI)
Machine check exceptions (MCE)
Hardware failure
Manual intervention
You can find more details about kernel crash dumps at https://ubuntu.com/server/docs/kernel-crash-dump.
How to Set Up
This section describes how to enable kdump on Jetson Linux.
Install linux-crashdump by running the following commands:

    $ sudo apt-get update
    $ sudo apt-get install linux-crashdump   # Select Yes to enable kdump-tools-dump.service.
    $ sudo dpkg-reconfigure kexec-tools      # A pop-up appears; select Yes for both prompts.
    $ sudo dpkg-reconfigure kdump-tools      # A pop-up appears; select Yes.
Update the /etc/default/kexec file with these values:

    # Defaults for kexec initscript
    # sourced by /etc/init.d/kexec and /etc/init.d/kexec-load

    # Load a kexec kernel (true/false)
    LOAD_KEXEC=true

    # Kernel and initrd image
    KERNEL_IMAGE="/vmlinuz"
    INITRD="/initrd.img"

    # If empty, use current /proc/cmdline
    APPEND=""

    # Load the default kernel from grub config (true/false)
    USE_GRUB_CONFIG=true
Copy the fdt DTB from /sys/firmware/ to /boot/ (root privileges are required):

    $ sudo cp /sys/firmware/fdt /boot/kexec.dtb
Convert kexec.dtb to DTS:

    $ dtc -I dtb -O dts -o kexec.dts kexec.dtb
Make changes in kexec.dts to meet your requirements. The following PCIe changes are required:

    bus@0 {
        /* C0 */
        pcie@14180000 {
        };
        /* C1 */
        pcie@14100000 {
            max-link-speed = <0x1>;
        };
        /* C2 */
        pcie@14120000 {
            max-link-speed = <0x1>;
        };
        /* C3 */
        pcie@14140000 {
            max-link-speed = <0x1>;
        };
        /* C4 */
        pcie@14160000 {
            max-link-speed = <0x1>;
        };
        /* C5 */
        pcie@141a0000 {
            max-link-speed = <0x1>;
        };
        /* C6 */
        pcie@141c0000 {
            max-link-speed = <0x1>;
        };
        /* C7 */
        pcie@141e0000 {
            max-link-speed = <0x1>;
        };
        /* C8 */
        pcie@140a0000 {
            max-link-speed = <0x1>;
        };
        /* C9 */
        pcie@140c0000 {
            max-link-speed = <0x1>;
        };
        /* C10 */
        pcie@140e0000 {
            max-link-speed = <0x1>;
        };
    };

Convert kexec.dts back to DTB:

    $ dtc -I dts -O dtb -o kexec.dtb kexec.dts
Update the /etc/default/kdump-tools file with these required KDUMP values:

    USE_KDUMP=1
    KDUMP_KERNEL=/boot/Image
    KDUMP_INITRD=/boot/initrd
    KDUMP_KEXEC_ARGS=" -c -i --dtb=/boot/kexec.dtb "
    KDUMP_CMDLINE=""
Modify the kernel command line by adding crashkernel=2G to the APPEND line in the /boot/extlinux/extlinux.conf file:

    TIMEOUT 30
    DEFAULT primary

    MENU TITLE L4T boot options

    LABEL primary
          MENU LABEL primary kernel
          LINUX /boot/Image
          INITRD /boot/initrd
          APPEND ${cbootargs} root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G

Reboot the system, then check the kernel boot log to make sure the kernel has reserved memory for kernel crash events. The output should be similar to the following example:

    ubuntu@localhost:~$ sudo dmesg | grep "crash"
    [    0.000000] crashkernel low memory reserved: 0xf7e00000 - 0xffe00000 (128 MB)
    [    0.000000] crashkernel reserved: 0x0000000766200000 - 0x00000007e6200000 (2048 MB)
    [    0.000000] Kernel command line: root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G bl_prof_dat
    [   12.121580] pstore: Using crash dump compression: deflate
Make sure kdump-config is updated and the service is running. The output should be similar to the following example:

    ubuntu@localhost:~$ sudo kdump-config show
    DUMP_MODE:        kdump
    USE_KDUMP:        1
    KDUMP_COREDIR:    /var/crash
    crashkernel addr: 0xf7e00000 0x766200000
       /boot/Image
    kdump initrd:
       /boot/initrd
    current state:    ready to kdump

    kexec command:
      /sbin/kexec -p -c -i --command-line="root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 bl_prof_dataptr=2031616@0x82C610000 bl_prof_ro_ptr=65536@0x82C600000 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/boot/initrd /boot/Image
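The checks in the last two steps can also be scripted. The following is a minimal sketch, not part of the Jetson Linux tooling: the function names crashkernel_arg and report_kdump_state are hypothetical, and the script assumes the standard Linux sysfs entries /sys/kernel/kexec_crash_size and /sys/kernel/kexec_crash_loaded are exposed on your kernel (each file is skipped if it is missing).

```shell
#!/bin/sh
# Sketch: confirm the crashkernel reservation and crash-kernel load state.
# Function names are illustrative; they are not provided by kdump-tools.

# Print the value of the crashkernel= parameter found in the command-line
# string passed as $1. Returns non-zero if the parameter is absent.
# The body runs in a subshell so that 'set -f' (no globbing while the
# string is split into words) does not leak into the caller.
crashkernel_arg() (
    set -f
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
)

# Report what the running kernel says about the reservation.
report_kdump_state() {
    if [ -r /proc/cmdline ]; then
        if size=$(crashkernel_arg "$(cat /proc/cmdline)"); then
            echo "crashkernel parameter: $size"
        else
            echo "crashkernel parameter: not set"
        fi
    fi
    # Number of bytes reserved for the crash kernel (0 means none).
    if [ -r /sys/kernel/kexec_crash_size ]; then
        echo "reserved bytes: $(cat /sys/kernel/kexec_crash_size)"
    fi
    # 1 when a crash kernel image has been loaded with kexec -p.
    if [ -r /sys/kernel/kexec_crash_loaded ]; then
        echo "crash kernel loaded: $(cat /sys/kernel/kexec_crash_loaded)"
    fi
}

report_kdump_state
```

With the configuration described in this section, the script would be expected to report a crashkernel parameter of 2G, a non-zero reserved size, and a loaded crash kernel. The output of kdump-config show remains the authoritative check.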
Testing/Validation
Once kdump-config shows that the service is running, you can perform validation.
Generate a kernel crash dump:

    $ sudo su
    # echo c > /proc/sysrq-trigger
The system reboots and stores the kernel crash dump in the /var/crash directory. Two reboots occur: the first boots the capture kernel to store the crash dump, and the second is a cold boot back into the normal system:

    ubuntu@localhost:~$ sudo su
    root@localhost:/home/ubuntu# cd /var/crash/
    root@localhost:/var/crash# ls -l
    total 68
    drwxr-xr-x 2 root root  4096 Aug 19 12:22 202408191220
    drwxr-xr-x 2 root root  4096 Aug 19 12:24 202408191223
    -rw-r--r-- 1 root root     0 Aug 19 12:23 kdump_lock
    -rw-r--r-- 1 root root   452 Aug 19 12:25 kexec_cmd
    -rw-r----- 1 root root 25712 Aug 19 12:22 linux-image-5.15.136-tegra-202408191220.crash
    -rw-r----- 1 root root 25748 Aug 19 12:25 linux-image-5.15.136-tegra-202408191223.crash

In this log, 202408191220 is the directory where the crash dump is stored. Each crash event creates a directory named with the date and time of the event and stores the crash dump in it.
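When several crashes have been captured, it can help to locate the newest dump directory automatically. The following is a rough sketch: the function name latest_crash_dir is hypothetical, and it assumes that dump directories are always named with a 12-digit timestamp (YYYYMMDDhhmm), as in the listing above.

```shell
#!/bin/sh
# Sketch: print the most recent timestamped dump directory under a crash root.
# Assumes 12-digit directory names (YYYYMMDDhhmm), so the name that sorts
# last is also the newest. The function name is illustrative only.
latest_crash_dir() {
    root=${1:-/var/crash}
    latest=
    for dir in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        # Skip the unexpanded pattern (no match) and any non-directories.
        [ -d "$dir" ] || continue
        # Glob results are sorted, so the last match is the newest.
        latest=$dir
    done
    # Succeed only if at least one dump directory was found.
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

latest_crash_dir /var/crash || echo "no crash dump directories found"
```

Run against the listing above, the function would select /var/crash/202408191223. Reading the contents of that directory may still require root privileges.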