Kernel Debugging Tools
NVIDIA® Jetson™ Linux lets you generate a kernel crash dump, which is a portion of the system’s volatile memory (RAM) saved to disk when the execution of the kernel is disrupted. The following events can cause such a disruption:
Kernel panic
Non-maskable interrupts (NMI)
Machine check exceptions (MCE)
Hardware failure
Manual intervention
You can find more details about kernel crash dumps at https://ubuntu.com/server/docs/kernel-crash-dump.
How to Set Up
This section describes how to enable kdump on Jetson Linux.
Install linux-crashdump by running the following commands:

    $ sudo apt-get update
    $ sudo apt-get install linux-crashdump    # Select Yes to enable kdump-tools-dump.service.
    $ sudo dpkg-reconfigure kexec-tools       # A pop-up appears; select Yes for both prompts.
    $ sudo dpkg-reconfigure kdump-tools       # A pop-up appears; select Yes.
Update the /etc/default/kexec file with these values:

    # Defaults for kexec initscript
    # sourced by /etc/init.d/kexec and /etc/init.d/kexec-load

    # Load a kexec kernel (true/false)
    LOAD_KEXEC=true

    # Kernel and initrd image
    KERNEL_IMAGE="/vmlinuz"
    INITRD="/initrd.img"

    # If empty, use current /proc/cmdline
    APPEND=""

    # Load the default kernel from grub config (true/false)
    USE_GRUB_CONFIG=true
Update the /etc/default/kdump-tools file with these required KDUMP values:

    USE_KDUMP=1
    KDUMP_KERNEL=/boot/Image
    KDUMP_INITRD=/boot/initrd
    KDUMP_KEXEC_ARGS=" -c -i "
    KDUMP_CMDLINE=""
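To confirm that the values took effect, you can read them back from the file. The following is a minimal sketch, not part of kdump-tools: the helper name check_kdump_setting is hypothetical, and the file path is passed as an argument so that the helper can be tried on a copy first.

```shell
#!/bin/sh
# check_kdump_setting FILE KEY EXPECTED
# Hypothetical helper (not shipped with kdump-tools). Succeeds when the last
# uncommented KEY=VALUE line in FILE carries the expected value.
check_kdump_setting() {
    file=$1
    key=$2
    expected=$3
    # Print the value part of every uncommented KEY= line, keep the last
    # one, and strip one pair of surrounding double quotes if present.
    actual=$(sed -n "s/^[[:space:]]*${key}=//p" "$file" \
        | tail -n 1 \
        | sed 's/^"\(.*\)"$/\1/')
    [ "$actual" = "$expected" ]
}

# Example invocation (assumes the file from this section exists):
#   check_kdump_setting /etc/default/kdump-tools USE_KDUMP 1 && echo "USE_KDUMP ok"
```

The helper only compares strings. It does not source the file, so shell expansions inside a value are not evaluated.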
Modify the kernel command line by adding crashkernel=2G to the APPEND line in the /boot/extlinux/extlinux.conf file:

    TIMEOUT 30
    DEFAULT primary

    MENU TITLE L4T boot options

    LABEL primary
        MENU LABEL primary kernel
        LINUX /boot/Image
        INITRD /boot/initrd
        APPEND ${cbootargs} root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G
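If you prefer to script this edit, the following sketch appends the option only to APPEND lines that do not already contain it. The helper name add_crashkernel is hypothetical, and the sketch assumes the file uses the layout shown above. It writes a backup to FILE.bak before changing anything.

```shell
#!/bin/sh
# add_crashkernel FILE SIZE
# Hypothetical helper: appends crashkernel=SIZE to every uncommented APPEND
# line in an extlinux.conf-style FILE that has no crashkernel= option yet.
# The original file is preserved as FILE.bak.
add_crashkernel() {
    conf=$1
    size=$2
    cp "$conf" "$conf.bak" || return 1
    # On APPEND lines without "crashkernel=", replace any trailing
    # whitespace with the new option; every other line passes through.
    sed -e '/^[[:space:]]*APPEND[[:space:]]/{' \
        -e '/crashkernel=/!s/[[:space:]]*$/ crashkernel='"$size"'/' \
        -e '}' "$conf.bak" > "$conf"
}

# Example invocation (run as root, since the file is root-owned):
#   add_crashkernel /boot/extlinux/extlinux.conf 2G
```

Running the helper a second time leaves the file unchanged, because lines that already contain crashkernel= are skipped.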
Now reboot the system and make sure the kernel has reserved memory for kernel crash events by checking the kernel boot log, similar to the following example:
    ubuntu@localhost:~$ sudo dmesg | grep "crash"
    [ 0.000000] crashkernel low memory reserved: 0xf7e00000 - 0xffe00000 (128 MB)
    [ 0.000000] crashkernel reserved: 0x0000000766200000 - 0x00000007e6200000 (2048 MB)
    [ 0.000000] Kernel command line: root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G bl_prof_dat
    [ 12.121580] pstore: Using crash dump compression: deflate
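The same check can be made without scanning the boot log. The sketch below assumes that the running kernel exposes /sys/kernel/kexec_crash_size, which reports the reserved crash memory in bytes, and that /proc/cmdline is readable. Both helper names are hypothetical, and each accepts an alternative path so that it can be exercised on sample files.

```shell
#!/bin/sh
# crash_reserved_mb [SIZE_FILE]
# Prints the reserved crash-kernel memory in MiB. Fails if the file cannot
# be read or reports zero bytes (nothing reserved).
crash_reserved_mb() {
    size_file=${1:-/sys/kernel/kexec_crash_size}
    bytes=$(cat "$size_file") || return 1
    [ "$bytes" -gt 0 ] || return 1
    echo $(( $bytes / 1048576 ))
}

# has_crashkernel_arg [CMDLINE_FILE]
# Succeeds if the kernel command line contains a crashkernel= option.
has_crashkernel_arg() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}"
}

# Example invocation on the target:
#   has_crashkernel_arg && crash_reserved_mb
```

A zero size together with a crashkernel= argument on the command line usually means that the reservation failed at boot, in which case the boot log is the place to look for the reason.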
Make sure kdump-config is updated and the service is running, similar to the following example:

    ubuntu@localhost:~$ sudo kdump-config show
    DUMP_MODE: kdump
    USE_KDUMP: 1
    KDUMP_COREDIR: /var/crash
    crashkernel addr: 0xf7e00000 0x766200000
       /boot/Image
    kdump initrd:
       /boot/initrd
    current state: ready to kdump
    kexec command:
      /sbin/kexec -p -c -i --command-line="root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 bl_prof_dataptr=2031616@0x82C610000 bl_prof_ro_ptr=65536@0x82C600000 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/boot/initrd /boot/Image
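For scripted checks, the line of interest in this output is "current state". The following sketch filters the output of kdump-config show for it. The helper name kdump_ready is hypothetical, and the match assumes the wording shown in the example above.

```shell
#!/bin/sh
# kdump_ready
# Reads `kdump-config show` output on standard input and succeeds only if
# the "current state" line reports "ready to kdump".
kdump_ready() {
    grep -q '^current state:[[:space:]]*ready to kdump'
}

# Example invocation on the target:
#   sudo kdump-config show | kdump_ready && echo "kdump is armed"
```

Because the pattern requires "ready" to follow the colon directly, apart from whitespace, a state such as "Not ready to kdump" does not match.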
Testing/Validation
Once kdump-config shows that the current state is ready to kdump, you can validate the setup.
Generate a kernel crash dump:
    $ sudo su
    # echo c > /proc/sysrq-trigger
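Because this command crashes the running kernel immediately, it can help to guard it. The sketch below refuses to trigger the crash unless a crash kernel is loaded. It assumes that the kernel exposes /sys/kernel/kexec_crash_loaded, which reads 1 when a crash kernel is loaded. The helper name trigger_test_crash is hypothetical, and both paths can be overridden for a dry run.

```shell
#!/bin/sh
# trigger_test_crash [LOADED_FILE] [TRIGGER_FILE]
# Hypothetical guard around the sysrq crash trigger. Writes "c" to the
# trigger file only if the loaded-flag file contains 1.
trigger_test_crash() {
    loaded_file=${1:-/sys/kernel/kexec_crash_loaded}
    trigger_file=${2:-/proc/sysrq-trigger}
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "crash kernel not loaded; refusing to trigger a crash" >&2
        return 1
    fi
    # Flush pending writes, because the crash is immediate.
    sync
    echo c > "$trigger_file"
}

# Dry run against scratch files (nothing crashes):
#   echo 1 > /tmp/loaded; trigger_test_crash /tmp/loaded /tmp/trigger
```

Called as root with no arguments, the helper uses the real kernel paths and crashes the system on purpose, so save your work first.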
The system reboots and stores the kernel crash dump in the /var/crash directory. Two reboots occur: the first stores the crash dump, and the second is a cold boot:

    ubuntu@localhost:~$ sudo su
    root@localhost:/home/ubuntu# cd /var/crash/
    root@localhost:/var/crash# ls -l
    total 68
    drwxr-xr-x 2 root root  4096 Aug 19 12:22 202408191220
    drwxr-xr-x 2 root root  4096 Aug 19 12:24 202408191223
    -rw-r--r-- 1 root root     0 Aug 19 12:23 kdump_lock
    -rw-r--r-- 1 root root   452 Aug 19 12:25 kexec_cmd
    -rw-r----- 1 root root 25712 Aug 19 12:22 linux-image-5.15.136-tegra-202408191220.crash
    -rw-r----- 1 root root 25748 Aug 19 12:25 linux-image-5.15.136-tegra-202408191223.crash
Each crash event generates a directory that is named with the date and time of the event, and the crash dump is stored in that directory. In the example above, 202408191220 is one such directory.
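When several dumps accumulate, the directory names can be used to find the most recent one. The sketch below assumes that the names follow the 12-digit YYYYMMDDhhmm pattern seen in the listing, which is inferred from the example and not from a documented format. Both helper names are hypothetical.

```shell
#!/bin/sh
# crash_stamp_to_date STAMP
# Converts a directory name such as 202408191220 (assumed YYYYMMDDhhmm)
# into "2024-08-19 12:20". Prints nothing if STAMP is not 12 digits.
crash_stamp_to_date() {
    echo "$1" | sed -n \
        's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3 \4:\5/p'
}

# latest_crash_dir [DIR]
# Prints the name of the newest timestamped subdirectory of DIR (default
# /var/crash). Fixed-width timestamps sort chronologically as text.
latest_crash_dir() {
    ls -1 "${1:-/var/crash}" 2>/dev/null | grep -E '^[0-9]{12}$' | sort | tail -n 1
}

# Example invocation on the target (as root):
#   crash_stamp_to_date "$(latest_crash_dir)"
```

Entries such as kdump_lock, kexec_cmd, and the .crash files are ignored, because they do not match the 12-digit pattern.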