Kernel Debugging Tools

NVIDIA® Jetson™ Linux lets you generate a kernel crash dump, which is a portion of the system’s volatile memory (RAM) saved to disk when the execution of the kernel is disrupted. The following events can cause such a disruption:

  • Kernel panic

  • Non-maskable interrupts (NMI)

  • Machine check exceptions (MCE)

  • Hardware failure

  • Manual intervention

You can find more details about kernel crash dumps at https://ubuntu.com/server/docs/kernel-crash-dump.

How to Set Up

This section describes how to enable kdump on Jetson Linux.

  1. Install linux-crashdump by running the following commands:

    $ sudo apt-get update
    $ sudo apt-get install linux-crashdump   # Select Yes to enable kdump-tools-dump.service
    $ sudo dpkg-reconfigure kexec-tools      # A pop-up appears; select Yes for both prompts
    $ sudo dpkg-reconfigure kdump-tools      # A pop-up appears; select Yes
    
  2. Update the /etc/default/kexec file with these values:

    # Defaults for kexec initscript
    # sourced by /etc/init.d/kexec and /etc/init.d/kexec-load
    
    # Load a kexec kernel (true/false)
    LOAD_KEXEC=true
    
    # Kernel and initrd image
    KERNEL_IMAGE="/vmlinuz"
    INITRD="/initrd.img"
    
    # If empty, use current /proc/cmdline
    APPEND=""
    
    # Load the default kernel from grub config (true/false)
    USE_GRUB_CONFIG=true
    
  3. Update the /etc/default/kdump-tools file with these required KDUMP values:

    USE_KDUMP=1
    KDUMP_KERNEL=/boot/Image
    KDUMP_INITRD=/boot/initrd
    KDUMP_KEXEC_ARGS=" -c -i "
    KDUMP_CMDLINE=""
    
  4. Modify the kernel command line by adding crashkernel=2G to the APPEND line in the /boot/extlinux/extlinux.conf file:

    TIMEOUT 30
    DEFAULT primary
    
    MENU TITLE L4T boot options
    
    LABEL primary
                    MENU LABEL primary kernel
                    LINUX /boot/Image
                    INITRD /boot/initrd
                    APPEND ${cbootargs} root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G
    
  5. Reboot the system, then check the kernel boot log to confirm that the kernel has reserved memory for kernel crash events. The output should be similar to the following example:

    ubuntu@localhost:~$ sudo dmesg | grep "crash"
    [    0.000000] crashkernel low memory reserved: 0xf7e00000 - 0xffe00000 (128 MB)
    [    0.000000] crashkernel reserved: 0x0000000766200000 - 0x00000007e6200000 (2048 MB)
    [    0.000000] Kernel command line: root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G bl_prof_dat
    [   12.121580] pstore: Using crash dump compression: deflate
    ubuntu@localhost:~$
    
  6. Confirm that the kdump configuration has been applied and that the current state is reported as ready to kdump, similar to the following example:

    ubuntu@localhost:~$ sudo kdump-config show
    DUMP_MODE:              kdump
    USE_KDUMP:              1
    KDUMP_COREDIR:          /var/crash
    crashkernel addr: 0xf7e00000
    0x766200000
            /boot/Image
    kdump initrd:
            /boot/initrd
    current state:    ready to kdump
    
    kexec command:
             /sbin/kexec -p  -c -i  --command-line="root=PARTUUID=f9bccae1-e09b-43bb-9770-87326321e634 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 bl_prof_dataptr=2031616@0x82C610000 bl_prof_ro_ptr=65536@0x82C600000  reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/boot/initrd /boot/Image
    ubuntu@localhost:~$
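
The checks in steps 3 to 5 can also be scripted. The following is a minimal sketch, not part of kdump-tools: the function names has_crashkernel and kdump_enabled are hypothetical, and the default paths are the ones used in the steps above. Each function only reports success or failure through its exit status.

```shell
#!/bin/sh
# Sketch: sanity-check the kdump settings described in the steps above.
# Both function names are hypothetical helpers, not part of kdump-tools.

# Succeeds if the given kernel command line file contains crashkernel=<value>.
# Defaults to /proc/cmdline, the command line of the running kernel.
has_crashkernel() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -Eq '(^|[[:space:]])crashkernel=[^[:space:]]+' "$cmdline_file"
}

# Succeeds if the given kdump-tools defaults file sets USE_KDUMP=1.
# Defaults to /etc/default/kdump-tools, the file edited in step 3.
kdump_enabled() {
    conf_file="${1:-/etc/default/kdump-tools}"
    grep -Eq '^[[:space:]]*USE_KDUMP="?1"?[[:space:]]*(#.*)?$' "$conf_file"
}

# Example use after rebooting:
#   has_crashkernel && kdump_enabled && echo "kdump settings look consistent"
```

Passing a file path as the first argument makes it possible to check a copy of the configuration before it is installed on the target.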
    

Testing/Validation

Once kdump-config show reports the current state as ready to kdump, you can validate the setup.
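
Because the test deliberately crashes the system, it can be worth confirming first that a crash kernel is actually loaded. The sketch below reads the kernel's /sys/kernel/kexec_crash_loaded attribute, which contains 1 when a crash kernel has been loaded. The function name crash_kernel_loaded is a hypothetical helper, not a kdump-tools command.

```shell
#!/bin/sh
# Sketch: report whether a crash kernel is currently loaded.
# crash_kernel_loaded is a hypothetical helper name.

# Succeeds only if the state file is readable and contains 1.
# The optional argument overrides the sysfs path, which is useful for testing.
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Example use before triggering a crash:
#   crash_kernel_loaded || echo "No crash kernel loaded; a panic would not produce a dump"
```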

  1. Generate a kernel crash dump:

    $ sudo su
    # echo c > /proc/sysrq-trigger
    
  2. The system reboots and stores the kernel crash dump in the /var/crash directory. Two reboots occur: the first saves the crash dump, and the second is a cold boot. After the second reboot, inspect the directory:

    ubuntu@localhost:~$ sudo su
    root@localhost:/home/ubuntu# cd /var/crash/
    root@localhost:/var/crash# ls -l
    total 68
    drwxr-xr-x 2 root root  4096 Aug 19 12:22 202408191220
    drwxr-xr-x 2 root root  4096 Aug 19 12:24 202408191223
    -rw-r--r-- 1 root root     0 Aug 19 12:23 kdump_lock
    -rw-r--r-- 1 root root   452 Aug 19 12:25 kexec_cmd
    -rw-r----- 1 root root 25712 Aug 19 12:22 linux-image-5.15.136-tegra-202408191220.crash
    -rw-r----- 1 root root 25748 Aug 19 12:25 linux-image-5.15.136-tegra-202408191223.crash
    root@localhost:/var/crash#
    
  3. Each crash event generates a directory named with the date and time of the event, which stores the crash dump. In the example above, the directory 202408191220 contains the crash dump of the first of the two events listed.
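
The timestamped directories can be listed with a short script. This is a minimal sketch that assumes the directory names follow the 12-digit YYYYMMDDhhmm pattern seen in the example listing. The function names crash_stamp_to_date and list_crash_dumps are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: list crash dump directories and decode their timestamp names.
# Assumes names of the form YYYYMMDDhhmm, as in the example listing.
# Both function names are hypothetical helpers.

# Convert a YYYYMMDDhhmm name into "YYYY-MM-DD hh:mm".
# Returns non-zero if the name is not exactly 12 digits.
crash_stamp_to_date() {
    name="$1"
    case "$name" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "${#name}" -eq 12 ] || return 1
    printf '%s\n' "$name" |
        sed -E 's/^(.{4})(.{2})(.{2})(.{2})(.{2})$/\1-\2-\3 \4:\5/'
}

# Print one line per timestamped directory under the crash directory.
# Entries that do not match the pattern (such as kdump_lock) are skipped.
list_crash_dumps() {
    crash_dir="${1:-/var/crash}"
    for path in "$crash_dir"/*/; do
        [ -d "$path" ] || continue
        name=$(basename "$path")
        if when=$(crash_stamp_to_date "$name"); then
            printf '%s  %s\n' "$when" "$path"
        fi
    done
}

# Example use (reading /var/crash may require root):
#   list_crash_dumps /var/crash
```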