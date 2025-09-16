This section outlines the procedures for debugging an application crash.

In the event of an application crash, you might see messages such as `Segmentation fault (core dumped)` or `Aborted (core dumped)`. The `(core dumped)` suffix indicates that a core dump file was generated. This file captures the application's memory state at the time of the crash and can be loaded into a debugger to investigate it.

In some cases, core dumps are disabled, so no core file is generated even though the application crashed.
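The `(core dumped)` suffix comes from the wait status that the shell collects from the crashed process. The sketch below assumes Linux with glibc and reads the same status bits with `waitpid`, `WTERMSIG` and `WCOREDUMP`. `run_and_report` and `ExitReport` are hypothetical names used only for illustration. A helper like this can be handy in a test harness to check whether a crash actually produced a core file.

```cpp
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <functional>

// How a child process terminated.
struct ExitReport {
  bool signaled = false;     // true if the child was killed by a signal
  int signal_number = 0;     // e.g. SIGSEGV or SIGABRT when signaled
  bool core_dumped = false;  // true if the kernel reports that a core was dumped
};

// Runs `body` in a forked child process and reports how the child ended.
// Shells use the same wait-status bits to print messages such as
// "Segmentation fault (core dumped)".
ExitReport run_and_report(const std::function<void()>& body) {
  ExitReport report;
  const pid_t pid = fork();
  if (pid < 0) {
    std::perror("fork");
    return report;
  }
  if (pid == 0) {
    body();
    std::_Exit(0);  // reached only if body() did not terminate the child
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    std::perror("waitpid");
    return report;
  }
  if (WIFSIGNALED(status)) {
    report.signaled = true;
    report.signal_number = WTERMSIG(status);
    report.core_dumped = WCOREDUMP(status) != 0;
    std::printf("%s%s\n", strsignal(report.signal_number),
                report.core_dumped ? " (core dumped)" : "");
  } else {
    std::printf("exited with status %d\n", WEXITSTATUS(status));
  }
  return report;
}
```

For example, `run_and_report([] { std::abort(); })` prints `Aborted`. It appends ` (core dumped)` only when the kernel actually wrote (or piped) a core dump.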

To enable core dumps, raise the core file size limit with `ulimit -c`, which sets the maximum size of core dump files. On many systems this limit defaults to 0, which effectively disables core dumps. Setting it to `unlimited` allows them to be written. The limit applies to the current shell and the processes started from it, so run the application from the same shell.

```bash
ulimit -c unlimited
```
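Sometimes the application is launched in a way where you cannot easily run `ulimit` first, for example from a script or service you do not control. In that case, a process can raise its own soft limit at startup with the POSIX `getrlimit`/`setrlimit` calls. This is a minimal sketch. `enable_core_dumps` is a hypothetical helper, not a Holoscan API. An unprivileged process can raise its soft limit only up to the hard limit.

```cpp
#include <sys/resource.h>

#include <cstdio>

// Raises this process's soft core-file size limit to its hard limit. This is
// the in-process counterpart of `ulimit -c` and is inherited by child
// processes. If the hard limit is 0, core dumps stay disabled.
bool enable_core_dumps() {
  struct rlimit limit {};
  if (getrlimit(RLIMIT_CORE, &limit) != 0) {
    std::perror("getrlimit(RLIMIT_CORE)");
    return false;
  }
  limit.rlim_cur = limit.rlim_max;  // RLIM_INFINITY when the hard limit is "unlimited"
  if (setrlimit(RLIMIT_CORE, &limit) != 0) {
    std::perror("setrlimit(RLIMIT_CORE)");
    return false;
  }
  return true;
}
```

Calling `enable_core_dumps()` near the top of `main` affects only that process and anything it spawns afterwards.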

Core dumps also depend on the `core_pattern` setting. It determines the name and location of the core dump file, or the program the core is piped to. To view the current `core_pattern` setting, run:

```bash
cat /proc/sys/kernel/core_pattern
# or
sysctl kernel.core_pattern
```
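If `core_pattern` begins with `|`, the kernel pipes the core dump to the named program instead of writing a file. Examples of such programs are systemd-coredump and Ubuntu's apport. This is a common reason for not finding a core file next to the application after a crash. The small sketch below assumes Linux and reads the value and classifies it. The function names are hypothetical.

```cpp
#include <fstream>
#include <string>

// Describes where the kernel sends core dumps for a given core_pattern value.
std::string describe_core_pattern(const std::string& pattern) {
  if (pattern.empty()) {
    return "unset or unreadable";
  }
  if (pattern.front() == '|') {
    // Leading '|': the core is streamed to a user-space handler program.
    return "piped to handler: " + pattern.substr(1);
  }
  if (pattern.front() == '/') {
    return "written to absolute path: " + pattern;
  }
  return "written relative to the crashing process's working directory: " + pattern;
}

// Reads the current system-wide value (same as `cat /proc/sys/kernel/core_pattern`).
// Returns an empty string if the file cannot be read.
std::string read_core_pattern(const std::string& path = "/proc/sys/kernel/core_pattern") {
  std::ifstream in(path);
  std::string pattern;
  std::getline(in, pattern);  // the value is a single line
  return pattern;
}
```

For example, `describe_core_pattern(read_core_pattern())` on a host that uses systemd-coredump reports a piped handler. In that case, look for the core with that tool rather than in the working directory.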

To modify the core_pattern value, execute the following command:

```bash
echo "coredump_%e_%p" | sudo tee /proc/sys/kernel/core_pattern
# or
sudo sysctl -w kernel.core_pattern=coredump_%e_%p
```

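Here, the pattern requests that both the executable name (`%e`) and the process ID (`%p`) appear in the generated file's name. Because the pattern is not an absolute path, the file is created in the crashing process's current working directory. The available specifiers are documented in the core(5) manual page. A value set this way does not persist across reboots.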
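To predict the file name a pattern will produce, the specifiers can be expanded by hand. The sketch below uses a hypothetical `expand_core_pattern` helper that handles only `%%`, `%p` and `%e`. It illustrates the naming rules and is not a substitute for the kernel's own expansion. Per core(5), `%e` is the comm value of the process or thread, which is limited to 15 characters. A binary named `ping_distributed` would therefore appear as `ping_distribute`, the same truncated name GDB shows for its threads later in this section.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: expands a subset of the core_pattern specifiers from core(5).
//   %%  a literal '%'
//   %p  PID of the dumped process
//   %e  comm value: normally the executable name without its directory,
//       truncated to 15 characters (it differs if the process or thread was renamed)
// Any other specifier is copied through unchanged.
std::string expand_core_pattern(const std::string& pattern,
                                const std::string& executable_name, long pid) {
  const std::string comm = executable_name.substr(0, 15);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] != '%' || i + 1 == pattern.size()) {
      out += pattern[i];  // ordinary character, or a trailing lone '%'
      continue;
    }
    const char spec = pattern[++i];
    switch (spec) {
      case '%': out += '%'; break;
      case 'p': out += std::to_string(pid); break;
      case 'e': out += comm; break;
      default:  out += '%'; out += spec; break;
    }
  }
  return out;
}
```

With the pattern set above, `expand_core_pattern("coredump_%e_%p", "ping_simple", 2160275)` returns `coredump_ping_simple_2160275`. This matches the file name that appears in the example later in this section.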

If you see errors such as `tee: /proc/sys/kernel/core_pattern: Read-only file system` or `sysctl: setting key "kernel.core_pattern", ignoring: Read-only file system` inside a Docker container, set the `kernel.core_pattern` parameter on the host system instead of inside the container.

Because `kernel.core_pattern` is a system-wide kernel parameter, changing it on the host affects all containers. This approach requires appropriate (root) permissions on the host machine.

In addition, when launching a container with `docker run`, it is often necessary to add the `--cap-add=SYS_PTRACE` option. Docker containers do not receive the `SYS_PTRACE` capability by default, and debuggers such as GDB commonly need it to attach to processes inside the container.
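To check from inside a container whether a process actually has `CAP_SYS_PTRACE`, you can read the effective capability mask from `/proc/self/status` and test bit 19. Bit 19 is the value of `CAP_SYS_PTRACE` in `<linux/capability.h>`. The sketch below uses hypothetical helper names. The sample mask `00000000a80425fb` in the test corresponds to Docker's default capability set, which lacks `SYS_PTRACE`.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// CAP_SYS_PTRACE is capability number 19 in <linux/capability.h>.
constexpr int kCapSysPtrace = 19;

// Returns true if bit `cap` is set in a hexadecimal capability mask, such as
// the value of the "CapEff:" line in /proc/<pid>/status. An empty mask is
// treated as "no capabilities"; std::stoull throws on non-hex text.
bool mask_has_capability(const std::string& hex_mask, int cap) {
  if (hex_mask.empty()) return false;
  const std::uint64_t mask = std::stoull(hex_mask, nullptr, 16);
  return ((mask >> cap) & 1U) != 0;
}

// Returns the calling process's effective capability mask as hex text,
// or an empty string if it cannot be read.
std::string read_effective_caps(const std::string& status_path = "/proc/self/status") {
  std::ifstream in(status_path);
  const std::string key = "CapEff:";
  for (std::string line; std::getline(in, line);) {
    if (line.compare(0, key.size(), key) == 0) {
      // Skip the key and the whitespace that follows it.
      const auto start = line.find_first_not_of(" \t", key.size());
      return start == std::string::npos ? std::string{} : line.substr(start);
    }
  }
  return {};
}
```

Inside a container, `mask_has_capability(read_effective_caps(), kCapSysPtrace)` returns false with Docker's default capabilities. It returns true when the container was started with `--cap-add=SYS_PTRACE`.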

Once the core dump file has been generated, you can debug it with GDB.

Consider a scenario where a segmentation fault is deliberately triggered at line 29 of `examples/ping_simple/cpp/ping_simple.cpp` by adding the line `*(int*)0 = 0;`:

```diff
--- a/examples/ping_simple/cpp/ping_simple.cpp
+++ b/examples/ping_simple/cpp/ping_simple.cpp
@@ -19,7 +19,6 @@
 #include <holoscan/operators/ping_tx/ping_tx.hpp>
 #include <holoscan/operators/ping_rx/ping_rx.hpp>
 
-
 class MyPingApp : public holoscan::Application {
  public:
   void compose() override {
@@ -27,6 +26,7 @@ class MyPingApp : public holoscan::Application {
     // Define the tx and rx operators, allowing the tx operator to execute 10 times
     auto tx = make_operator<ops::PingTxOp>("tx", make_condition<CountCondition>(10));
     auto rx = make_operator<ops::PingRxOp>("rx");
+    *(int*)0 = 0;
```

Running `./examples/ping_simple/cpp/ping_simple` produces the following output:

```text
$ ./examples/ping_simple/cpp/ping_simple
Segmentation fault (core dumped)
```

The application crashed, and a core dump file was generated:

```text
$ ls coredump*
coredump_ping_simple_2160275
```

To debug the core dump file with GDB, run `gdb <application> <coredump_file>`:

```text
$ gdb ./examples/ping_simple/cpp/ping_simple coredump_ping_simple_2160275
```

This produces the following output:

```text
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./examples/ping_simple/cpp/ping_simple...
[New LWP 2160275]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./examples/ping_simple/cpp/ping_simple'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  MyPingApp::compose (this=0x563bd3a3de80) at ../examples/ping_simple/cpp/ping_simple.cpp:29
29        *(int*)0 = 0;
(gdb)
```

The output shows that the application crashed at line 29 of `examples/ping_simple/cpp/ping_simple.cpp`.

To display the backtrace, run the `bt` command:

```text
(gdb) bt
#0  MyPingApp::compose (this=0x563bd3a3de80) at ../examples/ping_simple/cpp/ping_simple.cpp:29
#1  0x00007f2a76cdb5ea in holoscan::Application::compose_graph (this=0x563bd3a3de80) at ../src/core/application.cpp:325
#2  0x00007f2a76c3d121 in holoscan::AppDriver::check_configuration (this=0x563bd3a42920) at ../src/core/app_driver.cpp:803
#3  0x00007f2a76c384ef in holoscan::AppDriver::run (this=0x563bd3a42920) at ../src/core/app_driver.cpp:168
#4  0x00007f2a76cda70c in holoscan::Application::run (this=0x563bd3a3de80) at ../src/core/application.cpp:207
#5  0x0000563bd2ec4002 in main (argc=1, argv=0x7ffea82c4c28) at ../examples/ping_simple/cpp/ping_simple.cpp:38
```

When a distributed application that uses the UCX library encounters a segmentation fault, you may see stack traces printed by UCX. By default, UCX prints a stack trace when a segmentation fault occurs. You can change this behavior by setting the `UCX_HANDLE_ERRORS` environment variable:

- `UCX_HANDLE_ERRORS=bt` prints a backtrace on a segmentation fault (the default).
- `UCX_HANDLE_ERRORS=debug` attaches a debugger when a segmentation fault occurs.
- `UCX_HANDLE_ERRORS=freeze` freezes the application on a segmentation fault.
- `UCX_HANDLE_ERRORS=freeze,bt` freezes the application and prints a backtrace on a segmentation fault.
- `UCX_HANDLE_ERRORS=none` disables backtrace printing on a segmentation fault.

Although printing a backtrace is the default, the result is not always helpful.

For instance, suppose a segmentation fault is deliberately introduced at line 139, near the start of `PingTensorTxOp::compute` in `/workspace/holoscan-sdk/src/operators/ping_tensor_tx/ping_tensor_tx.cpp`, by adding `*(int*)0 = 0;`. Running `./examples/ping_distributed/cpp/ping_distributed` then produces the following output:

```text
[holoscan:2097261:0:2097311] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:2097311) ====
 0  /opt/ucx/1.15.0/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7f18db865264]
 1  /opt/ucx/1.15.0/lib/libucs.so.0(+0x3045f) [0x7f18db86545f]
 2  /opt/ucx/1.15.0/lib/libucs.so.0(+0x30746) [0x7f18db865746]
 3  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f18da9ee520]
 4  ./examples/ping_distributed/cpp/ping_distributed(+0x103d2b) [0x5651dafc7d2b]
 5  /workspace/holoscan-sdk/build-debug-x86_64/lib/libholoscan_core.so.1(_ZN8holoscan3gxf10GXFWrapper4tickEv+0x13d) [0x7f18dcbfaafd]
 6  /workspace/holoscan-sdk/build-debug-x86_64/lib/libgxf_core.so(_ZN6nvidia3gxf14EntityExecutor10EntityItem11tickCodeletERKNS0_6HandleINS0_7CodeletEEE+0x127) [0x7f18db2cb487]
 7  /workspace/holoscan-sdk/build-debug-x86_64/lib/libgxf_core.so(_ZN6nvidia3gxf14EntityExecutor10EntityItem4tickElPNS0_6RouterE+0x444) [0x7f18db2cde44]
 8  /workspace/holoscan-sdk/build-debug-x86_64/lib/libgxf_core.so(_ZN6nvidia3gxf14EntityExecutor10EntityItem7executeElPNS0_6RouterERl+0x3e9) [0x7f18db2ce859]
 9  /workspace/holoscan-sdk/build-debug-x86_64/lib/libgxf_core.so(_ZN6nvidia3gxf14EntityExecutor13executeEntityEll+0x41b) [0x7f18db2cf0cb]
10  /workspace/holoscan-sdk/build-debug-x86_64/lib/libgxf_serialization.so(_ZN6nvidia3gxf20MultiThreadScheduler20workerThreadEntranceEPNS0_10ThreadPoolEl+0x3c0) [0x7f18daf0cc50]
11  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f18dacb0253]
12  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f18daa40ac3]
13  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f18daad2660]
=================================
Segmentation fault (core dumped)
```

The backtrace lists the shared libraries and, where available, symbol names and offsets. It contains no source file names or line numbers. Frame 4, for example, shows only `ping_distributed(+0x103d2b)`. To obtain source-level information, use a debugger.

If you set `UCX_HANDLE_ERRORS` to `freeze,bt` and run `./examples/ping_distributed/cpp/ping_distributed`, the thread that triggered the segmentation fault is frozen. You can then attach a debugger to it for further investigation.

```text
$ UCX_HANDLE_ERRORS=freeze,bt ./examples/ping_distributed/cpp/ping_distributed
[holoscan:37 :1:51] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 51) ====
 0  /opt/ucx/1.15.0/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7f9fc6d75264]
 1  /opt/ucx/1.15.0/lib/libucs.so.0(+0x3045f) [0x7f9fc6d7545f]
 2  /opt/ucx/1.15.0/lib/libucs.so.0(+0x30746) [0x7f9fc6d75746]
 3  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f9fc803e520]
 4  /workspace/holoscan-sdk/build-x86_64/lib/libholoscan_op_ping_tensor_tx.so.2(_ZN8holoscan3ops14PingTensorTxOp7computeERNS_12InputContextERNS_13OutputContextERNS_16ExecutionContextE+0x53) [0x7f9fcad9e7f1]
 5  /workspace/holoscan-sdk/build-x86_64/lib/libholoscan_core.so.2(_ZN8holoscan3gxf10GXFWrapper4tickEv+0x155) [0x7f9fc9e415eb]
 6  /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so(_ZN6nvidia3gxf14EntityExecutor10EntityItem11tickCodeletERKNS0_6HandleINS0_7CodeletEEE+0x1a7) [0x7f9fc88f0347]
 7  /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so(_ZN6nvidia3gxf14EntityExecutor10EntityItem4tickElPNS0_6RouterE+0x460) [0x7f9fc88f29c0]
 8  /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so(_ZN6nvidia3gxf14EntityExecutor10EntityItem7executeElPNS0_6RouterERl+0x31e) [0x7f9fc88f31ee]
 9  /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so(_ZN6nvidia3gxf14EntityExecutor13executeEntityEll+0x2e7) [0x7f9fc88f39d7]
10  /workspace/holoscan-sdk/build-x86_64/lib/libgxf_serialization.so(_ZN6nvidia3gxf20MultiThreadScheduler20workerThreadEntranceEPNS0_10ThreadPoolEl+0x419) [0x7f9fc8605dd9]
11  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f9fc8321253]
12  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f9fc8090ac3]
13  /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f9fc8121a04]
=================================
[holoscan:2127091:0:2127105] Process frozen, press Enter to attach a debugger...
```

The output shows that the segmentation fault occurred in the thread with ID 51 (`tid: 51`). To attach a debugger, press Enter.

After the debugger attaches, it displays a backtrace, but that backtrace may belong to a different thread than the one that triggered the segmentation fault. Use the `info threads` command to list all threads, and the `thread <thread_id>` command to switch to the thread that caused the fault.

```text
(gdb) info threads
  Id   Target Id                                         Frame
* 1    Thread 0x7f9fc6ce2000 (LWP 37) "ping_distribute"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  2    Thread 0x7f9fc51bb000 (LWP 39) "ping_distribute"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  3    Thread 0x7f9fc11ba000 (LWP 40) "ping_distribute"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  4    Thread 0x7f9fbd1b9000 (LWP 41) "ping_distribute"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  5    Thread 0x7f9fabfff000 (LWP 42) "cuda00001400006"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  6    Thread 0x7f9f99fff000 (LWP 43) "async"            0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  7    Thread 0x7f9f95ffe000 (LWP 44) "ping_distribute"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  8    Thread 0x7f9f77fff000 (LWP 45) "dispatcher"       0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  9    Thread 0x7f9f73ffe000 (LWP 46) "async"            0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  10   Thread 0x7f9f6fffd000 (LWP 47) "worker"           0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  11   Thread 0x7f9f5bfff000 (LWP 48) "ping_distribute"  0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  12   Thread 0x7f9f57ffe000 (LWP 49) "dispatcher"       0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  13   Thread 0x7f9f53ffd000 (LWP 50) "async"            0x00007f9fc80e6612 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  14   Thread 0x7f9f4fffc000 (LWP 51) "worker"           0x00007f9fc80e642f in __GI___wait4 (pid=pid@entry=52, stat_loc=stat_loc@entry=0x7f9f4fff6cfc, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
```

Thread 14 is the thread that caused the segmentation fault: its `LWP 51` matches the `tid: 51` reported by UCX. To investigate further, switch to this thread with the `thread 14` command in GDB:

```text
(gdb) thread 14
```

After switching, use the `bt` command to examine the backtrace of this thread:

```text
(gdb) bt
#0  0x00007f9fc80e642f in __GI___wait4 (pid=pid@entry=52, stat_loc=stat_loc@entry=0x7f9f4fff6cfc, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
#1  0x00007f9fc80e63ab in __GI___waitpid (pid=pid@entry=52, stat_loc=stat_loc@entry=0x7f9f4fff6cfc, options=options@entry=0) at ./posix/waitpid.c:38
#2  0x00007f9fc6d72587 in ucs_debugger_attach () at /opt/ucx/src/contrib/../src/ucs/debug/debug.c:816
#3  0x00007f9fc6d7531d in ucs_error_freeze (message=0x7f9fc6d93c53 "address not mapped to object") at /opt/ucx/src/contrib/../src/ucs/debug/debug.c:919
#4  ucs_handle_error (message=0x7f9fc6d93c53 "address not mapped to object") at /opt/ucx/src/contrib/../src/ucs/debug/debug.c:1089
#5  ucs_handle_error (message=0x7f9fc6d93c53 "address not mapped to object") at /opt/ucx/src/contrib/../src/ucs/debug/debug.c:1077
#6  0x00007f9fc6d7545f in ucs_debug_handle_error_signal (signo=signo@entry=11, cause=0x7f9fc6d93c53 "address not mapped to object", fmt=fmt@entry=0x7f9fc6d93cf5 " at address %p") at /opt/ucx/src/contrib/../src/ucs/debug/debug.c:1038
#7  0x00007f9fc6d75746 in ucs_error_signal_handler (signo=11, info=0x7f9f4fff73b0, context=<optimized out>) at /opt/ucx/src/contrib/../src/ucs/debug/debug.c:1060
#8  <signal handler called>
#9  holoscan::ops::PingTensorTxOp::compute (this=0x5643fdcbd540, op_output=..., context=...)
    at /workspace/holoscan-sdk/src/operators/ping_tensor_tx/ping_tensor_tx.cpp:139
#10 0x00007f9fc9e415eb in holoscan::gxf::GXFWrapper::tick (this=0x5643fdcfef00) at /workspace/holoscan-sdk/src/core/gxf/gxf_wrapper.cpp:78
#11 0x00007f9fc88f0347 in nvidia::gxf::EntityExecutor::EntityItem::tickCodelet(nvidia::gxf::Handle<nvidia::gxf::Codelet> const&) () from /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so
#12 0x00007f9fc88f29c0 in nvidia::gxf::EntityExecutor::EntityItem::tick(long, nvidia::gxf::Router*) () from /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so
#13 0x00007f9fc88f31ee in nvidia::gxf::EntityExecutor::EntityItem::execute(long, nvidia::gxf::Router*, long&) () from /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so
#14 0x00007f9fc88f39d7 in nvidia::gxf::EntityExecutor::executeEntity(long, long) () from /workspace/holoscan-sdk/build-x86_64/lib/libgxf_sample.so
#15 0x00007f9fc8605dd9 in nvidia::gxf::MultiThreadScheduler::workerThreadEntrance(nvidia::gxf::ThreadPool*, long) () from /workspace/holoscan-sdk/build-x86_64/lib/libgxf_serialization.so
#16 0x00007f9fc8321253 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007f9fc8090ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#18 0x00007f9fc8121a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
```

In the backtrace of thread 14, you will find:

```text
#8  <signal handler called>
#9  holoscan::ops::PingTensorTxOp::compute (this=0x5643fdcbd540, op_output=..., context=...)
    at /workspace/holoscan-sdk/src/operators/ping_tensor_tx/ping_tensor_tx.cpp:139
```

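This indicates that the segmentation fault occurred at line 139 of `/workspace/holoscan-sdk/src/operators/ping_tensor_tx/ping_tensor_tx.cpp`. Frames #0 to #7 belong to UCX's signal handler and the debugger it launched. The frame directly after `<signal handler called>` is the code that was executing when the signal arrived.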

To view the backtraces of all threads, use the `thread apply all bt` command.