1. Walkthrough: Launching and Debugging a CUDA Application

In the following walkthrough, we present some of the more common procedures that you might use to debug a CUDA-based application. We use a sample application called Matrix Multiply as an example. The CUDA Toolkit CUDA Samples and the NVIDIA/cuda-samples repository on GitHub include this sample application.

1.1. Open the Sample Project and Set Breakpoints

  1. From Visual Studio Code, open the matrixMul directory from the CUDA Samples.

    For assistance in locating sample applications, see Working with Samples.

      Note:  

    The matrixMul.cu source file contains code for both the CPU (e.g., matrixMultiply()) and the GPU (e.g., matrixMulCUDA(), or any function declared with the __global__ or __device__ keyword).

  2. First, let's set some breakpoints in GPU code.

    1. Open the file called matrixMul.cu, and find the CUDA kernel function matrixMulCUDA().

    2. Set a breakpoint at:

      int aStep = BLOCK_SIZE;
    3. Set another breakpoint at the statement that begins with:

      for (int a = aBegin, b = bBegin;
  3. Now, let's set some breakpoints in CPU code:

    1. In the same file, matrixMul.cu, find the CPU function matrixMultiply().

    2. Set one breakpoint at:

      if (block_size == 16)
    3. Set another breakpoint at the statement that begins with: 

      printf("done\n"); 

1.2. Create a Launch Configuration

To debug our application, we must first create a launch configuration. To create a launch.json, go to the Run and Debug tab and click create a launch.json file.

Select CUDA C++ (CUDA-GDB) for the environment.

Here is the launch configuration generated for CUDA debugging:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": ""
        }
    ]
}

In the launch.json, change the program property to ${workspaceFolder}/matrixMul.

  Note:  

${workspaceFolder} is a predefined variable that represents the path to the folder that is opened in VS Code.

Other attributes available for the launch configuration include:

  • debuggerPath: The path to cuda-gdb. If unspecified, PATH is searched for cuda-gdb.
  • args: Command-line arguments to pass to the debuggee.
  • initCommands: List of GDB commands sent before starting the inferior.
  • breakOnLaunch: Break on the first instruction of every launched kernel.
  • onAPIError: The action to perform if a driver API or runtime API error occurs. Valid values are hide, ignore, and stop.
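
For example, a filled-in configuration might look like the following sketch (the breakOnLaunch and onAPIError values here are illustrative, not required):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/matrixMul",
            "breakOnLaunch": true,
            "onAPIError": "stop"
        }
    ]
}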

1.3. Build the Sample and Launch the Debugger

To build our application, we must first integrate our build system with a task. Go to the Command Palette and execute the Tasks: Configure Default Build Task command.

Here is the task configuration that is generated:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "echo",
            "type": "shell",
            "command": "echo Hello",
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

Make the following changes to configure the task to build the matrixMul project for debugging:

  • Change the command property to make dbg=1. The dbg variable is required to generate unoptimized code with symbolic debug information.

  • Add "$nvcc" to the problemMatcher array. This will detect nvcc build errors and propagate them to the Visual Studio Code Problems panel.

To build the sample, go to the Command Palette again and run the Tasks: Run Build Task command. Check the Problems and Terminal panels for error messages.

To start debugging, either go to the Run and Debug tab and click the Start Debugging button, or simply press F5.

You've started the debugging session. In the Control GPU Execution and Inspect State topics, we'll look at some of the tools you typically use during a debugging session.

2. Walkthrough: Debugging a Running CUDA Application Using Attach

In this walkthrough, we will attach to and debug a running CUDA-based application. As in the last walkthrough, we will use Matrix Multiply as our application. The CUDA Toolkit CUDA Samples and the NVIDIA/cuda-samples repository on GitHub include this sample application.

2.1. Open the Sample Project, Make a Small Edit, and Set Breakpoints

  1. From Visual Studio Code, open the matrixMul directory from the CUDA Samples.

    For assistance in locating sample applications, see Working with Samples.

      Note:  

    The matrixMul.cu source file contains code for both the CPU (e.g., matrixMultiply()) and the GPU (e.g., matrixMulCUDA(), or any function declared with the __global__ or __device__ keyword).

  2. Add sleep(100); after the first printf of the main() entry point (see the sketch after this list). This effectively pauses the program so that we can attach to the running process.

  3. Then set some breakpoints, just as in the launch walkthrough. First, in the GPU code:

    1. Open the file called matrixMul.cu, and find the CUDA kernel function matrixMulCUDA().

    2. Set a breakpoint at:

      int aStep = BLOCK_SIZE;
    3. Set another breakpoint at the statement that begins with:

      for (int a = aBegin, b = bBegin;
  4. Next, set some breakpoints in CPU code:

    1. In the same file, matrixMul.cu, find the CPU function matrixMultiply().

    2. Set one breakpoint at:

      if (block_size == 16)
    3. Set another breakpoint at the statement that begins with: 

      printf("done\n"); 

2.2. Create a Launch Configuration to Attach to a Running Process

To debug our application, we must first create a launch configuration. To create a launch.json, go to the Run and Debug tab and click create a launch.json file.

Select CUDA C++ (CUDA-GDB) for the environment.

Here is the launch configuration generated for CUDA debugging:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Attach",
            "type": "cuda-gdb",
            "request": "attach",
            "processId": "${command:cuda.pickProcess}"
        }
    ]
}

  Note:  

${command:cuda.pickProcess} is a command variable that opens a process picker in VS Code, from which you can select the process to attach to.

Other attributes available for the launch configuration include:

  • debuggerPath: The path to cuda-gdb. If unspecified, PATH is searched for cuda-gdb.
  • args: Command-line arguments to pass to the debuggee.
  • initCommands: List of GDB commands sent before starting the inferior.
  • breakOnLaunch: Break on the first instruction of every launched kernel.
  • onAPIError: The action to perform if a driver API or runtime API error occurs. Valid values are hide, ignore, and stop.
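
For instance, an attach configuration with one optional attribute added might look like this sketch (the onAPIError value is illustrative):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Attach",
            "type": "cuda-gdb",
            "request": "attach",
            "processId": "${command:cuda.pickProcess}",
            "onAPIError": "stop"
        }
    ]
}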

2.3. Build the Sample

To build our application, we must first integrate our build system with a task. Go to the Command Palette and execute the Tasks: Configure Default Build Task command.

Here is the task configuration that is generated:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "echo",
            "type": "shell",
            "command": "echo Hello",
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

Make the following changes to configure the task to build the matrixMul project for debugging:

  • Change the command property to make dbg=1. The dbg variable is required to generate unoptimized code with symbolic debug information.

  • Add "$nvcc" to the problemMatcher array. This will detect nvcc build errors and propagate them to the Visual Studio Code Problems panel.

To build the sample, go to the Command Palette again and run the Tasks: Run Build Task command. Check the Problems and Terminal panels for error messages.

2.4. Launch the Application

Start matrixMul in the background by running ./matrixMul & in the terminal from the matrixMul folder.

2.5. Launch the Debugger and Attach to the Running Application

Before the sleep(100) expires, launch the debugger to attach to the program.

To start debugging, either go to the Run and Debug tab and click the Start Debugging button, or simply press F5.

A process picker will appear. Choose matrixMul to begin your debugging session.

Once the sleep(100) expires, execution stops at the first breakpoint reached after the sleep(100) call. You can step, press F5 to continue, or press SHIFT-F5 to detach and allow the application to run freely.

Once the application terminates, remove the first breakpoint you hit and repeat the process to confirm that you can hit the other breakpoints.

In the Control GPU Execution and Inspect State topics, we'll look at some of the tools you typically use during a debugging session.

3. Walkthrough: Launching and Debugging a Remote Application Using cuda-gdbserver

In the following walkthrough, we present some of the more common procedures that you might use to debug a CUDA-based application on a remote target machine. We use a sample application called Matrix Multiply as an example. The CUDA Toolkit CUDA Samples and the NVIDIA/cuda-samples repository on GitHub include this sample application.

3.1. Open the Sample Project and Set Breakpoints

On the local machine,

  1. From Visual Studio Code, open the matrixMul directory from the CUDA Samples.

    For assistance in locating sample applications, see Working with Samples.

      Note:  

    The matrixMul.cu source file contains code for both the CPU (e.g., matrixMultiply()) and the GPU (e.g., matrixMulCUDA(), or any function declared with the __global__ or __device__ keyword).

  2. First, let's set some breakpoints in GPU code.

    1. Open the file called matrixMul.cu, and find the CUDA kernel function matrixMulCUDA().

    2. Set a breakpoint at:

      int aStep = BLOCK_SIZE;
    3. Set another breakpoint at the statement that begins with:

      for (int a = aBegin, b = bBegin;
  3. Now, let's set some breakpoints in CPU code:

    1. In the same file, matrixMul.cu, find the CPU function matrixMultiply().

    2. Set one breakpoint at:

      if (block_size == 16)
    3. Set another breakpoint at the statement that begins with: 

      printf("done\n"); 

3.2. Create a Launch Configuration

To debug our application, we must first create a launch configuration. To create a launch.json, go to the Run and Debug tab and click create a launch.json file.

Select CUDA C++ (CUDA-GDBSERVER) for the environment.

Here is the launch configuration generated for CUDA debugging:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdbserver",
            "request": "launch",
            "server": "cuda-gdbserver",
            "program": "",
            "target": {
                "host": "${config:host}",
                "port": "${config:port}"
            },
            "additionalSOLibSearchPath": ""
        }
    ]
}

In the launch.json,

  • change the program property to ${workspaceFolder}/matrixMul,
  • set the target properties (host and port) to the host and port where cuda-gdbserver will be running, and
  • set additionalSOLibSearchPath to the directory where the debugger should search for shared libraries.
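
A filled-in sketch might look like the following (the additionalSOLibSearchPath value is an illustrative path; host and port resolve from your settings.json, described in section 3.4):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdbserver",
            "request": "launch",
            "server": "cuda-gdbserver",
            "program": "${workspaceFolder}/matrixMul",
            "target": {
                "host": "${config:host}",
                "port": "${config:port}"
            },
            "additionalSOLibSearchPath": "${workspaceFolder}/lib"
        }
    ]
}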

  Note:  

${workspaceFolder} is a predefined variable that represents the path to the folder that is opened in VS Code.

Other attributes available for the launch configuration include:

  • additionalSOLibSearchPath: The directory where the debugger searches for shared libraries.
  • args: Command-line arguments to pass to the debuggee.
  • breakOnLaunch: Break on the first instruction of every launched kernel.
  • cwd: The current working directory (cwd) for the debuggee process.
  • debuggerPath: The path to cuda-gdb. If unspecified, PATH is searched for cuda-gdb.
  • environment: Array containing objects that specify environment variables.
  • envFile: Absolute path to a file containing VAR=VALUE lines that specify environment variables.
  • initCommands: List of GDB commands sent before starting the inferior.
  • logFile: Absolute path to the file that logs interaction with cuda-gdb. It can be set to ${workspaceFolder}/myLogFile.txt, for example, to enable cuda-gdb logging and help customer support root-cause any issues you encounter. Log files can be uploaded using the Nsight VSCE Developer Forum.
  • onAPIError: The action to perform if a driver API or runtime API error occurs. Valid values are hide, ignore, and stop.
  • program: Path to the program to debug.
  • stopAtEntry: Break on the first instruction of the debuggee.
  • sysroot: Local directory with copies of target libraries.
  • target:
    • host: Target host to connect to.
    • port: Target port to connect to.
    • connect commands: Commands to run when connecting to the target.

3.3. Build the Sample

To build our application, we must first integrate our build system with a task. Go to the Command Palette and execute the Tasks: Configure Default Build Task command.

Here is the task configuration that is generated:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "echo",
            "type": "shell",
            "command": "echo Hello",
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}
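
As in the earlier walkthroughs, change the command property to make dbg=1 and add "$nvcc" to the problemMatcher array so that the build produces debug symbols and nvcc errors appear in the Problems panel.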

3.4. Copy the Sample to the Remote Machine and Instantiate cuda-gdbserver

After the sample has been built, we use a predefined autostart task to copy the sample to the remote machine and start cuda-gdbserver there. Go to the Command Palette and execute the Tasks: Configure Default Build Task command.

Several task options will be listed; for this case, pick Nsight: autostart (secure copy executable binary, remote).

Here is the task configuration that is generated:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "Nsight: autostart (secure copy executable binary, remote)",
            "type": "shell",
            "command": "scp ${config:executable} ${config:username}@${config:host}:/tmp && ssh ${config:username}@${config:host} \"cuda-gdbserver ${config:host}:${config:port} /tmp/${config:execName}\"",
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

At this point, you can either replace the executable, username, host, and port values directly in tasks.json, or provide them by creating a settings.json. Creating a settings.json is recommended because those values then become accessible from launch.json as well.

If you create a settings.json, it would look like:

{
    "username": "uname",
    "host": "127.0.0.1",
    "port": "12345",
    "executable": "${workspaceFolder}/matrixMul",
    "execName": "matrixMul"
}
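
With the example settings above, and assuming the workspace lives at /home/user/matrixMul (an illustrative path), VS Code would expand the autostart task's command to roughly:

scp /home/user/matrixMul/matrixMul uname@127.0.0.1:/tmp && ssh uname@127.0.0.1 "cuda-gdbserver 127.0.0.1:12345 /tmp/matrixMul"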

To run the task, go to the Command Palette again and run the Tasks: Run Build Task command. Check the Problems and Terminal panels for error messages.

To start debugging, either
  • go to the Run and Debug tab and click the Start Debugging button, or
  • simply press F5.

4. Walkthrough: Launching and Debugging a Remote Application Running on a QNX Host Using cuda-gdbserver

In the following walkthrough, we present some of the more common procedures that you might use to debug a CUDA-based application on a remote target machine running QNX. We use a sample application called Matrix Multiply as an example. The CUDA Toolkit CUDA Samples and the NVIDIA/cuda-samples repository on GitHub include this sample application.

For information on which versions of the samples are supported on DRIVE OS QNX, please see the NVIDIA DRIVE Documentation.

4.1. Open the Sample Project and Set Breakpoints

On the local machine,

  1. From Visual Studio Code, open the matrixMul directory from the CUDA Samples.

    For assistance in locating sample applications, see Working with Samples.

      Note:  

    The matrixMul.cu source file contains code for both the CPU (e.g., matrixMultiply()) and the GPU (e.g., matrixMulCUDA(), or any function declared with the __global__ or __device__ keyword).

  2. First, let's set some breakpoints in GPU code.

    1. Open the file called matrixMul.cu, and find the CUDA kernel function matrixMulCUDA().

    2. Set a breakpoint at:

      int aStep = BLOCK_SIZE;
    3. Set another breakpoint at the statement that begins with:

      for (int a = aBegin, b = bBegin;
  3. Now, let's set some breakpoints in CPU code:

    1. In the same file, matrixMul.cu, find the CPU function matrixMultiply().

    2. Set one breakpoint at:

      if (block_size == 16)
    3. Set another breakpoint at the statement that begins with: 

      printf("done\n"); 

4.2. Create a Launch Configuration

To debug our application, we must first create a launch configuration. To create a launch.json, go to the Run and Debug tab and click create a launch.json file.

Select CUDA C++ QNX (CUDA-GDBSERVER) for the environment.

Here is the launch configuration generated for CUDA debugging:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA GDB Server: Launch",
            "type": "cuda-qnx-gdbserver",
            "request": "launch",
            "server": "cuda-gdbserver",
            "program": "",
            "target": {
                "host": "${config:host}",
                "port": "${config:port}"
            },
            "environment": [
                {
                    "name": "QNX_TARGET",
                    "value": ""
                },
                {
                 "name": "QNX_HOST",
                 "value": ""
                }
            ],
            "additionalSOLibSearchPath": "",
            "debuggerPath": ""
        }
    ]
}

In the launch.json,

  • change the program property to ${workspaceFolder}/matrixMul,
  • set the target properties (host and port) to the host and port where cuda-gdbserver will be running,
  • set additionalSOLibSearchPath to the directory where the debugger should search for shared libraries,
  • set any environment variables you need, and
  • set debuggerPath to the path to cuda-qnx-gdb on the host system.
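
A filled-in sketch might look like the following (the QNX_TARGET, QNX_HOST, additionalSOLibSearchPath, and debuggerPath values are illustrative; use the paths from your own QNX SDP and CUDA installation):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA GDB Server: Launch",
            "type": "cuda-qnx-gdbserver",
            "request": "launch",
            "server": "cuda-gdbserver",
            "program": "${workspaceFolder}/matrixMul",
            "target": {
                "host": "${config:host}",
                "port": "${config:port}"
            },
            "environment": [
                {
                    "name": "QNX_TARGET",
                    "value": "/opt/qnx710/target/qnx7"
                },
                {
                    "name": "QNX_HOST",
                    "value": "/opt/qnx710/host/linux/x86_64"
                }
            ],
            "additionalSOLibSearchPath": "${workspaceFolder}/lib",
            "debuggerPath": "/usr/local/cuda/bin/cuda-qnx-gdb"
        }
    ]
}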

  Note:  

${workspaceFolder} is a predefined variable that represents the path to the folder that is opened in VS Code.

Other attributes available for the launch configuration include:

  • additionalSOLibSearchPath: The directory where the debugger searches for shared libraries.
  • args: Command-line arguments to pass to the debuggee.
  • breakOnLaunch: Break on the first instruction of every launched kernel.
  • cwd: The current working directory (cwd) for the debuggee process.
  • debuggerPath: The path to cuda-gdb. If unspecified, PATH is searched for cuda-gdb.
  • environment: Array containing objects that specify environment variables.
  • envFile: Absolute path to a file containing VAR=VALUE lines that specify environment variables.
  • executableUploadPath: Absolute path (on the QNX board) to which you want to upload the executable.
  • initCommands: List of GDB commands sent before starting the inferior.
  • logFile: Absolute path to the file that logs interaction with cuda-gdb. It can be set to ${workspaceFolder}/myLogFile.txt, for example, to enable cuda-gdb logging and help customer support root-cause any issues you encounter. Log files can be uploaded using the Nsight VSCE Developer Forum.
  • onAPIError: The action to perform if a driver API or runtime API error occurs. Valid values are hide, ignore, and stop.
  • program: Path to the program to debug.
  • stopAtEntry: Break on the first instruction of the debuggee.
  • sysroot: Local directory with copies of target libraries.
  • target:
    • host: Target host to connect to.
    • port: Target port to connect to.
    • connect commands: Commands to run when connecting to the target.
  • verboseLogging: Set to true to produce verbose log output.

4.3. Build the Sample

Cross-compile the sample using the instructions at https://developer.nvidia.com/docs/drive/drive-os/6.0.6/public/drive-os-qnx-installation/common/topics/installation/build-samples/build-run-sample-apps-qnx.html

4.4. Copy cuda-gdbserver to the Remote Machine and Instantiate It

After the sample has been built, we use a predefined autostart task to copy cuda-gdbserver to the remote machine and start it there. Go to the Command Palette and execute the Tasks: Configure Default Build Task command.

Several task options will be listed; for this case, pick Nsight: autostart (secure copy cuda-gdbserver binary, remote QNX).

Here is the task configuration that is generated:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "Nsight: autostart (secure copy cuda-gdbserver binary, remote QNX)",
            "type": "shell",
            "command": "scp ${config:cudaGdbServerPath} ${config:username}@${config:host}:/tmp && ssh ${config:username}@${config:host} /tmp/cuda-gdbserver ${config:port}",
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

At this point, you can either replace the cudaGdbServerPath, username, host, and port values directly in tasks.json, or provide them by creating a settings.json. Creating a settings.json is recommended because those values then become accessible from launch.json as well.

If you create a settings.json, it would look like:

{
    "username": "uname",
    "host": "127.0.0.1",
    "port": "12345",
    "cudaGdbServerPath": "/usr/local/cuda-targets/aarch64-qnx/11.4/bin/cuda-gdbserver"
}
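
With the example settings above, the autostart task's command would expand to roughly:

scp /usr/local/cuda-targets/aarch64-qnx/11.4/bin/cuda-gdbserver uname@127.0.0.1:/tmp && ssh uname@127.0.0.1 /tmp/cuda-gdbserver 12345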

To run the task, go to the Command Palette again and run the Tasks: Run Build Task command. Check the Problems and Terminal panels for error messages.

To start debugging, either
  • go to the Run and Debug tab and click the Start Debugging button, or
  • simply press F5.

Notices

Notice

NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Code Edition 2023.1.0 User Guide

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, CUDA-GDB, CUDA-MEMCHECK, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, DGX Station, NVIDIA DRIVE, NVIDIA DRIVE AGX, NVIDIA DRIVE Software, NVIDIA DRIVE OS, NVIDIA Developer Zone (aka "DevZone"), GRID, Jetson, NVIDIA Jetson Nano, NVIDIA Jetson AGX Xavier, NVIDIA Jetson TX2, NVIDIA Jetson TX2i, NVIDIA Jetson TX1, NVIDIA Jetson TK1, Kepler, NGX, NVIDIA GPU Cloud, Maxwell, Multimedia API, NCCL, NVIDIA Nsight Compute, NVIDIA Nsight Eclipse Edition, NVIDIA Nsight Graphics, NVIDIA Nsight Integration, NVIDIA Nsight Systems, NVIDIA Nsight Visual Studio Edition, NVIDIA Nsight Visual Studio Code Edition, NVLink, nvprof, Pascal, NVIDIA SDK Manager, Tegra, TensorRT, Tesla, Visual Profiler, VisionWorks and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.