NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition 5.3 User Guide
Send Feedback
NVIDIA Nsight includes support for tracing concurrent kernel execution on newer NVIDIA GPUs. In older versions of NVIDIA Nsight, analysis captures always serialized all kernel launches, forcing them to be executed one at a time. With the new concurrent kernel trace mode, the runtime behavior of the target application with respect to the concurrent kernel execution is maintained, and all kernel start and end times are captured without forcing the kernels to be executed one at a time.
Note that on NVIDIA GPUs built on the Tesla architecture, the serialized capture mode is always used, regardless of the configuration specified in the NVIDIA Nsight options.
![]() |
A few notes on concurrent kernel trace mode:
|
To select analysis trace mode, do the following:
Here's an example of how concurrent kernel execution appears in the timeline report for the concurrentKernels SDK sample that ships with the NVIDIA GPU Computing SDK. All eight kernel launches are executed in parallel on the GPU.
By contrast, in serialized mode, it's easy to see that all kernel launches are forced to be processed one at a time, causing a significantly different runtime behavior.
NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition User Guide Rev. 5.3.170616 ©2009-2017. NVIDIA Corporation. All Rights Reserved.