1. The NVIDIA Tools Extension Library (NVTX)

The NVIDIA Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications. Applications which integrate NVTX can use NVIDIA Nsight VSE to capture and visualize these events and ranges.

The NVTX API provides additional information to improve the presentation of data. In general, the NVTX SDK adds value to NVIDIA’s tools while incurring almost no overhead when no tool is attached to the application.

The NVTX API provides two core services:

  1. Tracing of CPU events and time ranges.

  2. Naming of OS and API resources.

NVTX can be quickly integrated into an application. The sample program below shows the use of marker events, range events, and resource naming.

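#include <windows.h>      // Sleep, GetCurrentThreadId
#include "nvToolsExt.h"

void Wait(int waitMilliseconds)
{
    nvtxRangePush(__FUNCTION__);   // range covering this function
    nvtxMark("Waiting...");        // marker at a single point in time
    Sleep(waitMilliseconds);
    nvtxRangePop();
}

int main(void)
{
    // Name the calling thread; the function takes a thread ID and a name.
    nvtxNameOsThread(GetCurrentThreadId(), "MAIN");
    nvtxRangePush(__FUNCTION__);
    Wait(100);
    nvtxRangePop();
    return 0;
}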

The NVTX API can be used to:

  • Measure CPU code blocks.

  • Track lifetime of CPU resources (e.g., malloc).

  • Log critical events.

  • Extend the scope of NVIDIA Nsight VSE tools from NVIDIA-based drivers to the full application.

  • Improve the legibility of resources displayed in NVIDIA Nsight VSE.

2. NVTX Library

2.1. SDK Overview

The SDK consists of the include files, pre-built stub libraries, DLLs, and several SDK samples.

The SDK is installed at the following location:

C:\Program Files\NVIDIA Corporation\nvToolsExt
bin
     nvToolsExt64_1.dll
include
     nvToolsExt.h
     nvToolsExtCuda.h
     nvToolsExtOpenCL.h
lib 
     nvToolsExt64_1.lib

The SDK includes two sample projects, located in the following archive:

C:\Program Files (x86)\Nsight Visual Studio Edition 2020.1\Host\Samples\NsightSamples.zip

The SDK contains the following samples:

nvtxSimple

Demonstrates how to use the NVTX API to generate marker and range events, and name OS Threads and Categories.

nvtxMultithreaded 

Demonstrates more advanced usages of the NVTX C API. Introduces two sample C++ wrappers that simplify use of the API.

2.2. API Overview

Files

The core NVTX API is defined in file nvToolsExt.h, whereas domain-specific extensions to the NVTX interface are exposed in separate header files. For example, see nvToolsExtCuda.h for CUDA-specific NVTX API functions.

The library (.lib) and runtime components (.dll) are provided. The naming scheme for these files is defined as nvToolsExt64_<version>.{dll|lib}.

Function Calls

All NVTX API functions start with the nvtx name prefix and may end with one of three postfixes: A, W, or Ex. NVTX functions with such a postfix exist in multiple variants that perform the same core functionality with different parameter encodings. Depending on the version of the NVTX library, the available encodings may include ASCII (A), Unicode (W), or event structures (Ex).

Return Values

Some of the NVTX functions are defined to have return values. For example, the nvtxRangeStart functions return a unique range identifier, and the nvtxRangePush/nvtxRangePop functions return the current stack level. It is recommended not to use the returned values as part of conditional code in the instrumented application. The returned values can differ between implementations of the NVTX library; consequently, code that depends on the return values might work with one tool but fail with another.

C++ Wrapper Library

The NVTX API is a straight C API. The nvtxMultithreaded sample contains an example for a C++ wrapper. It is recommended to use such a customized wrapper layer on top of the raw API to simplify inclusion of NVTX in your application.

Another advantage of a wrapper library is that it hides any changes to the base API from the end user’s program. So if one API call is changed, the developer only needs to update the wrapper library code, rather than go through the entire code and change every reference.
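
As an illustration of such a wrapper, the following is a minimal sketch of a scope-based range class. It is not the wrapper shipped in the nvtxMultithreaded sample. The names ScopedRange, LoadAndParse, and USE_REAL_NVTX are hypothetical; when USE_REAL_NVTX is not defined, small stand-in functions replace nvtxRangePushA and nvtxRangePop so that the sketch builds without the SDK.

```cpp
#include <cassert>

#ifdef USE_REAL_NVTX               // hypothetical opt-in: build against the SDK
#include "nvToolsExt.h"
#else
// Stand-ins (not NVTX code): they only track the nesting depth per thread.
static thread_local int g_stubDepth = 0;
static int nvtxRangePushA(const char* /*message*/) { return g_stubDepth++; }
static int nvtxRangePop() { return --g_stubDepth; }
#endif

// Hypothetical wrapper: pushes a range on construction and pops it on
// destruction, so every exit path from the enclosing scope closes the range.
// The return values of the NVTX calls are deliberately ignored.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) { nvtxRangePushA(name); }
    ~ScopedRange() { nvtxRangePop(); }

    // A range belongs to exactly one scope, so copying is disabled.
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Example use with an early return and a nested range.
int LoadAndParse(bool failEarly) {
    ScopedRange outer("LoadAndParse");
    if (failEarly) return -1;      // outer is popped on this early return
    ScopedRange inner("Parse");    // nested range, popped before outer
    return 0;
}
```

Because the pop happens in the destructor, push and pop calls stay paired even when a function has several return statements, which is the main reason to prefer this shape over calling the raw API by hand.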

2.3. Events

Markers are used to describe events that occurred at a specific time during the execution of an application, while ranges detail the time span in which they occurred. This information is presented alongside all of the other captured data, which makes it easier to understand the collected information.

2.3.1. NVTX Version 0

The first version of the NVTX C API only allowed the caller to specify a message. It provides both ASCII and Unicode variants of each function.

2.3.2. NVTX Version 1

The second version of the NVTX C API added support for per-event attributes. Attributes include category, color, message, and payload. All attributes are optional.

Event Attributes Structure

This structure is used to describe the attributes of an event. The layout of the structure is defined by a specific version of the tools extension library and can change between different versions of the Tools Extension library.

Attributes

Markers and ranges can use attributes to provide additional information for an event or to guide the tool's visualization of the data. Each of the attributes is optional and if left unspecified, the attributes fall back to a default value.

To specify any attribute other than the text message, the Ex variant of the function must be called.

Message

The message field can be used to specify an optional string. The caller must set both the messageType and message fields. The default value is NVTX_MESSAGE_UNKNOWN.

Code Sample: 

// VALID NVTX_MESSAGE_TYPE_ASCII
nvtxEventAttributes_t eventAttrib1 = {0}; 
eventAttrib1.version = NVTX_VERSION; 
eventAttrib1.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
eventAttrib1.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib1.message.ascii = __FUNCTION__ ":ascii"; 
nvtxMarkEx(&eventAttrib1); 
DELAY();
// VALID NVTX_MESSAGE_TYPE_UNICODE
nvtxEventAttributes_t eventAttrib2 = {0}; 
eventAttrib2.version = NVTX_VERSION; 
eventAttrib2.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib2.messageType = NVTX_MESSAGE_TYPE_UNICODE; 
eventAttrib2.message.unicode = __FUNCTIONW__ L":unicode \u2603 snowman"; 
nvtxMarkEx(&eventAttrib2); 
DELAY();
        

Category

A category attribute is a user-controlled ID that can be used to group events. The tool may use category IDs to improve filtering, or for grouping events. The functions nvtxNameCategoryA or nvtxNameCategoryW can be used to name a category. The default value is 0.

Code Sample:

nvtxEventAttributes_t eventAttrib = {0}; 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib.category = 1; 
 
// specifying message to help identify this event in the tool.
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__; 
nvtxMarkEx(&eventAttrib); 

// Categories can be named using nvtxNameCategory{A,W}().
nvtxNameCategoryA(1, __FUNCTION__);

Color

The color attribute is used to help visually identify events in the tool. The caller must set both the colorType and color fields.

Code Sample:

// valid specification of color 
nvtxEventAttributes_t eventAttrib1 = {0}; 
eventAttrib1.version = NVTX_VERSION; 
eventAttrib1.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib1.colorType = NVTX_COLOR_ARGB; 
eventAttrib1.color = COLOR_RED;
  
// specifying message to help identify this event in the tool. 
eventAttrib1.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib1.message.ascii = __FUNCTION__ ":valid color"; 
nvtxMarkEx(&eventAttrib1);  

// default color
nvtxEventAttributes_t eventAttrib2 = {0}; 
eventAttrib2.version = NVTX_VERSION; 
eventAttrib2.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;


// specifying message to help identify this event in the tool. 
eventAttrib2.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib2.message.ascii = __FUNCTION__ ":default color"; 
nvtxMarkEx(&eventAttrib2);
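
The COLOR_RED constant used above is not defined in the code shown in this document; it presumably comes from the sample sources. A minimal sketch of how such constants could be defined follows, assuming that NVTX_COLOR_ARGB expects a 32-bit value laid out as 0xAARRGGBB. MakeArgb is a hypothetical helper, not part of the NVTX API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs four 8-bit channels as 0xAARRGGBB (assumed layout).
constexpr std::uint32_t MakeArgb(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b) {
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8)  |
            static_cast<std::uint32_t>(b);
}

// Fully opaque colors using the constant names that appear in this document.
constexpr std::uint32_t COLOR_RED     = MakeArgb(0xFF, 0xFF, 0x00, 0x00);
constexpr std::uint32_t COLOR_GREEN   = MakeArgb(0xFF, 0x00, 0xFF, 0x00);
constexpr std::uint32_t COLOR_BLUE    = MakeArgb(0xFF, 0x00, 0x00, 0xFF);
constexpr std::uint32_t COLOR_YELLOW  = MakeArgb(0xFF, 0xFF, 0xFF, 0x00);
constexpr std::uint32_t COLOR_MAGENTA = MakeArgb(0xFF, 0xFF, 0x00, 0xFF);
constexpr std::uint32_t COLOR_CYAN    = MakeArgb(0xFF, 0x00, 0xFF, 0xFF);

// Compile-time check of the packing order.
static_assert(COLOR_RED == 0xFFFF0000u, "alpha must occupy the top byte");
```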

Payload

The payload attribute can be used to provide additional data for markers and ranges. Range events can only specify values at the beginning of a range. The caller must specify valid values for both payloadType and payload.

Code Sample:

nvtxEventAttributes_t eventAttrib = {0}; 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 

eventAttrib.payloadType = NVTX_PAYLOAD_TYPE_UNSIGNED_INT64; 
eventAttrib.payload.ullValue = 0;   // unsigned member matches NVTX_PAYLOAD_TYPE_UNSIGNED_INT64
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":UNSIGNED_INT64 = 0"; 
nvtxMarkEx(&eventAttrib);

Structure

Initializing Attributes

The caller should always perform the following three tasks when using attributes:

  • Zero the structure;

  • Set the version field;

  • Set the size field.

Zeroing the structure sets all the event attribute types and values to their default values. The version and size fields are used by the Tools Extension implementation to handle multiple versions of the attributes structure.

Versioning

It is recommended that the caller use one of the following two methods to initialize the event attributes structure:

Version Safe

The version and size fields are used by the Tools Extension implementation to handle multiple versions of the attributes structure. This example shows how to initialize the structure for forward compatibility.

nvtxEventAttributes_t eventAttrib = {0}; 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = ::COLOR_YELLOW; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__; 
nvtxMarkEx(&eventAttrib); 

// This can be done using C99 designated initializers.
nvtxEventAttributes_t eventAttrib2 = 
{ 
     .version = NVTX_VERSION,                // version 
     .size = NVTX_EVENT_ATTRIB_STRUCT_SIZE,  // size 
     .colorType = NVTX_COLOR_ARGB,         // colorType 
     .color = COLOR_YELLOW,                  // color 
     .messageType = NVTX_MESSAGE_TYPE_ASCII, // messageType
     .message.ascii = __FUNCTION__ ":Designated Initializer" 
}; 
nvtxMarkEx(&eventAttrib2);

Version Specific

This example shows how to initialize the structure to a specific version of the library.

nvtxEventAttributes_v1 eventAttrib = {0}; 
eventAttrib.version = 1; 
eventAttrib.size = (uint16_t)(sizeof(nvtxEventAttributes_v1)); 
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = COLOR_MAGENTA; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__; 
nvtxMarkEx(&eventAttrib); 

// This can be done using ordered initialization. 
nvtxEventAttributes_v1 eventAttrib2 = 
     { 
          1,                                          // version
         (uint16_t)(sizeof(nvtxEventAttributes_v1)), // size 
          0,                                          // category 
          NVTX_COLOR_ARGB,                            // colorType
          COLOR_CYAN,                                 // color
          NVTX_PAYLOAD_TYPE_UNSIGNED_INT64,           // payloadType 
          0,                                          // reserved0 (padding field between payloadType and payload)
          1,                                          // payload 
          NVTX_MESSAGE_TYPE_ASCII,                    // messageType
          __FUNCTION__ ":Ordered Initialization"      // message 
     };
nvtxMarkEx(&eventAttrib2); 

If the caller uses the version-safe method (Method 1), it is critical that the entire binary layout of the structure be zeroed so that all fields are initialized to their default values. The caller should either use both NVTX_VERSION and NVTX_EVENT_ATTRIB_STRUCT_SIZE (Method 1, version safe) or use explicit values and a versioned type (Method 2, version specific). Mixing the two methods will likely cause source-level or binary incompatibility in the future.

2.3.3. Markers

A marker is used to describe a single point in time.

API Reference

nvtxMarkEx  

A marker can contain a text message or specify additional information using the event attributes structure. These attributes include a text message, color, category, and a payload. Each of the attributes is optional and can only be sent out using the nvtxMarkEx function. If nvtxMarkA or nvtxMarkW are used to specify the marker, or if an attribute is unspecified, then a default value will be used.

Parameters:

eventAttrib - The event attribute structure defining the marker's attribute types and attribute values.

nvtxMarkA

nvtxMarkW

A marker created using nvtxMarkA or nvtxMarkW contains only a text message.

Parameters:

message - The message associated with this marker event.

Code Example

// nvtxMark{A,W} 
nvtxMarkA(__FUNCTION__ ":nvtxMarkA"); 
nvtxMarkW(__FUNCTIONW__ L":nvtxMarkW"); 

// nvtxMarkEx 
// zero the structure 
nvtxEventAttributes_t eventAttrib = {0};

// set the version and the size information
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
 
// configure the attributes.  0 is the default for all attributes.
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = COLOR_RED; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":nvtxMarkEx"; 
nvtxMarkEx(&eventAttrib);

2.3.4. Range Push/Pop

Push/Pop ranges are an excellent way to track nested time ranges which occur on a CPU thread. The duration of each range is defined by the corresponding pair of nvtxRangePush and nvtxRangePop calls in the application's source code. Nested ranges are handled automatically on a per-CPU thread basis, and no special developer code is necessary.

API Reference

nvtxRangePushEx

Marks the start of a nested range. Returns the 0-based level of the range being started. If an error occurs, a negative value is returned.

Parameters:

eventAttrib - The event attribute structure defining the range's attribute types and attribute values.

nvtxRangePushA

nvtxRangePushW

Marks the start of a nested range. Returns the 0-based level of the range being started. If an error occurs, a negative value is returned.

Parameters:

message - The event message associated with this range event.

nvtxRangePop

Marks the end of a nested range on the current thread. If an error occurs, a negative value is returned.

Code Example

// nvtxRangePush{A,W}
nvtxRangePushA(__FUNCTION__ ":nvtxRangePushA"); 
nvtxRangePop(); 
nvtxRangePushW(__FUNCTIONW__ L":nvtxRangePushW"); 
nvtxRangePop();

// nvtxRangePushEx 
// zero the structure 
nvtxEventAttributes_t eventAttrib = {0};   

// set the version and the size information 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;  

// configure the attributes.  0 is the default for all attributes.
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = COLOR_GREEN; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":nvtxRangePushEx"; 
nvtxRangePushEx(&eventAttrib); 
nvtxRangePop();

2.3.5. Range Start/End

Start/End ranges are used to denote a time span; however, they expose arbitrary concurrency (not just nesting), and the start of a range can occur on a different thread than the end. For the correlation of a start/end pair, a unique correlation ID is created that is returned from nvtxRangeStart, and is then passed into nvtxRangeEnd.

API Reference

nvtxRangeStartEx

Marks the start of a range. Ranges defined by start/end can overlap. This API call returns the unique ID used to correlate a pair of Start and End events.

Parameters:

eventAttrib - The event attribute structure defining the range's attribute types and attribute values.

nvtxRangeStartA

nvtxRangeStartW

Marks the start of a range. Ranges defined by start/end can overlap. This API call returns the unique ID used to correlate a pair of Start and End events.

Parameters:

message - The event message associated with this range event.

nvtxRangeEnd

Marks the end of a range.

Parameters:

id - The correlation ID returned from an nvtxRangeStart call.

Code Example

// nvtxRangeStart{A,W}
nvtxRangeId_t id1 = nvtxRangeStartA(__FUNCTION__ ":nvtxRangeStartA"); 
nvtxRangeEnd(id1);
nvtxRangeId_t id2 = nvtxRangeStartW(__FUNCTIONW__ L":nvtxRangeStartW"); 
nvtxRangeEnd(id2); 

// zero the structure 
nvtxEventAttributes_t eventAttrib = {0};

// set the version and the size information 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;

// configure the attributes.  0 is the default for all attributes.
eventAttrib.colorType = NVTX_COLOR_ARGB;
eventAttrib.color = COLOR_BLUE; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":nvtxRangeStartEx"; 
nvtxRangeId_t id3 = nvtxRangeStartEx(&eventAttrib); 
nvtxRangeEnd(id3);

// overlapping events 
// re-use eventAttrib 
eventAttrib.message.ascii = __FUNCTION__ ":Range 1";
nvtxRangeId_t r1 = nvtxRangeStartEx(&eventAttrib); 
eventAttrib.message.ascii = __FUNCTION__ ":Range 2"; 
nvtxRangeId_t r2 = nvtxRangeStartEx(&eventAttrib); 
nvtxRangeEnd(r1); 
nvtxRangeEnd(r2);
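
Because the correlation ID is an ordinary value, it can be stored alongside the work it describes and handed to whichever code (possibly on another thread) finishes that work. The sketch below illustrates this under stated assumptions: Job, BeginJob, EndJob, and USE_REAL_NVTX are hypothetical names, and when USE_REAL_NVTX is not defined, simple stand-ins replace nvtxRangeStartA and nvtxRangeEnd so the sketch builds without the SDK.

```cpp
#include <cassert>
#include <cstdint>

#ifdef USE_REAL_NVTX               // hypothetical opt-in: build against the SDK
#include "nvToolsExt.h"
#else
// Stand-ins (not NVTX code, not thread-safe): hand out increasing IDs and
// count how many ranges are currently open.
typedef std::uint64_t nvtxRangeId_t;
static nvtxRangeId_t g_nextId = 1;
static int g_openRanges = 0;
static nvtxRangeId_t nvtxRangeStartA(const char* /*message*/) {
    ++g_openRanges;
    return g_nextId++;
}
static void nvtxRangeEnd(nvtxRangeId_t /*id*/) { --g_openRanges; }
#endif

// Hypothetical unit of asynchronous work that carries its own range ID.
struct Job {
    const char*   name;
    nvtxRangeId_t rangeId;
};

// Starts the range when the job is submitted.
Job BeginJob(const char* name) {
    return Job{name, nvtxRangeStartA(name)};
}

// Ends the range when the job completes; may be called from a different
// thread than BeginJob, and jobs may finish in any order.
void EndJob(const Job& job) {
    nvtxRangeEnd(job.rangeId);
}
```

A typical use would call BeginJob when work is queued and EndJob from the completion callback, so the ranges of concurrently running jobs overlap on the timeline instead of nesting.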

2.4. Overview of Resource Naming

2.4.1. System/NVTX

Categories and threads are used to group sets of events. Each category is identified through a unique ID; that ID is passed into any of the aforementioned marker/range events in order to assign that event to a specific category. The following API calls can be used to assign a name to a category ID.

API Reference

nvtxNameCategoryA   

nvtxNameCategoryW   

Allows the user to assign a name to a category ID.

Parameters:

category - The category ID to name.

message - The name of the category.

nvtxNameOsThreadA  

nvtxNameOsThreadW  

Allows the user to name an active thread of the current process. If an invalid thread ID is provided, or a thread ID from a different process is used, the behavior of the tool is implementation-dependent.

Parameters:

threadId - The ID of the thread to name.

message - The name of the thread.

Code Example

nvtxNameCategory(1, "Memory Allocation"); 
nvtxNameCategory(2, "Memory Transfer"); 
nvtxNameCategory(3, "Memory Object Lifetime"); 

nvtxNameOsThread(GetCurrentThreadId(), "MAIN_THREAD");

2.4.2. CUDA Resources

CUDA devices, contexts, and streams can be named with the nvtxName-prefixed functions defined in the nvToolsExtCuda.h header. Each of these functions takes the object handle and the name that should be assigned to the object.

API Reference

nvtxNameCuDeviceA

nvtxNameCuDeviceW

Allows the user to associate a CUDA device with a user-provided name.

Parameters:

device — The handle of the CUDA device to name.

name — The name of the CUDA device.

nvtxNameCuContextA  

nvtxNameCuContextW   

Allows the user to associate a CUDA context with a user-provided name.

Parameters:

context — The handle of the CUDA context to name.

name — The name of the CUDA context.

nvtxNameCuStreamA

nvtxNameCuStreamW

Allows the user to associate a CUDA stream with a user-provided name.

Parameters:

stream — The handle of the CUDA stream to name.

name — The name of the CUDA stream.

Code Example

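CUdevice device;
CUcontext context;

cuInit(0);                          // initialize the driver API before any other call
cuDeviceGet(&device, 0);            // handle for device ordinal 0
cuCtxCreate(&context, 0, device);

nvtxNameCuContextA(context, "Context1");
nvtxNameCuDeviceA(device, "Device0");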

2.4.3. OpenCL Resources

  Note:  

OpenCL support in NVIDIA Nsight Visual Studio Edition has been deprecated and will be removed in a future release.

Functions analogous to those for CUDA resources provide the same functionality for naming OpenCL resources. The nameable resources in this case include devices, contexts, command queues, memory objects, samplers, programs, and events.

API Reference

nvtxNameClDeviceA

nvtxNameClDeviceW

Allows the association of an OpenCL device with a user-provided name.

Parameters:

device — The handle of the OpenCL device to name.

name — The name of the OpenCL device.

nvtxNameClContextA

nvtxNameClContextW

Allows the association of an OpenCL context with a user-provided name.

Parameters:

context — The handle of the OpenCL context to name.

name — The name of the OpenCL context.

nvtxNameClCommandQueueA  

nvtxNameClCommandQueueW  

Allows the association of an OpenCL command queue with a user-provided name.

Parameters:

command_queue — The handle of the OpenCL command queue to name.

name — The name of the OpenCL command queue.

nvtxNameClMemObjectA

nvtxNameClMemObjectW

Allows the association of an OpenCL memory object with a user-provided name.

Parameters:

memobj — The handle of the OpenCL memory object to name.

name — The name of the OpenCL memory object.

nvtxNameClSamplerA

nvtxNameClSamplerW

Allows the association of an OpenCL sampler with a user-provided name.

Parameters:

sampler — The handle of the OpenCL sampler to name.

name — The name of the OpenCL sampler.

nvtxNameClProgramA

nvtxNameClProgramW

Allows the association of an OpenCL program with a user-provided name.

Parameters:

program — The handle of the OpenCL program to name.

name — The name of the OpenCL program.

nvtxNameClEventA

nvtxNameClEventW

Allows the association of an OpenCL event with a user-provided name.

Parameters:

event — The handle of the OpenCL event to name.

name — The name of the OpenCL event.

Code Example

cl_context context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, 0, 0, 0);
cl_command_queue queue = clCreateCommandQueue(context, NULL, 0, NULL);
nvtxNameClContextA(context, "Context1");
nvtxNameClCommandQueueA(queue, "Queue0");

2.5. Adding NVTX to a Project

The NVTX API is installed by the NVIDIA Nsight VSE "host" installer (by default) into the following location:

C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\nvToolsExt

Both the header files and the library files (.lib, .dll) are located underneath this path.

By default, the NVIDIA Nsight VSE installer sets the environment variable NVTOOLSEXT_PATH to point to the aforementioned location that matches the system's bitness.

C++ Project

To compile your project with NVTX support in Visual Studio, use the following steps to set up your project:

  1. Open the project properties dialog.

  2. Navigate to Configuration Properties > C/C++ > General.

    Add the following path to the Additional Include Directories: $(NVTOOLSEXT_PATH)\include

  3. Navigate to Configuration Properties > Linker > General.

    Add the following path to the Additional Library Directories: $(NVTOOLSEXT_PATH)\lib\$(Platform)

  4. Navigate to Configuration Properties > Linker > Input.

    Add nvToolsExt64_1.lib (according to your system specifications) to the Additional Dependencies.

CUDA (.cu file)

If you use NVTX to annotate code in .cu files, also make sure the following configuration is set up (in addition to the steps discussed in the previous section):

  1. Open the project properties dialog.

  2. Navigate to Configuration Properties > CUDA C/C++ > Common.

    Add the following path to the Additional Include Directories: $(NVTOOLSEXT_PATH)\include

Copying NVTX to Project

It is recommended that you copy the NVTX headers and library files into your own source tree prior to integrating this API into your application. By doing this, you ensure that your application does not require NVIDIA Nsight VSE to be installed in order to build. The NVTX .dll has no direct dependencies on CUDA, DirectX, or other external libraries.

Once you have placed NVTX into your source tree, add the path to the NVTX headers to your include path, and include nvToolsExt.h in any CPU source files. You may then use the NVTX API calls to annotate your application's runtime behavior.

When linking, you may either link using the stub .lib provided with NVTX, or use a LoadLibrary call to load the .dll directly.
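
The LoadLibrary route can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: it assumes a 64-bit build, that the DLL exports the function under the undecorated name nvtxMarkA, and that the function takes a single const char* and returns nothing. LoadNvtxMark, TryMark, and MarkFn are hypothetical names. On platforms other than Windows the sketch simply reports that the library is unavailable.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Assumed shape of nvtxMarkA: takes an ASCII message and returns nothing.
typedef void (*MarkFn)(const char*);

// Looks up nvtxMarkA in the NVTX DLL. Returns nullptr if the DLL or the
// export cannot be found, so the application keeps running without NVTX.
inline MarkFn LoadNvtxMark() {
#ifdef _WIN32
    HMODULE dll = LoadLibraryA("nvToolsExt64_1.dll");
    if (dll == nullptr) return nullptr;
    return reinterpret_cast<MarkFn>(GetProcAddress(dll, "nvtxMarkA"));
#else
    return nullptr;                 // no DLL loading outside Windows
#endif
}

// Emits a marker if NVTX was loaded; returns whether a marker was emitted.
inline bool TryMark(const char* message) {
    static const MarkFn mark = LoadNvtxMark();   // resolved once, on first use
    if (mark == nullptr) return false;
    mark(message);
    return true;
}
```

Loading the library at run time lets the same executable run on machines where the .dll has not been deployed, at the cost of resolving each function that is used by hand.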

Deploying NVTX

The NVTX .dll is not installed into c:\Windows\System32 or another global location. Instead, make sure to deploy the .dll with your application. One common way to do this in Visual Studio is to copy the NVTX .dll into a directory which contains the application's executable, using a Custom Post-Build Step.

  Warning!  

Do not rename the .dll in any way. Renaming the library will affect how NVIDIA Nsight VSE interacts with the library to collect data.

3. NVTX Implementation

Analysis Tools

NVTX data appears in several different areas of an Analysis Report, including in the Summary Report, Detailed Reports, and Timeline.

3.1. Analysis Activity

NVTX API calls are only supported by the NVIDIA Nsight VSE Analysis Tools (Nsight Menu > Start Performance Analysis).

To Configure a Trace Activity to Capture NVTX Data

  1. Create a new Analysis Activity (Nsight Menu > Start Performance Analysis).

  2. From the Activity Type area, select one of the following:

    1. Trace Application, or

    2. Trace Process Tree.

  3. In the Trace Settings area, select the Tools Extension checkbox.

    1. Check the Markers and Ranges sub-option.

    2. Check the Resource Naming sub-option.

  4. Choose any other options you would like to trace, and then run the analysis activity.

To Capture NVTX Data

  1. Once finished configuring the activity to capture NVTX data, optionally choose any other domains or sub-options you would like to trace.

  2. Launch your application using the launch controls at the bottom of the activity page in order to run your application and capture all the specified data, including the NVTX events.

3.2. Analysis Report

3.2.1. Timeline Report

The NVTX markers and ranges are displayed in two different areas of the row hierarchy. This allows the events to be presented:

  • Per-thread, and

  • Per-category.

Besides the Timeline Report, you can also select the Tools Extension Events report to view the data from a different angle.

Tools Extension Thread Rows

The per-thread data is located in the rows at \Processes\<Process>\<Thread>\Tools Extension.

The parent row includes all nvtxMark events specified on the thread. The child row, "Push/Pop Ranges," contains an nvtxRange{Push, Pop} stack for the thread. The nvtxRange{Start, End} events are not displayed per-thread, as the start and end events can occur on different threads.

Tools Extension Process Rows

The per-category events are displayed as children rows of the \Processes\<Process>\Tools Extension row.

The per-category rows \Processes\<Process>\Tools Extension\<Category> contain the nvtxMark and nvtxRange{Start, End} events associated with the category. For more information on how to associate an event with a category, and how to name a category, see the "NVTX Event Attributes" section of NVTX Library. Each category row consists of multiple stacked range graphs. The timeline view will attempt to show the minimal number of Range rows. The ranges are not displayed as a stack.

The Tools Extension Table Page

The Tools Extension Events Page provides a table view of the events. This table contains every NVTX event that was sent during the capture period. The user can use sorting and filtering to analyze the data.

3.2.2. Markers

The data is displayed in the 'Tools Extension Events' report table as illustrated below.

The marker data is also displayed in the Timeline Report.

3.2.3. Push/Pop Ranges

The data is displayed in the report tables in a hierarchical way, resembling the nesting structure of the push/pop API calls. Each range is shown as a single row entry. Parent Push/Pop ranges fully enclose their child ranges.

To allow the user to easily view the hierarchy of events, a color gradient illustrates how far down on the hierarchy a node is located. The report page options allow the user to quickly expand or collapse all nodes in the hierarchy.

There are also a few keyboard shortcuts to quickly navigate through the hierarchy. These include the following: 

  • Cursor + right-arrow: Expands the currently selected row by one level (for a newly-opened report), or to its previous state (if the child rows below the selected row were previously expanded).

  • Cursor + left-arrow: Collapses the currently selected row (if it has any expanded child rows).

  • *: Expands the currently selected row as well as all child rows.

  • /: Collapses the entire sub-tree under the selected row.

  Note:  

The collapse/expand state per node is not persisted right now. That is, if you navigate away from the page and come back, the hierarchy starts out as completely collapsed. This will be addressed in a future version of NVIDIA Nsight VSE.

The ranges also appear on the Timeline as a child row of the corresponding thread.

3.2.4. Start/End Ranges

Start/End ranges are displayed in the report table pages as shown below. Each row represents an individual range.

Start/End ranges can also be visualized on a per-process basis on the timeline. The start/end ranges are grouped by category as well. The split hierarchy for the NVTX rows is due to the nature of the events themselves: push/pop ranges are tracked per-thread, while start/end ranges are per-process.

3.2.5. Event Attributes

Message

The message field can be used to specify an optional string. As shown below, the results can be seen both in the Tools Extension Events table, as well as on the timeline report.

Categories

Name an NVTX category with a string. Each category is defined by a unique ID, and that ID is passed into any markers, Push/Pop, or Start/End events in order to note that those events are part of a particular category. The category names show up in the report table as well as the Timeline Report alongside the category ID.

Color

The color attribute helps you to visually identify events in the tool. The results can be viewed in the Tools Extension Events and Timeline reports, shown below.

Payload

The payload field provides additional information for markers and ranges. The results are displayed in the Tools Extension Events and Timeline reports, shown below.

3.2.6. Range Statistics

NVIDIA Nsight VSE makes further analysis data available for NVTX push/pop ranges via the Range Statistics detail pane. For the selected range in focus, the pane reports any captured API usage that happened during the range’s life span. In addition, it provides statistics for any GPU workloads that were spawned by an API call made during the time span of the target range. With the overall API usage and the dependent GPU workload for each push/pop range at hand, NVTX push/pop ranges can be used to efficiently pinpoint code sections with increased resource utilization.

API Statistics

For each API domain actively used during the time span of the target range, the range statistics table provides the total number of API calls made (API Call Count) as well as the total time spent executing those calls (API Call Duration). Both values are reported in four different ways:

  • Total: Accounts for all API calls made during the time span of the target push/pop range, independent of its child ranges.

  • Total %: Percentage with respect to the total value for the range, such as the overall duration of the push/pop range or the total API calls made during that duration.

  • Self: Accounts for the API calls made during the portion of the target push/pop range's time span that does not overlap with any of the range’s children.

  • Self %: Percentage with respect to the data that is not overlapping with any of the range’s child nodes.
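
The difference between Total and Self can be made concrete with a small calculation. The sketch below is one reading of the definitions above, not the tool's actual implementation: it treats each API call as a single timestamp and every range as a half-open interval. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// A push/pop range as a half-open time interval [start, end).
struct Range { double start; double end; };

struct CallCounts { int total; int self; };

static bool Contains(const Range& r, double t) {
    return t >= r.start && t < r.end;
}

// total: calls made anywhere inside the parent range.
// self:  calls inside the parent that fall outside every child range.
CallCounts CountCalls(const Range& parent,
                      const std::vector<Range>& children,
                      const std::vector<double>& callTimes) {
    CallCounts counts{0, 0};
    for (double t : callTimes) {
        if (!Contains(parent, t)) continue;   // call is outside the target range
        ++counts.total;
        bool insideChild = false;
        for (const Range& child : children) {
            if (Contains(child, t)) { insideChild = true; break; }
        }
        if (!insideChild) ++counts.self;
    }
    return counts;
}
```

For a parent range spanning 0 to 10 with children spanning 2 to 4 and 6 to 7, calls at times 1, 3, 5, 6.5, 9, and 12 give a Total of 5 (the call at 12 lies outside the parent) and a Self of 3 (the calls at 1, 5, and 9).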

GPU Statistics

For a GPU workload event to be attributed to a push/pop range, the API call that issued the GPU workload must be captured, and this API call must take place during the lifetime of the selected target range. The statistical data presented for those GPU events are the overall number of GPU workloads executed (GPU Work Count), the overall time during which at least one GPU workload was in flight (GPU Activity), and the overall time during which multiple GPU workloads were executing on the GPU (GPU Work Overlap). All performance values are reported in the four variants described in the previous section.
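
To illustrate how the three GPU values relate to each other, the following sketch computes them from a list of workload intervals using a sweep over start and end times. It reflects one interpretation of the definitions above and is not how NVIDIA Nsight VSE is known to compute them; Workload, GpuStats, and ComputeGpuStats are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// One GPU workload, in flight from start to end.
struct Workload { double start; double end; };

struct GpuStats {
    int    workCount;  // number of workloads
    double activity;   // time with at least one workload in flight
    double overlap;    // time with two or more workloads in flight
};

GpuStats ComputeGpuStats(const std::vector<Workload>& workloads) {
    // Turn every workload into a +1 event at its start and a -1 at its end.
    std::vector<std::pair<double, int>> events;
    for (const Workload& w : workloads) {
        events.push_back(std::make_pair(w.start, +1));
        events.push_back(std::make_pair(w.end, -1));
    }
    std::sort(events.begin(), events.end());

    GpuStats stats{static_cast<int>(workloads.size()), 0.0, 0.0};
    int inFlight = 0;          // workloads running just before the current event
    double previous = 0.0;     // time of the previous event
    for (const auto& e : events) {
        const double elapsed = e.first - previous;
        if (inFlight >= 1) stats.activity += elapsed;
        if (inFlight >= 2) stats.overlap += elapsed;
        inFlight += e.second;
        previous = e.first;
    }
    return stats;
}
```

For workloads spanning 0 to 4, 2 to 6, and 8 to 9, this yields a work count of 3, an activity of 7 (0 to 6 plus 8 to 9), and an overlap of 2 (2 to 4).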

3.2.7. Resource Naming

Thread Name

The user-provided thread name is used to annotate the label of the corresponding thread row. All report pages have two new columns, Start Thread Name and End Thread Name, that display the name as well. These columns are hidden by default, but can be enabled using the column chooser (right-click on a column header).

 

4. NVTXT File Extension

Introduction

The NVIDIA Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating an application. Emitted NVTX annotations are captured and visualized in NVIDIA Nsight on the timeline and custom report pages. In some cases the C-based API might not be applicable, or it may be difficult to route the output of an existing logging infrastructure through the NVTX API.

To address that, NVIDIA Nsight VSE introduces NVTXT, a text version of the Tools Extension SDK. It is based on human-readable *.nvtxt files that can be added to an analysis report to annotate a trace capture. It supports the same marker and range events (Markers, Push/Pop Ranges, Start/End Ranges), and the annotations are visualized on the timeline in the same way as NVTX annotations. It is possible to use NVTX and NVTXT at the same time.

4.1. NVTXT File Format

An NVTXT file is a plain text file with an .nvtxt file extension and can be opened and read by any text editor. Each line contains either one NVTXT instruction or a comment.

There are 3 kinds of NVTXT instructions available:

  • Variable Assignments

  • Command Definitions

  • Command Calls

A comment line begins with a pound sign (#). All characters on that line will be treated as comments and will consequently be ignored by the parser.
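
As a rough sketch of how a reader of NVTXT files might tell these line kinds apart, the following function classifies a single line by its first significant characters. It is not the Nsight parser; the LineKind enumeration and classify_line are hypothetical names, and skipping leading white space and accepting blank lines are assumptions that the format description does not confirm.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    LINE_BLANK,
    LINE_COMMENT,
    LINE_COMMAND_DEFINITION,
    LINE_VARIABLE_ASSIGNMENT,
    LINE_COMMAND_CALL
} LineKind;

LineKind classify_line(const char *line)
{
    assert(line != NULL);
    while (*line == ' ' || *line == '\t') ++line;  /* assumption: leading white space is ignored */
    if (*line == '\0' || *line == '\r' || *line == '\n') return LINE_BLANK;
    if (*line == '#') return LINE_COMMENT;             /* comment line */
    if (*line == '@') return LINE_COMMAND_DEFINITION;  /* @CommandName, Arg1, ... */

    /* A name (letters, digits, underscore, no leading digit) followed by '='
       is a variable assignment; anything else is treated as a command call. */
    const char *p = line;
    while (isalnum((unsigned char)*p) || *p == '_') ++p;
    if (p != line && !isdigit((unsigned char)*line)) {
        while (*p == ' ' || *p == '\t') ++p;
        if (*p == '=') return LINE_VARIABLE_ASSIGNMENT;
    }
    return LINE_COMMAND_CALL;
}
```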

4.2. Variable Assignments

Variables allow referencing values by name in subsequent instructions. Variables are valid from the point at which they are first defined until the end of the NVTXT file. A variable's value remains constant from one assignment until either the end of the file or the next assignment to the same variable.

NVTXT supports 2 kinds of variables: Strings and Integers.

In order to be valid, a String must have one of the following two characteristics:

  • It must be delimited by single-quotes (') or double-quotes ("), in which case it can contain any valid character between these delimiters (except a new line). Note that all characters are escaped between these delimiters.

  • It must consist solely of letters, digits, and the '_' (underscore), and not begin with a digit.

In order to be valid, an Integer must have one of the following two characteristics:

  • It must consist solely of digits, optionally preceded by a dash (-) to denote a negative integer, and be a valid Int64 (i.e., -9,223,372,036,854,775,808 ≤ x ≤ 9,223,372,036,854,775,807).

  • It must begin with 0x or 0X, followed by digits or letters between A and F (or a and f), to be treated as a hexadecimal number.

As an example, all of the following values are considered to be valid:

  • Something
  • _something
  • s0m3th1n6
  • "Y3t 4n0th3r th1n6"
  • 123456789
  • 0xCafE42
  • -424242
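
The String and Integer rules above can be expressed as small validation helpers. This is a minimal sketch rather than the actual NVTXT lexer; the three function names are hypothetical, and rejecting a quoted string that contains its own delimiter is an assumption, since the rules above do not say how such a case is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Decimal Int64 with an optional leading dash, or 0x/0X followed by hex digits. */
bool is_nvtxt_integer(const char *s)
{
    assert(s != NULL);
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        const char *p = s + 2;
        if (*p == '\0') return false;               /* "0x" alone is not a number */
        for (; *p; ++p)
            if (!isxdigit((unsigned char)*p)) return false;
        return true;
    }
    const char *digits = (*s == '-') ? s + 1 : s;
    if (*digits == '\0') return false;
    for (const char *q = digits; *q; ++q)
        if (!isdigit((unsigned char)*q)) return false;
    errno = 0;
    (void)strtoll(s, NULL, 10);                     /* sets ERANGE outside the long long range */
    return errno != ERANGE;
}

/* Letters, digits, and underscore only; must not begin with a digit. */
bool is_unquoted_string(const char *s)
{
    assert(s != NULL);
    if (*s == '\0' || isdigit((unsigned char)*s)) return false;
    for (; *s; ++s)
        if (!isalnum((unsigned char)*s) && *s != '_') return false;
    return true;
}

/* Delimited by matching single or double quotes, with no new line in between. */
bool is_quoted_string(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    if (n < 2 || (s[0] != '\'' && s[0] != '"') || s[n - 1] != s[0]) return false;
    for (size_t i = 1; i + 1 < n; ++i)
        if (s[i] == '\n' || s[i] == s[0]) return false;  /* embedded delimiter: assumption */
    return true;
}
```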

Assignment Instruction

A variable assignment takes this form:

Variable[white-space]=[white-space]value

The left-operand of the assignment is the variable name and it must consist solely of letters, digits, and the '_' (underscore) and cannot begin with a digit.

Variable Expansion

A variable can be expanded anywhere (except inside single- or double-quoted strings) to retrieve its value. Variable expansion is performed during the lexing stage.

To expand a variable, place a dollar sign ($) before a previously assigned variable name.

Note that trying to expand an undefined variable will result in a lexing error.
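
The expansion rules can be sketched as a single pass over a line that substitutes $Name outside of quotes and fails on an undefined variable. This is an illustration only and not the Nsight lexer; the Variable type, lookup, and expand_line are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; const char *value; } Variable;

static const char *lookup(const Variable *vars, size_t nvars, const char *name, size_t len)
{
    /* Search from the end so that the most recent assignment wins. */
    for (size_t i = nvars; i-- > 0; )
        if (strlen(vars[i].name) == len && memcmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return NULL;
}

/* Copies `in` to `out`, replacing $Name outside of quotes with its value.
   Returns 0 on success, -1 for an undefined variable or a too-small buffer. */
int expand_line(const char *in, const Variable *vars, size_t nvars,
                char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    char quote = '\0';  /* active quote delimiter, or '\0' when outside quotes */
    while (*in) {
        if (quote == '\0' && *in == '$') {
            const char *name = ++in;
            while (isalnum((unsigned char)*in) || *in == '_') ++in;
            const char *value = lookup(vars, nvars, name, (size_t)(in - name));
            if (value == NULL) return -1;            /* undefined variable */
            size_t len = strlen(value);
            if (o + len >= out_size) return -1;
            memcpy(out + o, value, len);
            o += len;
            continue;
        }
        if (quote == '\0' && (*in == '\'' || *in == '"')) quote = *in;
        else if (*in == quote) quote = '\0';
        if (o + 1 >= out_size) return -1;
        out[o++] = *in++;
    }
    if (out_size == 0) return -1;
    out[o] = '\0';
    return 0;
}
```

A dollar sign inside a quoted message is copied unchanged, which matches the rule that no expansion takes place between quotes.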

4.3. Command Definitions

NVTXT provides a fixed set of commands, each of which takes a predefined list of arguments. Some arguments can be made static, which means that their value is not read from the command call, but instead from a variable with the same name.

A command definition always begins with the at sign (@) prepended to a valid command name, and is followed by a series of valid argument names separated by commas (,). For example:

@CommandName,[whitespace]Property1,[whitespace]Property2,[whitespace]Property3

A command definition is valid for all subsequent calls of this command. A command can be redefined multiple times in a single NVTXT file.

For example, the RangeStartEnd command's default format is as follows:

@RangeStartEnd, Start, End, TimeBase, ProcessId, ThreadId, CategoryId, Color, Message, Payload

4.3.1. Supported Commands

The following defines all available commands supported in NVTXT. Optional parameters are denoted in italics.

  • Marker — Adds a marker in the timeline at <Time>, with <Color> as color, <Message> as label and <Payload> as payload.

    Marker, Time, TimeBase, ProcessId, ThreadId, CategoryId, Color, Message, Payload
  • RangePush — Opens a range in the timeline beginning at <Time>, with <Color> as color, <Message> as label and <Payload> as payload. Calls to RangePush need to be paired with corresponding calls to RangePop.

    RangePush, Time, TimeBase, ProcessId, ThreadId, CategoryId, Color, Message, Payload
  • RangePop — Closes a range in the timeline ending at <Time> and starting with the corresponding RangePush command.

    RangePop, Time, TimeBase, ProcessId, ThreadId
  • RangeStartEnd — Adds a range in the timeline between <Start> and <End>, with <Color> as color, <Message> as label and <Payload> as payload. Multiple Start/End ranges can overlap each other.

    RangeStartEnd, Start, End, TimeBase, ProcessId, ThreadId, CategoryId, Color, Message, Payload
  • NameCategory — Associates the name given in <Name> to the category with ID <CategoryId>.

    NameCategory, CategoryId, Name
  • AddChildCategory — Defines a parent-child-relationship between two categories. This information can be used by tools to present the input data in a hierarchical way on the timeline or in report pages. It is invalid to define circular relationships between categories.

    AddChildCategory, ParentCategoryId, CategoryId
  • NameOsThread — Provides a name to <ThreadId> thread on <ProcessId> process.

    NameOsThread, ProcessId, ThreadId, Name
  • NameProcess — Provides a name to <ProcessId> process.

    NameProcess, ProcessId, Name
  • SetFileDisplayName — Sets the display name to the current NVTXT file. This can be used by tools instead of the actual file name.

    SetFileDisplayName, Name

4.3.2. Supported Arguments

The arguments to the commands are defined in the following list. The type of the argument is specified in square brackets. Multiple brackets denote distinct options to define the argument value.

  • Start, End, Time

    [Integer] — A timestamp.

  • TimeBase

    [String] — The time base used for the timestamps, valid values are:

    • Qpc — The high-resolution performance counter provided by the QueryPerformanceCounter function on Windows systems.

    • Rdtsc — The processor time stamp in clock cycles since the last reset. Exposed by the Visual Studio compilers through the __rdtsc intrinsic.

    • FileTime — The number of 100-nanosecond intervals elapsed since midnight, January 1, 1601 C.E. UTC. The File Time can be retrieved by calling the GetSystemTimeAsFileTime function or the DateTime.ToFileTime method.

  • ProcessId

    [Integer] — ID of the Process.

  • ThreadId

    [Integer] — ID of the Thread.

  • CategoryId, ParentCategoryId

    [Integer] — ID of the Category.

  • Color

    [Integer] — ARGB value as an integer.

    [String] — Name of the color. (The list of known names is equivalent to: http://msdn.microsoft.com/en-us/library/system.drawing.color.aspx.)

    [String] — ARGB value as a hexadecimal value ("0xFF004488").

  • Message

    [String] — Label of the marker/range.

  • Payload

    [Integer] — Payload component of the range/marker.

  • Name

    [String] — Name of the Thread/Process/Category/File.
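
Two of these argument values are easy to get wrong when they are produced by hand: the integer form of Color and FileTime timestamps. The sketch below assumes that the ARGB integer uses 8 bits per channel with alpha in the most significant byte, and it relies on the fixed offset of 11,644,473,600 seconds between January 1, 1601 and the Unix epoch (January 1, 1970). Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packs four 8-bit channels into one ARGB integer (alpha in the top byte). */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Seconds between 1601-01-01 and 1970-01-01, both UTC. */
#define FILETIME_UNIX_EPOCH_OFFSET_SECONDS 11644473600LL

/* Converts a Unix timestamp (seconds plus nanoseconds) into the number of
   100-nanosecond intervals since 1601-01-01 UTC. */
int64_t unix_to_file_time(int64_t unix_seconds, int32_t nanoseconds)
{
    assert(nanoseconds >= 0 && nanoseconds < 1000000000);
    return (unix_seconds + FILETIME_UNIX_EPOCH_OFFSET_SECONDS) * 10000000LL
           + nanoseconds / 100;
}
```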

4.3.3. Example

The following example defines the RangeStartEnd command in a custom way. The ProcessId, ThreadId, CategoryId, Color, and TimeBase arguments are defined as static variables, so they do not have to be repeated for every call to this command.

@RangeStartEnd, Start, End, Message
ProcessId = 1844
ThreadId = 4880
CategoryId = 1
Color = Blue
TimeBase = Qpc
RangeStartEnd, 8236719005, 8236928073, "My Message"
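
An existing logging layer could emit the same file programmatically. The following sketch writes the definition and the static variables once and then formats one call per range. The function names are hypothetical, the CategoryId, Color, and TimeBase values are simply copied from the example above, and messages that contain a double quote or a new line are rejected because quoted strings offer no way to escape them.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the command definition and the static arguments. Returns 0 or -1. */
int write_nvtxt_header(FILE *f, long process_id, long thread_id)
{
    assert(f != NULL);
    return fprintf(f,
                   "@RangeStartEnd, Start, End, Message\n"
                   "ProcessId = %ld\n"
                   "ThreadId = %ld\n"
                   "CategoryId = 1\n"
                   "Color = Blue\n"
                   "TimeBase = Qpc\n",
                   process_id, thread_id) < 0 ? -1 : 0;
}

/* Formats one RangeStartEnd call matching the definition above. Returns 0 or -1. */
int format_range_call(char *out, size_t out_size,
                      int64_t start, int64_t end, const char *message)
{
    assert(out != NULL && message != NULL);
    if (strchr(message, '"') != NULL || strchr(message, '\n') != NULL)
        return -1;  /* cannot be represented inside a double-quoted string */
    int n = snprintf(out, out_size,
                     "RangeStartEnd, %" PRId64 ", %" PRId64 ", \"%s\"\n",
                     start, end, message);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A caller would write the header once after opening the file and then append one formatted line per range, for example with fputs.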

4.3.4. Command Calls

A previously defined command is issued by a command call. A command call always begins with a command name, followed by a list of values corresponding to the command’s definition. Every value must have the correct type, and the number of values must exactly match the number of arguments in the definition.

CommandName,[whitespace]ValueForProperty1,[whitespace]ValueForProperty2

4.4. NVTXT Report Files

NVTXT files that are collocated with NVIDIA Nsight VSE's capture output get automatically loaded when a report is opened. Multiple NVTXT files can be used to annotate a report. However, each input file is handled separately; sharing of IDs, Variables, or Command Definitions across NVTXT files is not supported.

4.4.1. File Location

NVTXT files can be placed at two locations in the report hierarchy to be properly loaded with a report. Any number of NVTXT files can co-exist and be loaded at the same time. Note that the order in which they are loaded is not defined. The two locations are:

1. Capture Directory

Like any file loaded alongside the report, the NVTXT file can be located in the capture folder.

For example, if the Report Directory is set to C:\temp, then the NVTXT file will be located in C:\temp\[project name]\[project name]_Capture_000\.

2. Session Directory

In the Analysis tab of the Nsight Options menu, the user can choose to enable NVTXT file loading from the session folder. Note that the default value is set to True.

By doing so, NVTXT files can also be located in the Session directory. For example, if the Report Directory is set to C:\temp, the NVTXT file will be placed in C:\temp\[project name]\.

To avoid the need to manually copy NVTXT files into the report directory structure after a capture is completed, NVIDIA Nsight VSE provides an environment variable that points to the current session directory. This environment variable, called NSIGHT_ANALYSIS_SESSION_OUTPUT_PATH, is automatically set for the target process and remains valid during the lifetime of a session.

Note that for a remote session, every NVTXT file present in the session directory will be synchronized at the end of every capture session. Therefore, if a session contains multiple captures, old files are likely to be overwritten. It is also the user’s responsibility to properly close any handles to the NVTXT files located in NSIGHT_ANALYSIS_SESSION_OUTPUT_PATH to make them available for file synchronization between the target and host machines.
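
A target application could use the environment variable as sketched below to create its NVTXT file directly in the session directory. Only NSIGHT_ANALYSIS_SESSION_OUTPUT_PATH comes from the description above; the helper names, the file name passed by the caller, and the 1024-byte path buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Joins a directory and a file name with a backslash unless the directory
   already ends in a path separator. Returns 0, or -1 if `out` is too small. */
int build_nvtxt_path(const char *dir, const char *file_name, char *out, size_t out_size)
{
    assert(dir != NULL && file_name != NULL && out != NULL);
    size_t len = strlen(dir);
    int has_sep = len > 0 && (dir[len - 1] == '\\' || dir[len - 1] == '/');
    int n = snprintf(out, out_size, "%s%s%s", dir, has_sep ? "" : "\\", file_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Opens <session directory>\<file_name> for writing. Returns NULL when the
   variable is not set (no session active) or the file cannot be created. */
FILE *open_session_nvtxt(const char *file_name)
{
    const char *dir = getenv("NSIGHT_ANALYSIS_SESSION_OUTPUT_PATH");
    char path[1024];
    if (dir == NULL || *dir == '\0') return NULL;
    if (build_nvtxt_path(dir, file_name, path, sizeof path) != 0) return NULL;
    return fopen(path, "w");
}
```

The returned handle should be closed with fclose before the capture ends so that the file is available for synchronization.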

4.5. NVTXT Report Output

Report Summary

An NVTXT file is treated like any other loadable file and will be present in the Report Summary. If it contains errors, a warning message will be shown.

For more information about the errors, you can follow the link to the Session Summary, where all of the lexing, parsing, and loading error messages will be displayed for each of the loaded NVTXT files.

Timeline

NVTXT data will be displayed on the Timeline in the very same way that NVTX data is shown. However, there are two notable differences:

  • On Thread level rows, the rows will be named Text Tools Extension rather than Tools Extension.
  • The Category display formatting shows the name of the NVTXT file alongside the CategoryId (or the display name, if applicable).

 

Notices

Notice

NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition 2020.1 User Guide

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, CUDA-GDB, CUDA-MEMCHECK, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, DGX Station, NVIDIA DRIVE, NVIDIA DRIVE AGX, NVIDIA DRIVE Software, NVIDIA DRIVE OS, NVIDIA Developer Zone (aka "DevZone"), GRID, Jetson, NVIDIA Jetson Nano, NVIDIA Jetson AGX Xavier, NVIDIA Jetson TX2, NVIDIA Jetson TX2i, NVIDIA Jetson TX1, NVIDIA Jetson TK1, Kepler, NGX, NVIDIA GPU Cloud, Maxwell, Multimedia API, NCCL, NVIDIA Nsight Compute, NVIDIA Nsight Eclipse Edition, NVIDIA Nsight Graphics, NVIDIA Nsight Integration, NVIDIA Nsight Systems, NVIDIA Nsight Visual Studio Edition, NVLink, nvprof, Pascal, NVIDIA SDK Manager, Tegra, TensorRT, Tesla, Visual Profiler, VisionWorks and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.