Using NVTX with NVIDIA Nsight Visual Studio Edition


NVIDIA® Nsight™ Development Platform, Visual Studio Edition brings GPU Computing into Microsoft Visual Studio. You can build, debug, profile, and trace heterogeneous compute and graphics applications using CUDA C/C++, OpenCL, DirectCompute, Direct3D, and OpenGL. NVIDIA Nsight tools extend the debugging and performance analysis capabilities of Visual Studio to support GPU computing.

Applications which integrate NVTX can use NVIDIA Nsight to capture and visualize events, code ranges, and resources.

1. SDK Overview

The SDK consists of the include files, pre-built stub libraries, DLLs, and several SDK samples.

The SDK is installed at the following location:

C:\Program Files\NVIDIA Corporation\nvToolsExt
bin
     nvToolsExt64_1.dll
include
     nvToolsExt.h
     nvToolsExtCuda.h
     nvToolsExtOpenCL.h
lib 
     nvToolsExt64_1.lib

 

The SDK includes two sample projects, packaged in the following archive:

C:\Program Files (x86)\Nsight Visual Studio Edition 5.5\Host\Samples\NsightSamples.zip

The SDK contains the following samples:

nvtxSimple - Demonstrates how to use the NVTX API to generate marker and range events, and to name OS threads and categories.

nvtxMultithreaded - Demonstrates more advanced usage of the NVTX C API. Introduces two sample C++ wrappers that simplify use of the API.

2. API Overview

2.1 Files

The core NVTX API is defined in file nvToolsExt.h, whereas domain-specific extensions to the NVTX interface are exposed in separate header files. For example, see nvToolsExtCuda.h for CUDA-specific NVTX API functions.

The library (.lib) and runtime components (.dll) are provided. The naming scheme for these files is defined as nvToolsExt64_<version>.{dll|lib}.

2.2 Function Calls

All NVTX API functions start with the nvtx name prefix and may end with one of three suffixes: A, W, or Ex. Functions with such a suffix exist in multiple variants that perform the same core functionality with different parameter encodings. Depending on the version of the NVTX library, the available encodings may include ASCII (A), Unicode (W), or event structures (Ex).

2.3 Return Values

Some NVTX functions have return values. For example, the nvtxRangeStart functions return a unique range identifier, and the nvtxRangePush and nvtxRangePop functions return the current stack level. Do not use the returned values in conditional code in the instrumented application. The returned values can differ between implementations of the NVTX library, so code that depends on them might work with one tool but fail with another.

2.4 C++ Wrapper Library

The NVTX API is a plain C API. The nvtxMultithreaded sample contains an example of a C++ wrapper. It is recommended to use such a customized wrapper layer on top of the raw API to simplify inclusion of NVTX in your application.

Another advantage of a wrapper library is that it hides any changes to the base API from the end user’s program. So if one API call is changed, the developer only needs to update the wrapper library code, rather than go through the entire code and change every reference.
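As an illustration of the wrapper idea, the following is a minimal sketch of a scope-based range wrapper. It is not the wrapper shipped in the nvtxMultithreaded sample. The names ScopedRange and sumTo and the USE_NVTX build switch are hypothetical; when USE_NVTX is not defined, stand-in stubs replace the two NVTX calls so that the sketch compiles without the SDK.

```cpp
#include <cassert>
#include <type_traits>

#if defined(USE_NVTX)   // hypothetical build switch, not part of NVTX
#include <nvToolsExt.h>
#else
// Stand-in stubs so the sketch compiles without the NVTX SDK.
// They only track nesting depth; the real library records events.
static int g_stubDepth = 0;
static int nvtxRangePushA(const char* /*message*/) { return g_stubDepth++; }
static int nvtxRangePop() { return --g_stubDepth; }
#endif

// Hypothetical RAII wrapper: pushes a range on construction and pops it
// on destruction, so every exit path from the scope closes the range.
class ScopedRange {
public:
    explicit ScopedRange(const char* message) {
        assert(message != nullptr);
        nvtxRangePushA(message);   // return value deliberately ignored
    }
    ~ScopedRange() { nvtxRangePop(); }

    // Copying would pop the same range twice, so it is disabled.
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Example use: the range covers the whole function body.
static int sumTo(int n) {
    ScopedRange range("sumTo");
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;   // the range is popped here automatically
}
```

If an NVTX call later changes, only the wrapper class needs to be updated; callers such as sumTo stay the same.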

3. Events

Markers describe events that occur at a specific point in time during the execution of an application, while ranges describe events that span a period of time. This information is presented alongside all of the other captured data, which makes the collected information easier to understand.

3.1 NVTX Version 0

This version of the NVTX C API only allowed the caller to specify a message. Each function is available in both ASCII and Unicode variants.

3.2 NVTX Version 1

This version of the NVTX C API added support for per-event attributes. Attributes include category, color, message, and payload. All attributes are optional.

3.2.1 Event Attributes Structure

This structure is used to describe the attributes of an event. The layout of the structure is defined by a specific version of the Tools Extension library and can change between versions.

3.2.2 Attributes

Markers and ranges can use attributes to provide additional information for an event or to guide the tool's visualization of the data. Each of the attributes is optional and if left unspecified, the attributes fall back to a default value.

To specify any attribute other than the text message, the Ex variant of the function must be called.

3.2.2.1 Message

The message field can be used to specify an optional string. The caller must set both the messageType and message fields. The default value is NVTX_MESSAGE_UNKNOWN.

Code Sample: 

// VALID NVTX_MESSAGE_TYPE_ASCII
nvtxEventAttributes_t eventAttrib1 = {0}; 
eventAttrib1.version = NVTX_VERSION; 
eventAttrib1.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
eventAttrib1.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib1.message.ascii = __FUNCTION__ ":ascii"; 
nvtxMarkEx(&eventAttrib1); 
DELAY();

// VALID NVTX_MESSAGE_TYPE_UNICODE
nvtxEventAttributes_t eventAttrib2 = {0}; 
eventAttrib2.version = NVTX_VERSION; 
eventAttrib2.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib2.messageType = NVTX_MESSAGE_TYPE_UNICODE; 
eventAttrib2.message.unicode = __FUNCTIONW__ L":unicode \u2603 snowman"; 
nvtxMarkEx(&eventAttrib2); 
DELAY();

3.2.2.2 Category

A category attribute is a user-controlled ID that can be used to group events. The tool may use category IDs to improve filtering, or for grouping events. The functions nvtxNameCategoryA or nvtxNameCategoryW can be used to name a category. The default value is 0.

Code Sample:

nvtxEventAttributes_t eventAttrib = {0}; 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib.category = 1; 

// specifying message to help identify this event in the tool.
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
eventAttrib.message.ascii = __FUNCTION__;
nvtxMarkEx(&eventAttrib);

// Categories can be named using nvtxNameCategory{A,W}().
nvtxNameCategoryA(1, __FUNCTION__);

3.2.2.3 Color

The color attribute is used to help visually identify events in the tool. The caller must set both the colorType and color field.

Code Sample:

// valid specification of color 
nvtxEventAttributes_t eventAttrib1 = {0}; 
eventAttrib1.version = NVTX_VERSION; 
eventAttrib1.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib1.colorType = NVTX_COLOR_ARGB; 
eventAttrib1.color = COLOR_RED;
  
// specifying message to help identify this event in the tool. 
eventAttrib1.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib1.message.ascii = __FUNCTION__ ":valid color"; 
nvtxMarkEx(&eventAttrib1); 
// default color
nvtxEventAttributes_t eventAttrib2 = {0};
eventAttrib2.version = NVTX_VERSION;
eventAttrib2.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;

// specifying message to help identify this event in the tool.
eventAttrib2.messageType = NVTX_MESSAGE_TYPE_ASCII;
eventAttrib2.message.ascii = __FUNCTION__ ":default color";
nvtxMarkEx(&eventAttrib2);
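The COLOR_* constants used in the samples are not defined in this section. The following sketch shows one way they could be defined, assuming that NVTX_COLOR_ARGB expects a 32-bit value laid out as 0xAARRGGBB. The helper makeArgb is hypothetical, and the constant values are assumptions rather than the definitions used by the SDK samples.

```cpp
#include <cstdint>

// Packs four 8-bit channels into a 32-bit 0xAARRGGBB value.
// makeArgb is a hypothetical helper, not part of the NVTX API.
constexpr std::uint32_t makeArgb(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b) {
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8)  |
            static_cast<std::uint32_t>(b);
}

// Assumed definitions for the color names that appear in the samples.
// Alpha is fully opaque (0xFF) in every case.
constexpr std::uint32_t COLOR_RED     = makeArgb(0xFF, 0xFF, 0x00, 0x00);
constexpr std::uint32_t COLOR_GREEN   = makeArgb(0xFF, 0x00, 0xFF, 0x00);
constexpr std::uint32_t COLOR_BLUE    = makeArgb(0xFF, 0x00, 0x00, 0xFF);
constexpr std::uint32_t COLOR_YELLOW  = makeArgb(0xFF, 0xFF, 0xFF, 0x00);
constexpr std::uint32_t COLOR_MAGENTA = makeArgb(0xFF, 0xFF, 0x00, 0xFF);
constexpr std::uint32_t COLOR_CYAN    = makeArgb(0xFF, 0x00, 0xFF, 0xFF);
```

With these definitions, a statement such as eventAttrib1.color = COLOR_RED; assigns the value 0xFFFF0000.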
3.2.2.4 Payload

The payload attribute can be used to provide additional data for markers and ranges. Range events can only specify values at the beginning of a range. The caller must specify valid values for both payloadType and payload.

Code Sample:

nvtxEventAttributes_t eventAttrib = {0}; 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 

eventAttrib.payloadType = NVTX_PAYLOAD_TYPE_UNSIGNED_INT64; 
eventAttrib.payload.ullValue = 0; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":UNSIGNED_INT64 = 0"; 
nvtxMarkEx(&eventAttrib);

3.2.3 Structure

3.2.3.1 Initializing Attributes

The caller should always perform the following three tasks when using attributes:

• Zero the structure.
• Set the version field.
• Set the size field.

Zeroing the structure sets all the event attribute types and values to their default values. The version and size fields are used by the Tools Extension implementation to handle multiple versions of the attributes structure.

3.2.3.2 Versioning

It is recommended that the caller use one of the following two methods to initialize the event attributes structure:

• Method 1: Version Safe

The version and size fields are used by the Tools Extension implementation to handle multiple versions of the attributes structure. This example shows how to initialize the structure for forward compatibility.

nvtxEventAttributes_t eventAttrib = {0}; 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; 
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = ::COLOR_YELLOW; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__; 
nvtxMarkEx(&eventAttrib); 

// This can be done using C99 designated initializers.
// Fields that are not listed are zero-initialized.
nvtxEventAttributes_t eventAttrib2 = 
{ 
     .version = NVTX_VERSION,                // version 
     .size = NVTX_EVENT_ATTRIB_STRUCT_SIZE,  // size 
     .colorType = NVTX_COLOR_ARGB,           // colorType 
     .color = COLOR_YELLOW,                  // color 
     .messageType = NVTX_MESSAGE_TYPE_ASCII, // messageType
     .message.ascii = __FUNCTION__ ":Designated Initializer" // message (union member)
}; 
nvtxMarkEx(&eventAttrib2);

• Method 2: Version Specific

This example shows how to initialize the structure to a specific version of the library.

nvtxEventAttributes_v1 eventAttrib = {0}; 
eventAttrib.version = 1; 
eventAttrib.size = (uint16_t)(sizeof(nvtxEventAttributes_v1)); 
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = COLOR_MAGENTA; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__; 
nvtxMarkEx(&eventAttrib); 


// This can be done using ordered initialization.
nvtxEventAttributes_v1 eventAttrib2 =
{
     1,                                          // version
     (uint16_t)(sizeof(nvtxEventAttributes_v1)), // size
     0,                                          // category
     NVTX_COLOR_ARGB,                            // colorType
     COLOR_CYAN,                                 // color
     NVTX_PAYLOAD_TYPE_UNSIGNED_INT64,           // payloadType
     0,                                          // reserved0 (padding field of the v1 structure)
     1,                                          // payload
     NVTX_MESSAGE_TYPE_ASCII,                    // messageType
     __FUNCTION__ ":Ordered Initialization"      // message
};
nvtxMarkEx(&eventAttrib2);

If the caller uses Method 1 (Version Safe), it is critical that the entire binary layout of the structure be set to 0 so that all fields are initialized to their default values. The caller should either use both NVTX_VERSION and NVTX_EVENT_ATTRIB_STRUCT_SIZE (Method 1) or use explicit values and a versioned type (Method 2, Version Specific). Mixing the two methods will likely cause source-level or binary incompatibility in the future.
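To illustrate why the size field and the zeroed structure matter, the following sketch shows how a consumer could accept structures of different sizes. It is a hypothetical model and not the NVTX implementation: AttribsV1, AttribsV2, and normalize are invented names, and the layouts are not those of nvtxEventAttributes_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical attribute layouts; these are NOT the NVTX structures.
struct AttribsV1 {
    std::uint16_t version;
    std::uint16_t size;
    std::uint32_t category;
};

struct AttribsV2 {               // same prefix as V1, one field appended
    std::uint16_t version;
    std::uint16_t size;
    std::uint32_t category;
    std::uint32_t color;
};

// Consumer side: copy only as many bytes as the caller reports in its
// size field into a zeroed copy of the newest layout. Fields that the
// caller did not know about keep their default value of 0.
static AttribsV2 normalize(const void* attribs) {
    std::uint16_t header[2];                     // version, size
    std::memcpy(header, attribs, sizeof(header));
    const std::size_t callerSize = header[1];
    assert(callerSize >= sizeof(AttribsV1));     // reject truncated input

    AttribsV2 out = {};                          // all defaults
    const std::size_t n = callerSize < sizeof(out) ? callerSize : sizeof(out);
    std::memcpy(&out, attribs, n);
    return out;
}
```

In this model, a caller built against the older layout still works with a newer consumer, as long as it reports its own structure size and leaves unused bytes at 0.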

3.3 Markers

A marker is used to describe a single point in time.

3.3.1 API Reference
nvtxMarkEx

A marker can contain a text message or specify additional information using the event attributes structure. These attributes include a text message, color, category, and a payload. Each attribute is optional and can only be set through the nvtxMarkEx function. If nvtxMarkA or nvtxMarkW is used to specify the marker, or if an attribute is unspecified, a default value is used.

Parameters:

eventAttrib - The event attribute structure defining the marker's attribute types and attribute values.

nvtxMarkA

nvtxMarkW

A marker created using nvtxMarkA or nvtxMarkW contains only a text message.

Parameters:

message - The message associated with this marker event.

3.3.2 Code Example
// nvtxMark{A,W} 
nvtxMarkA(__FUNCTION__ ":nvtxMarkA"); 
nvtxMarkW(__FUNCTIONW__ L":nvtxMarkW"); 

// nvtxMarkEx 
// zero the structure 
nvtxEventAttributes_t eventAttrib = {0};

// set the version and the size information
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
 
// configure the attributes.  0 is the default for all attributes.
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = COLOR_RED; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":nvtxMarkEx"; 
nvtxMarkEx(&eventAttrib);

3.4 Range Push/Pop

Push/Pop ranges are an excellent way to track nested time ranges which occur on a CPU thread. The duration of each range is defined by the corresponding pair of nvtxRangePush and nvtxRangePop calls in the application's source code. Nested ranges are handled automatically on a per-CPU thread basis, and no special developer code is necessary.

3.4.1 API Reference
nvtxRangePushEx

Marks the start of a nested range. Returns the zero-based level of the range being started. If an error occurs, a negative value is returned.

Parameters:

eventAttrib - The event attribute structure defining the range's attribute types and attribute values.

nvtxRangePushA

nvtxRangePushW

Marks the start of a nested range. Returns the zero-based level of the range being started. If an error occurs, a negative value is returned.

Parameters:

message - The event message associated with this range event.

nvtxRangePop

Marks the end of a nested range on the current thread. If an error occurs, a negative value is returned.

3.4.2 Code Example
// nvtxRangePush{A,W}
nvtxRangePushA(__FUNCTION__ ":nvtxRangePushA"); 
nvtxRangePop(); 
nvtxRangePushW(__FUNCTIONW__ L":nvtxRangePushW"); 
nvtxRangePop();

// nvtxRangePushEx 
// zero the structure 
nvtxEventAttributes_t eventAttrib = {0};   

// set the version and the size information 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;  

// configure the attributes.  0 is the default for all attributes.
eventAttrib.colorType = NVTX_COLOR_ARGB; 
eventAttrib.color = COLOR_GREEN; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":nvtxRangePushEx"; 
nvtxRangePushEx(&eventAttrib); 
nvtxRangePop();

3.5 Range Start/End

Start/End ranges are used to denote a time span; however, they expose arbitrary concurrency (not just nesting), and the start of a range can occur on a different thread than the end. For the correlation of a start/end pair, a unique correlation ID is created that is returned from nvtxRangeStart, and is then passed into nvtxRangeEnd.

3.5.1 API Reference
nvtxRangeStartEx

Marks the start of a range. Ranges defined by start/end can overlap. This API call returns the unique ID used to correlate a pair of Start and End events.

Parameters:

eventAttrib - The event attribute structure defining the range's attribute types and attribute values.

nvtxRangeStartA

nvtxRangeStartW

Marks the start of a range. Ranges defined by start/end can overlap. This API call returns the unique ID used to correlate a pair of Start and End events.

Parameters:

message - The event message associated with this range event.

nvtxRangeEnd

Marks the end of a range.

Parameters:

id - The correlation ID returned from an nvtxRangeStart call.

3.5.2 Code Example
// nvtxRangeStart{A,W}
nvtxRangeId_t id1 = nvtxRangeStartA(__FUNCTION__ ":nvtxRangeStartA"); 
nvtxRangeEnd(id1);
nvtxRangeId_t id2 = nvtxRangeStartW(__FUNCTIONW__ L":nvtxRangeStartW"); 
nvtxRangeEnd(id2); 

// zero the structure 
nvtxEventAttributes_t eventAttrib = {0};

// set the version and the size information 
eventAttrib.version = NVTX_VERSION; 
eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;

// configure the attributes.  0 is the default for all attributes.
eventAttrib.colorType = NVTX_COLOR_ARGB;
eventAttrib.color = COLOR_BLUE; 
eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII; 
eventAttrib.message.ascii = __FUNCTION__ ":nvtxRangeStartEx"; 
nvtxRangeId_t id3 = nvtxRangeStartEx(&eventAttrib); 
nvtxRangeEnd(id3);

// overlapping events 
// re-use eventAttrib 
eventAttrib.message.ascii = __FUNCTION__ ":Range 1";
nvtxRangeId_t r1 = nvtxRangeStartEx(&eventAttrib); 
eventAttrib.message.ascii = __FUNCTION__ ":Range 2"; 
nvtxRangeId_t r2 = nvtxRangeStartEx(&eventAttrib); 
nvtxRangeEnd(r1); 
nvtxRangeEnd(r2);
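Because a start/end range is closed by passing its ID back to nvtxRangeEnd, the ID has to be stored somewhere between the two calls. The following sketch shows one possible way to do that for work items that begin and finish in arbitrary order. RequestRanges and the USE_NVTX build switch are hypothetical; when USE_NVTX is not defined, stand-in stubs replace the NVTX calls so that the sketch compiles without the SDK. The sketch is not thread-safe; a real multithreaded caller would need to guard the map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

#if defined(USE_NVTX)   // hypothetical build switch, not part of NVTX
#include <nvToolsExt.h>
#else
// Stand-in stubs so the sketch compiles without the NVTX SDK.
typedef std::uint64_t nvtxRangeId_t;
static nvtxRangeId_t g_nextStubId = 1;
static nvtxRangeId_t nvtxRangeStartA(const char* /*message*/) { return g_nextStubId++; }
static void nvtxRangeEnd(nvtxRangeId_t /*id*/) {}
#endif

// Hypothetical helper: remembers the range ID of every open request so
// that the range can be ended later, possibly from another code path.
class RequestRanges {
public:
    void begin(int requestId, const char* label) {
        assert(label != nullptr);
        end(requestId);   // close a stale range for the same request, if any
        open_[requestId] = nvtxRangeStartA(label);
    }

    // Returns false if no range is open for this request.
    bool end(int requestId) {
        auto it = open_.find(requestId);
        if (it == open_.end()) return false;
        nvtxRangeEnd(it->second);   // the ID is only passed back, never inspected
        open_.erase(it);
        return true;
    }

    std::size_t openCount() const { return open_.size(); }

private:
    std::unordered_map<int, nvtxRangeId_t> open_;
};
```

The stored IDs are treated as opaque handles, which is consistent with the advice not to build conditional logic on NVTX return values.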

4. Overview of Resource Naming

4.1 System/NVTX

Categories and threads are used to group sets of events. Each category is identified through a unique ID; that ID is passed into any of the aforementioned marker/range events in order to assign that event to a specific category. The following API calls can be used to assign a name to a category ID.

4.1.1 API Reference

nvtxNameCategoryA

nvtxNameCategoryW

Allows the user to assign a name to a category ID.

Parameters:

category - The category ID to name.

name - The name of the category.

nvtxNameOsThreadA

nvtxNameOsThreadW

Allows the user to name an active thread of the current process. If an invalid thread ID is provided, or a thread ID from a different process is used, the behavior of the tool is implementation-dependent.

Parameters:

threadId - The ID of the thread to name.

name - The name of the thread.

4.1.2 Code Example
// The A variants are called explicitly because the names are narrow strings.
nvtxNameCategoryA(1, "Memory Allocation"); 
nvtxNameCategoryA(2, "Memory Transfer"); 
nvtxNameCategoryA(3, "Memory Object Lifetime"); 

nvtxNameOsThreadA(GetCurrentThreadId(), "MAIN_THREAD");

4.2 CUDA Resources

CUDA devices, contexts, and streams can be named with the nvtxName-prefixed functions defined in the nvToolsExtCuda.h header. Each of these functions takes the object handle and the name to assign to the object.

4.2.1 API Reference

nvtxNameCuDeviceA

nvtxNameCuDeviceW

Allows the user to associate a CUDA device with a user-provided name.

Parameters:

device - The handle of the CUDA device to name.

name - The name of the CUDA device.

nvtxNameCuContextA

nvtxNameCuContextW

Allows the user to associate a CUDA context with a user-provided name.

Parameters:

context - The handle of the CUDA context to name.

name - The name of the CUDA context.

nvtxNameCuStreamA

nvtxNameCuStreamW

Allows the user to associate a CUDA stream with a user-provided name.

Parameters:

stream - The handle of the CUDA stream to name.

name - The name of the CUDA stream.

4.2.2 Code Example
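CUdevice device;
CUcontext context;

// The driver API must be initialized before any other driver call.
cuInit(0);
cuDeviceGet(&device, 0);
cuCtxCreate(&context, 0, device);

nvtxNameCuContextA(context, "Context1");
nvtxNameCuDeviceA(device, "Device0");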

4.3 OpenCL Resources

Analogous functions, declared in nvToolsExtOpenCL.h, provide the same functionality for naming OpenCL resources. The namable resources in this case include devices, contexts, command queues, memory objects, samplers, programs, and events.

4.3.1 API Reference

nvtxNameClDeviceA

nvtxNameClDeviceW

Allows the association of an OpenCL device with a user-provided name.

Parameters:

device - The handle of the OpenCL device to name.

name - The name of the OpenCL device.

nvtxNameClContextA

nvtxNameClContextW

Allows the association of an OpenCL context with a user-provided name.

Parameters:

context - The handle of the OpenCL context to name.

name - The name of the OpenCL context.

nvtxNameClCommandQueueA

nvtxNameClCommandQueueW

Allows the association of an OpenCL command queue with a user-provided name.

Parameters:

command_queue - The handle of the OpenCL command queue to name.

name - The name of the OpenCL command queue.

nvtxNameClMemObjectA

nvtxNameClMemObjectW

Allows the association of an OpenCL memory object with a user-provided name.

Parameters:

memobj - The handle of the OpenCL memory object to name.

name - The name of the OpenCL memory object.

nvtxNameClSamplerA

nvtxNameClSamplerW

Allows the association of an OpenCL sampler with a user-provided name.

Parameters:

sampler - The handle of the OpenCL sampler to name.

name - The name of the OpenCL sampler.

nvtxNameClProgramA

nvtxNameClProgramW

Allows the association of an OpenCL program with a user-provided name.

Parameters:

program - The handle of the OpenCL program to name.

name - The name of the OpenCL program.

nvtxNameClEventA

nvtxNameClEventW

Allows the association of an OpenCL event with a user-provided name.

Parameters:

event - The handle of the OpenCL event to name.

name - The name of the OpenCL event.

4.3.2 Code Example
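// A command queue needs a valid device, so one is queried first (error checks omitted).
cl_platform_id platform = NULL; cl_device_id device = NULL;
clGetPlatformIDs(1, &platform, NULL);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);
nvtxNameClContextA(context, "Context1");
nvtxNameClCommandQueueA(queue, "Queue0");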

5. Adding NVTX to a Project

The NVTX API is installed by the NVIDIA Nsight “host” installer (by default) into the following location:

C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\nvToolsExt

Both the header files and the library files (.lib, .dll) are located under this path.

By default, the NVIDIA Nsight installer sets the environment variable NVTOOLSEXT_PATH to point to the aforementioned location that matches the system's bitness.

5.1 C++ Project

To compile your project with NVTX support in Visual Studio, set up your project as follows:

  1. Open the project properties dialog.
  2. Navigate to Configuration Properties > C/C++ > General.

    Add the following path to the Additional Include Directories: $(NVTOOLSEXT_PATH)\include

  3. Navigate to Configuration Properties > Linker > General.

    Add the following path to the Additional Library Directories: $(NVTOOLSEXT_PATH)\lib\$(Platform)

  4. Navigate to Configuration Properties > Linker > Input.

    Add nvToolsExt64_1.lib (according to your system specifications) to the Additional Dependencies.

5.2 CUDA (.cu file)

If you use NVTX to annotate code in .cu files, also make sure the following configuration is set up (this is in addition to the steps discussed in the previous section):

  1. Open the project properties dialog.
  2. Navigate to Configuration Properties > CUDA C/C++ > Common.

    Add the following path to the Additional Include Directories: $(NVTOOLSEXT_PATH)\include

5.3 Copying NVTX to Project

It is recommended that you copy the NVTX headers and library files into your own source tree before integrating this API into your application. This ensures that your application can be built without NVIDIA Nsight being installed. The NVTX .dll has no direct dependencies on CUDA, DirectX, or other external libraries.

Once you have placed NVTX into your source tree, add the path to the NVTX headers to your include path, and include nvToolsExt.h in any CPU source files. You can then use the NVTX API calls to annotate your application's runtime behavior.

When linking, you may either link using the stub .lib provided with NVTX, or use a LoadLibrary call to load the .dll directly.
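The LoadLibrary route can be sketched as follows. This is a minimal illustration under several assumptions: the helper names nvtxDllName, loadNvtxMarkA, and markIfAvailable are hypothetical, only the 64-bit DLL is considered, the DLL is assumed to be on the normal search path, and the function pointer type assumes the x64 calling convention. On non-Windows builds the loader simply reports that NVTX is unavailable.

```cpp
#include <cassert>
#include <string>

#if defined(_WIN32)
#include <windows.h>
#endif

// Builds the 64-bit DLL name from the scheme nvToolsExt64_<version>.dll.
static std::string nvtxDllName(int version) {
    return "nvToolsExt64_" + std::to_string(version) + ".dll";
}

// Pointer type matching nvtxMarkA, which takes a narrow string message.
typedef void (*NvtxMarkAFn)(const char* message);

// Returns a pointer to nvtxMarkA, or nullptr if the DLL or the symbol is
// not available. The module handle is intentionally never freed, so the
// returned pointer stays valid for the lifetime of the process.
static NvtxMarkAFn loadNvtxMarkA(int version) {
#if defined(_WIN32)
    HMODULE module = LoadLibraryA(nvtxDllName(version).c_str());
    if (module == nullptr) return nullptr;
    return reinterpret_cast<NvtxMarkAFn>(GetProcAddress(module, "nvtxMarkA"));
#else
    (void)version;
    return nullptr;
#endif
}

// Emits a marker only when NVTX was loaded; otherwise does nothing.
// Returns true if the marker call was made.
static bool markIfAvailable(NvtxMarkAFn mark, const char* message) {
    assert(message != nullptr);
    if (mark == nullptr) return false;
    mark(message);
    return true;
}
```

A caller could run NvtxMarkAFn mark = loadNvtxMarkA(1); once at startup and then call markIfAvailable(mark, "startup complete"); wherever a marker is wanted. The application keeps running without instrumentation when the DLL is missing.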

5.4 Deploying NVTX

The NVTX .dll is not installed into C:\Windows\System32 or another global location. Instead, make sure to deploy the .dll with your application. One common way to do this in Visual Studio is to copy the NVTX .dll into the directory that contains the application's executable, using a Custom Post-Build Step.

Warning: Do not rename the .dll in any way. Renaming the library will affect how NVIDIA Nsight interacts with the library to collect data.


NVIDIA® GameWorks™ Documentation Rev. 1.0.200608 ©2014-2020. NVIDIA Corporation. All Rights Reserved.