NVTX Profiling Library APIs

This chapter describes the Fortran interfaces to the NVIDIA Tools Extension (NVTX) library. NVTX is a set of functions that a developer can use to provide additional information to tools, such as NVIDIA’s Nsight Systems performance analysis tool. NVTX functions are called from host code, but they can be used to mark and view time spans (ranges) covering both host and device sections of an application.

The NVTX interfaces and definitions described in this chapter can be exposed by adding the line

use nvtx

to your program unit. A version of this module has been available through other means in the past; this chapter documents the Fortran module now included in the NVIDIA HPC SDK. Because the targeted NVTX v3 API is a header-only C library, the SDK instantiates Fortran-callable wrappers and provides them in a library, libnvhpcwrapnvtx.[a|so]. To link, add -cudalib=nvtx to your link line, or explicitly add some form of -lnvhpcwrapnvtx.
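As a minimal sketch, the program below opens and closes a single range using the basic interfaces described in the next section, and the leading comment shows a build line that uses the -cudalib=nvtx option. The file name nvtx_hello.f90 and the range label are hypothetical, and the build line assumes nvfortran from the NVIDIA HPC SDK is on your path.

```fortran
! Illustrative build line (file name is hypothetical):
!   nvfortran -cudalib=nvtx nvtx_hello.f90 -o nvtx_hello
program nvtx_hello
  use nvtx                          ! NVTX interfaces from the NVIDIA HPC SDK
  implicit none
  integer :: i
  real(8) :: s

  call nvtxStartRange("compute")    ! open a labelled range
  s = 0.0d0
  do i = 1, 1000000
    s = s + 1.0d0 / real(i, 8)
  end do
  call nvtxEndRange()               ! close the range opened above
  print *, "sum =", s
end program nvtx_hello
```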

This chapter is divided into three sections. The first describes the traditional Fortran NVTX interfaces that were available previously. The second describes advanced functions that are now supported through the NVTX v3 API. The third shows a method that uses the nvfortran -Minstrument option to automatically insert NVTX ranges at subprogram entry and exit.

Unless a specific kind is provided, the plain integer type used in the interfaces implies integer(4).

NVTX Basic Tooling APIs

This section describes the most basic Fortran interfaces to the NVIDIA Tools Extension (NVTX) library. These interfaces were first defined in blog posts and in a publicly available source repository. The simplest interfaces merely push and pop user-labeled, nested time ranges.

The names nvtxStartRange and nvtxEndRange transpose the words of the advanced nvtxRangeStart and nvtxRangeEnd names; they were originally chosen for ease of use. The basic and advanced interfaces can be used in the same program.

nvtxStartRange

This subroutine begins a simple labelled time span range using the NVTX library. The optional icolor argument selects one of a set of predefined colors. Ranges can be nested.

subroutine nvtxStartRange( label, icolor )
  character(len=*) :: label
  integer, optional :: icolor

nvtxEndRange

This subroutine terminates a simple labelled time span range initiated by nvtxStartRange. It takes no arguments.

subroutine nvtxEndRange()
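The sketch below shows nested basic ranges. Each nvtxEndRange call closes the most recently started range that is still open. The icolor values 1 and 2 are arbitrary choices; which color each index maps to is not specified here.

```fortran
program nvtx_nested
  use nvtx
  implicit none
  integer :: step

  call nvtxStartRange("time loop", 1)      ! outer range, color index 1
  do step = 1, 3
    call nvtxStartRange("step", 2)         ! inner range, color index 2
    call nvtxStartRange("halo exchange")   ! innermost range, default color
    ! ... communication work ...
    call nvtxEndRange()                    ! ends "halo exchange"
    ! ... computation work ...
    call nvtxEndRange()                    ! ends "step"
  end do
  call nvtxEndRange()                      ! ends "time loop"
end program nvtx_nested
```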

NVTX Advanced Tooling APIs

This section describes the advanced Fortran interfaces to the NVIDIA Tools Extension (NVTX) library which target the NVTX v3 API.

NVTX Definitions and Derived Types

This section contains the definitions and data types used in the advanced Fortran interfaces to the NVIDIA Tools Extension (NVTX) library, v3 API.

! Parameters
integer, parameter :: NVTX_VERSION = 3
integer, parameter :: NVTX_EVENT_ATTRIB_STRUCT_SIZE = 48
! NVTX Status
enum, bind(C)
    enumerator :: NVTX_SUCCESS = 0
    enumerator :: NVTX_FAIL = 1
    enumerator :: NVTX_ERR_INIT_LOAD_PROPERTY = 2
    enumerator :: NVTX_ERR_INIT_ACCESS_LIBRARY = 3
    enumerator :: NVTX_ERR_INIT_LOAD_LIBRARY = 4
    enumerator :: NVTX_ERR_INIT_MISSING_LIBRARY_ENTRY_POINT = 5
    enumerator :: NVTX_ERR_INIT_FAILED_LIBRARY_ENTRY_POINT = 6
    enumerator :: NVTX_ERR_NO_INJECTION_LIBRARY_AVAILABLE = 7
end enum
! nvtxColorType_t, from nvToolsExt.h
type, bind(c) :: nvtxColorType
  integer(4) :: type
end type
type(nvtxColorType), parameter :: &
    NVTX_COLOR_UNKNOWN  = nvtxColorType(0), &
    NVTX_COLOR_ARGB     = nvtxColorType(1)
! nvtxMessageType_t, from nvToolsExt.h
type, bind(c) :: nvtxMessageType
  integer(4) :: type
end type
type(nvtxMessageType), parameter :: &
  NVTX_MESSAGE_UNKNOWN          = nvtxMessageType(0), &
  NVTX_MESSAGE_TYPE_ASCII       = nvtxMessageType(1), &
  NVTX_MESSAGE_TYPE_UNICODE     = nvtxMessageType(2), &
  NVTX_MESSAGE_TYPE_REGISTERED  = nvtxMessageType(3)
! nvtxPayloadType_t, from nvToolsExt.h
type, bind(c) :: nvtxPayloadType
  integer(4) :: type
end type
type(nvtxPayloadType), parameter :: &
  NVTX_PAYLOAD_UNKNOWN             = nvtxPayloadType(0), &
  NVTX_PAYLOAD_TYPE_UNSIGNED_INT64 = nvtxPayloadType(1), &
  NVTX_PAYLOAD_TYPE_INT64          = nvtxPayloadType(2), &
  NVTX_PAYLOAD_TYPE_DOUBLE         = nvtxPayloadType(3), &
  NVTX_PAYLOAD_TYPE_UNSIGNED_INT32 = nvtxPayloadType(4), &
  NVTX_PAYLOAD_TYPE_INT32          = nvtxPayloadType(5), &
  NVTX_PAYLOAD_TYPE_FLOAT          = nvtxPayloadType(6)
! Something just for Fortran ease of use, C compat.
! The Fortran structure is bigger, but the first 48 bytes are the same
! Making it allocatable means it will get deallocated properly
type nvtxFtnStringType
  character(1), allocatable :: chars(:)
end type
! nvtxEventAttributes_v2, from nvToolsExt.h
type, bind(C):: nvtxEventAttributes
  integer(C_INT16_T)      :: version = NVTX_VERSION
  integer(C_INT16_T)      :: size = NVTX_EVENT_ATTRIB_STRUCT_SIZE
  integer(C_INT)          :: category = 0
  type(nvtxColorType)     :: colorType = NVTX_COLOR_ARGB
  integer(C_INT)          :: color = z'ffffffff'
  type(nvtxPayloadType)   :: payloadType = NVTX_PAYLOAD_UNKNOWN
  integer(C_INT)          :: reserved0
  integer(C_INT64_T)      :: payload  ! union uint,int,double
  type(nvtxMessageType)   :: messageType = NVTX_MESSAGE_TYPE_ASCII
  type(nvtxFtnStringType) :: message  ! ascii char
end type
! This module provides a type constructor for the nvtxEventAttributes type.
! For example:
! event = nvtxEventAttributes(message, color)
! message can be a Fortran character string, or
! an nvtx registered string.
! color is an optional argument, integer(C_INT), assigned to
! the color field
type nvtxRangeId
  integer(8) :: id
end type
type nvtxDomainHandle
  type(C_PTR) :: handle
end type
type nvtxStringHandle
  type(C_PTR) :: handle
end type
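As a sketch of how these definitions fit together, the program below builds an event with the module's type constructor and then sets the payload fields directly before recording a mark. The label and the payload value 42 are illustrative only.

```fortran
program nvtx_event_attrs
  use, intrinsic :: iso_c_binding, only: c_int64_t
  use nvtx
  implicit none
  type(nvtxEventAttributes) :: event

  ! Module-provided constructor: message string, color argument omitted
  event = nvtxEventAttributes("checkpoint")

  ! Attach a 64-bit signed payload (the payload field is a union in C)
  event%payloadType = NVTX_PAYLOAD_TYPE_INT64
  event%payload     = 42_c_int64_t

  call nvtxMarkEx(event)   ! record an instantaneous event with these attributes
end program nvtx_event_attrs
```

Fields not set explicitly keep the default initialization shown in the type definition, for example an ARGB color of z'ffffffff' and category 0.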

nvtxInitialize

This subroutine forces the NVTX library to initialize. It can be used to control where the initialization overhead occurs, for example to keep it out of a timed region. It takes no arguments.

subroutine nvtxInitialize()
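A minimal sketch of moving the initialization cost out of a timed region: nvtxInitialize is called before the first system_clock reading, so the first range push inside the timed region does not also pay for library setup. The region label is illustrative.

```fortran
program nvtx_init_first
  use nvtx
  implicit none
  integer(8) :: t0, t1, rate
  integer    :: ilvl

  call nvtxInitialize()          ! pay NVTX setup cost here, before timing starts

  call system_clock(t0, rate)
  ilvl = nvtxRangePush("timed region")
  ! ... work being timed ...
  ilvl = nvtxRangePop()
  call system_clock(t1)
  print *, "elapsed (s):", real(t1 - t0, 8) / real(rate, 8)
end program nvtx_init_first
```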

nvtxDomainCreate

This function creates a new named NVTX domain. Each domain maintains its own push and pop stack.

function nvtxDomainCreate(message) result(domain)
   character(len=*) :: message
   type(nvtxDomainHandle) :: domain

nvtxDomainDestroy

This subroutine destroys an NVTX domain.

subroutine nvtxDomainDestroy(domain)
   type(nvtxDomainHandle) :: domain
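The sketch below shows the lifecycle of a domain: create it, record an event in it, and destroy it when it is no longer needed. The domain name "MySolver" is hypothetical.

```fortran
program nvtx_domain_lifecycle
  use nvtx
  implicit none
  type(nvtxDomainHandle)    :: dom
  type(nvtxEventAttributes) :: event

  dom   = nvtxDomainCreate("MySolver")     ! hypothetical domain name
  event = nvtxEventAttributes("solver start")
  call nvtxDomainMarkEx(dom, event)        ! mark recorded in the "MySolver" domain
  ! ... further domain-scoped annotations ...
  call nvtxDomainDestroy(dom)              ! release the domain when finished
end program nvtx_domain_lifecycle
```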

nvtxDomainRegisterString

This function registers an immutable string with NVTX, for use with the type(nvtxEventAttributes) message field.

function nvtxDomainRegisterString(domain, message) &
  result(stringHandle)
  type(nvtxDomainHandle) :: domain
  character(len=*) :: message
  type(nvtxStringHandle) :: stringHandle

Using the overloaded assignment defined in this module, users can attach a registered string to an event with these two statements:

event%message = nvtxDomainRegisterString(domain, "Str 1")
event%messageType = NVTX_MESSAGE_TYPE_REGISTERED

A type(nvtxEventAttributes) variable can also be initialized by passing a registered string to the type constructor, along with an optional color:

regstr = nvtxDomainRegisterString(domain, "Str 2")
event = nvtxEventAttributes(regstr, icolor)
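A sketch of the typical pattern: register a string once, build the event from it, and reuse that event for a range on every loop iteration, so the same string does not have to be passed on each call. The domain name "MyApp" and the string "iteration" are hypothetical, and the sketch assumes, as described above, that the constructor sets up the message fields for a registered string.

```fortran
program nvtx_registered_string
  use nvtx
  implicit none
  type(nvtxDomainHandle)    :: dom
  type(nvtxStringHandle)    :: regstr
  type(nvtxEventAttributes) :: event
  type(nvtxRangeId)         :: rid
  integer :: it

  dom    = nvtxDomainCreate("MyApp")                   ! hypothetical domain name
  regstr = nvtxDomainRegisterString(dom, "iteration")  ! register once
  event  = nvtxEventAttributes(regstr)                 ! color argument omitted

  do it = 1, 10
    rid = nvtxDomainRangeStartEx(dom, event)           ! reuse the registered string
    ! ... loop body ...
    call nvtxDomainRangeEnd(dom, rid)
  end do
  call nvtxDomainDestroy(dom)
end program nvtx_registered_string
```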

nvtxDomainNameCategory

This subroutine allows the user to assign a name to a category ID that is specific to the domain.

subroutine nvtxDomainNameCategory(domain, category, name)
  type(nvtxDomainHandle) :: domain
  integer(4) :: category
  character(len=*) :: name

nvtxNameCategory

This subroutine allows the user to assign a name to a category ID.

subroutine nvtxNameCategory(category, name)
  integer(4) :: category
  character(len=*) :: name
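The sketch below names categories both globally and within one domain, then tags an event with a category ID through the category field. The IDs 1 and 2 and all names are arbitrary illustrative choices; the range is pushed without a domain, so the global name "I/O" is the one expected to apply to it.

```fortran
program nvtx_categories
  use nvtx
  implicit none
  type(nvtxDomainHandle)    :: dom
  type(nvtxEventAttributes) :: event
  integer :: ilvl

  ! Global category names (IDs are arbitrary)
  call nvtxNameCategory(1, "I/O")
  call nvtxNameCategory(2, "Compute")

  ! A category name scoped to a single (hypothetical) domain
  dom = nvtxDomainCreate("MyLib")
  call nvtxDomainNameCategory(dom, 1, "Setup")

  event = nvtxEventAttributes("read input")
  event%category = 1                  ! tag this range with category 1
  ilvl = nvtxRangePushEx(event)
  ! ... read input files ...
  ilvl = nvtxRangePop()

  call nvtxDomainDestroy(dom)
end program nvtx_categories
```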

nvtxDomainMarkEx

This subroutine marks an instantaneous event in the application, with full control over the NVTX domain and event attributes.

subroutine nvtxDomainMarkEx(domain, event)
  type(nvtxDomainHandle) :: domain
  type(nvtxEventAttributes) :: event

nvtxMarkEx

This subroutine marks an instantaneous event in the application, with user-supplied NVTX event attributes.

subroutine nvtxMarkEx(event)
  type(nvtxEventAttributes) :: event

nvtxMark

This subroutine marks an instantaneous event in the application with a user-supplied message.

subroutine nvtxMark(message)
  character(len=*) :: message
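A short sketch contrasting the two simplest mark interfaces: nvtxMark takes only a message, while nvtxMarkEx takes a full event, here with a color. The red color variable is initialized from a BOZ constant the same way the module's own color default is; this relies on that nvfortran behavior and may not be accepted by other compilers.

```fortran
program nvtx_marks
  use, intrinsic :: iso_c_binding, only: c_int
  use nvtx
  implicit none
  type(nvtxEventAttributes) :: event
  integer(c_int) :: red = z'ffff0000'     ! ARGB red (same style as the module default)
  integer :: it

  do it = 1, 100
    ! ... iteration work ...
    if (mod(it, 25) == 0) call nvtxMark("checkpoint")   ! message-only mark
  end do

  event = nvtxEventAttributes("done", red)              ! colored mark
  call nvtxMarkEx(event)
end program nvtx_marks
```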

nvtxDomainRangeStartEx

This function starts a process range in the application, with full control over the NVTX domain and event attributes, and returns a unique range ID.

function nvtxDomainRangeStartEx(domain, event) result(id)
  type(nvtxDomainHandle) :: domain
  type(nvtxEventAttributes) :: event
  type(nvtxRangeId) :: id

nvtxRangeStartEx

This function starts a process range in the application, with user-supplied NVTX event attributes, and returns a unique range ID.

function nvtxRangeStartEx(event) result(id)
  type(nvtxEventAttributes) :: event
  type(nvtxRangeId) :: id

nvtxRangeStart

This function starts a process range in the application with a user-supplied message, and returns a unique range ID.

function nvtxRangeStart(message) result(id)
  character(len=*) :: message
  type(nvtxRangeId) :: id

nvtxDomainRangeEnd

This subroutine ends a process range in the application. Arguments are the domain and range ID from a previous call to nvtxDomainRangeStartEx.

subroutine nvtxDomainRangeEnd(domain, id)
  type(nvtxDomainHandle) :: domain
  type(nvtxRangeId) :: id

nvtxRangeEnd

This subroutine ends a process range in the application. The argument is a range ID returned from a previous call to nvtxRangeStart or nvtxRangeStartEx.

subroutine nvtxRangeEnd(id)
  type(nvtxRangeId) :: id
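Because start/end ranges are identified by the returned range ID rather than by a stack, they do not have to be nested. The sketch below ends the first range while a second, later-started range is still open. The labels are illustrative.

```fortran
program nvtx_overlapping_ranges
  use nvtx
  implicit none
  type(nvtxRangeId) :: rLoad, rSolve

  rLoad  = nvtxRangeStart("load data")
  ! ... begin loading ...
  rSolve = nvtxRangeStart("solve")    ! starts while "load data" is still open
  ! ... loading completes ...
  call nvtxRangeEnd(rLoad)            ! ends the earlier range first; no nesting required
  ! ... solving continues ...
  call nvtxRangeEnd(rSolve)
end program nvtx_overlapping_ranges
```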

nvtxDomainRangePushEx

This function starts a nested thread range in the application, with full control over the NVTX domain and event attributes, and returns the nested range level.

function nvtxDomainRangePushEx(domain, event) result(ilvl)
  type(nvtxDomainHandle) :: domain
  type(nvtxEventAttributes) :: event
  integer(4) :: ilvl

nvtxRangePushEx

This function starts a nested thread range in the application, with user-supplied event attributes, and returns the nested range level.

function nvtxRangePushEx(event) result(ilvl)
  type(nvtxEventAttributes) :: event
  integer(4) :: ilvl

nvtxRangePush

This function starts a nested range in the application with a user-supplied message, and returns the level of the range being started.

function nvtxRangePush(message) result(ilvl)
  character(len=*) :: message
  integer(4) :: ilvl

nvtxDomainRangePop

This function ends a nested thread range in the application, within a specific domain, and returns the level of the range being ended.

function nvtxDomainRangePop(domain) result(ilvl)
  type(nvtxDomainHandle) :: domain
  integer(4) :: ilvl
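Since each domain maintains its own push and pop stack, a pop in one domain does not affect ranges open in another. The sketch below, with hypothetical domain names "App" and "Lib", pops the "App" range while the "Lib" range is still open.

```fortran
program nvtx_domain_stacks
  use nvtx
  implicit none
  type(nvtxDomainHandle)    :: app, lib
  type(nvtxEventAttributes) :: event
  integer :: ilvl

  app = nvtxDomainCreate("App")                ! hypothetical domain names
  lib = nvtxDomainCreate("Lib")

  event = nvtxEventAttributes("app phase")
  ilvl  = nvtxDomainRangePushEx(app, event)    ! pushed on the "App" stack
  event = nvtxEventAttributes("lib call")
  ilvl  = nvtxDomainRangePushEx(lib, event)    ! pushed on the separate "Lib" stack

  ilvl  = nvtxDomainRangePop(app)              ! ends "app phase"; "lib call" stays open
  ilvl  = nvtxDomainRangePop(lib)              ! ends "lib call"

  call nvtxDomainDestroy(lib)
  call nvtxDomainDestroy(app)
end program nvtx_domain_stacks
```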

nvtxRangePop

This function ends a nested thread range in the application, and returns the level of the range being ended.

function nvtxRangePop() result(ilvl)
  integer(4) :: ilvl
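A minimal sketch of push/pop pairing that prints the returned levels. No particular values are assumed: in the underlying C API the returned level can be negative, for example when no tool is collecting NVTX data, so the printed numbers depend on how the program is run.

```fortran
program nvtx_push_pop_levels
  use nvtx
  implicit none
  integer :: l1, l2, p2, p1

  l1 = nvtxRangePush("outer")
  l2 = nvtxRangePush("inner")   ! nested inside "outer"
  p2 = nvtxRangePop()           ! ends "inner"
  p1 = nvtxRangePop()           ! ends "outer"
  print *, "push levels:", l1, l2, "  pop levels:", p2, p1
end program nvtx_push_pop_levels
```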

NVTX Automated Instrumentation

This section describes a method to automatically insert NVIDIA Tools Extension (NVTX) ranges into your code without making source changes. This method is only supported on Linux systems.

The first step is to decide which source files should produce NVTX labels. In your build process, add this compiler option when compiling those files:

-Minstrument

This compiler option instructs the compiler to insert two calls into the generated code: a call to __cyg_profile_func_enter() at subprogram entry, and a call to __cyg_profile_func_exit() at subprogram exit. These entry points are meant to be supplied by profiling tools. One important argument that the compiler passes to these functions is the address of the subprogram.

Next, for the best user experience, link your executable with these options:

-traceback  -lnvhpcwrapnvtx

or alternatively:

-fPIC  -Wl,-export-dynamic  -lnvhpcwrapnvtx

These options enable the runtime to convert the function or subroutine address into a symbol name, using the dladdr() library function. Without these options, the label contains the subprogram address in hexadecimal, which is still useful but requires additional manual steps to determine the associated symbol name.
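The sketch below is an ordinary program with no NVTX calls and no use nvtx statement; the comments show one way to build and profile it with the options above. The file name solver.f90 is hypothetical. With -Minstrument, each subprogram entry and exit should appear as a range, labelled with whatever symbol dladdr() resolves, which for module procedures may be a compiler-mangled name.

```fortran
! Illustrative build and profile steps (file name is hypothetical):
!   nvfortran -Minstrument -c solver.f90
!   nvfortran solver.o -traceback -lnvhpcwrapnvtx -o solver
!   nsys profile ./solver
module solver_mod
  implicit none
contains
  subroutine setup(a)
    real, intent(out) :: a(:)
    a = 1.0                      ! instrumented: entry/exit become a range
  end subroutine setup

  real function total(a)
    real, intent(in) :: a(:)
    total = sum(a)               ! instrumented as well
  end function total
end module solver_mod

program solver
  use solver_mod                 ! no NVTX calls anywhere in the source
  implicit none
  real :: a(1000)
  call setup(a)
  print *, total(a)
end program solver
```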

As with all of the NVTX instrumentation methods, you need to enable the processing of the NVTX API calls when you run. For example, to enable NVTX with Nsight Systems, use

nsys profile --trace=nvtx

which presents the NVTX time span ranges on the Nsight Systems timeline. Currently,

--trace=nvtx

is set by default, so just specifying

nsys profile ./a.out

will provide you with the NVTX annotations, along with CUDA traces.