NVTX Profiling Library APIs
This chapter describes the Fortran interfaces to the NVIDIA Tools Extension (NVTX) library. NVTX is a set of functions that a developer can use to provide additional information to tools, such as NVIDIA’s Nsight Systems performance analysis tool. NVTX functions are accessible from host code, but can be useful in marking and viewing time spans (ranges) of both host and device sections of an application.
The NVTX interfaces and definitions described in this chapter can be exposed by adding the line
use nvtx
to your program unit. A version of this module has been available through other means in the past, but this chapter documents the Fortran module now included in the NVIDIA HPC SDK. Because the module targets the NVTX v3 API, which is a header-only C library, the SDK provides Fortran-callable wrappers in a library, libnvhpcwrapnvtx.[a|so]; to link against it, add -cudalib=nvtx to the link line, or explicitly add some form of -lnvhpcwrapnvtx.
This chapter is divided into three sections. The first describes the traditional Fortran NVTX interfaces which have been available previously. The second describes advanced functions which are now supported in the NVTX v3 API. The third shows a method which leverages the nvfortran -Minstrument option to automatically insert NVTX ranges across subprogram entry and exit.
Unless a specific kind is provided, the plain integer type used in the interfaces implies integer(4).
NVTX Basic Tooling APIs
This section describes the most basic Fortran interfaces to the NVIDIA Tools Extension (NVTX) library. These interfaces were first defined in blog posts and via a publicly available source repository. The simplest interfaces merely push and pop user-labeled, nested time ranges.
The StartRange/EndRange names were originally transposed from the advanced RangeStart/RangeEnd names for ease of use. Both sets of interfaces can be used in the same program.
nvtxStartRange
This subroutine begins a simple labelled time span range using the NVTX library. The icolor argument is optional, and will map to one of many predefined colors. The ranges can be nested.
subroutine nvtxStartRange( label, icolor )
character(len=*) :: label
integer, optional :: icolor
nvtxEndRange
This subroutine terminates a simple labelled time span range initiated by nvtxStartRange. It takes no arguments.
subroutine nvtxEndRange()
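As a minimal sketch of the two basic calls, the following program wraps a loop in an outer range and nests a second range inside it. The program name, labels, array size, and the icolor value of 2 are illustrative assumptions; only nvtxStartRange and nvtxEndRange come from the interfaces above.

```fortran
program basic_ranges
  use nvtx                      ! NVTX module included in the NVIDIA HPC SDK
  implicit none
  integer, parameter :: n = 1000000   ! arbitrary problem size
  real(8), allocatable :: a(:)
  integer :: i

  allocate(a(n))

  ! Outer range; icolor is omitted, so the default color is used
  call nvtxStartRange("initialize")
  do i = 1, n
     a(i) = real(i, 8)
  end do

  ! Nested range; 2 is an arbitrary choice of predefined color index
  call nvtxStartRange("scale", 2)
  a = 2.0d0 * a
  call nvtxEndRange()           ! closes "scale"

  call nvtxEndRange()           ! closes "initialize"

  print *, "sum =", sum(a)      ! use the result so the work is not optimized away
  deallocate(a)
end program basic_ranges
```

Because nvtxEndRange takes no arguments, each call closes the most recently started range, so the calls should be paired in last-in, first-out order. The program would be linked with -cudalib=nvtx as described at the start of this chapter.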
NVTX Advanced Tooling APIs
This section describes the advanced Fortran interfaces to the NVIDIA Tools Extension (NVTX) library which target the NVTX v3 API.
NVTX Definitions and Derived Types
This section contains the definitions and data types used in the advanced Fortran interfaces to the NVIDIA Tools Extension (NVTX) library, v3 API.
! Parameters
integer, parameter :: NVTX_VERSION = 3
integer, parameter :: NVTX_EVENT_ATTRIB_STRUCT_SIZE = 48
! NVTX Status
enum, bind(C)
enumerator :: NVTX_SUCCESS = 0
enumerator :: NVTX_FAIL = 1
enumerator :: NVTX_ERR_INIT_LOAD_PROPERTY = 2
enumerator :: NVTX_ERR_INIT_ACCESS_LIBRARY = 3
enumerator :: NVTX_ERR_INIT_LOAD_LIBRARY = 4
enumerator :: NVTX_ERR_INIT_MISSING_LIBRARY_ENTRY_POINT = 5
enumerator :: NVTX_ERR_INIT_FAILED_LIBRARY_ENTRY_POINT = 6
enumerator :: NVTX_ERR_NO_INJECTION_LIBRARY_AVAILABLE = 7
end enum
! nvtxColorType_t, from nvToolsExt.h
type, bind(c) :: nvtxColorType
integer(4) :: type
end type
type(nvtxColorType), parameter :: &
NVTX_COLOR_UNKNOWN = nvtxColorType(0), &
NVTX_COLOR_ARGB = nvtxColorType(1)
! nvtxMessageType_t, from nvToolsExt.h
type, bind(c) :: nvtxMessageType
integer(4) :: type
end type
type(nvtxMessageType), parameter :: &
NVTX_MESSAGE_UNKNOWN = nvtxMessageType(0), &
NVTX_MESSAGE_TYPE_ASCII = nvtxMessageType(1), &
NVTX_MESSAGE_TYPE_UNICODE = nvtxMessageType(2), &
NVTX_MESSAGE_TYPE_REGISTERED = nvtxMessageType(3)
! nvtxPayloadType_t, from nvToolsExt.h
type, bind(c) :: nvtxPayloadType
integer(4) :: type
end type
type(nvtxPayloadType), parameter :: &
NVTX_PAYLOAD_UNKNOWN = nvtxPayloadType(0), &
NVTX_PAYLOAD_TYPE_UNSIGNED_INT64 = nvtxPayloadType(1), &
NVTX_PAYLOAD_TYPE_INT64 = nvtxPayloadType(2), &
NVTX_PAYLOAD_TYPE_DOUBLE = nvtxPayloadType(3), &
NVTX_PAYLOAD_TYPE_UNSIGNED_INT32 = nvtxPayloadType(4), &
NVTX_PAYLOAD_TYPE_INT32 = nvtxPayloadType(5), &
NVTX_PAYLOAD_TYPE_FLOAT = nvtxPayloadType(6)
! Something just for Fortran ease of use, C compat.
! The Fortran structure is bigger, but the first 48 bytes are the same
! Making it allocatable means it will get deallocated properly
type nvtxFtnStringType
character(1), allocatable :: chars(:)
end type
! nvtxEventAttributes_v2, from nvToolsExt.h
type, bind(C):: nvtxEventAttributes
integer(C_INT16_T) :: version = NVTX_VERSION
integer(C_INT16_T) :: size = NVTX_EVENT_ATTRIB_STRUCT_SIZE
integer(C_INT) :: category = 0
type(nvtxColorType) :: colorType = NVTX_COLOR_ARGB
integer(C_INT) :: color = z'ffffffff'
type(nvtxPayloadType) :: payloadType = NVTX_PAYLOAD_UNKNOWN
integer(C_INT) :: reserved0
integer(C_INT64_T) :: payload ! union uint,int,double
type(nvtxMessageType) :: messageType = NVTX_MESSAGE_TYPE_ASCII
type(nvtxFtnStringType) :: message ! ascii char
end type
! This module provides a type constructor for the nvtxEventAttributes type.
! For example:
! event = nvtxEventAttributes(message, color)
! message can be a Fortran character string, or
! an nvtx registered string.
! color is an optional argument, integer(C_INT), assigned to
! the color field
type nvtxRangeId
integer(8) :: id
end type
type nvtxDomainHandle
type(C_PTR) :: handle
end type
type nvtxStringHandle
type(C_PTR) :: handle
end type
nvtxInitialize
This subroutine forces the NVTX library to initialize. It can be used to control where the initialization overhead occurs, for timing purposes. It takes no arguments.
subroutine nvtxInitialize()
nvtxDomainCreate
This function creates a new named NVTX domain. Each domain maintains its own push and pop stack.
function nvtxDomainCreate(message) result(domain)
character(len=*) :: message
type(nvtxDomainHandle) :: domain
nvtxDomainDestroy
This subroutine destroys an NVTX domain.
subroutine nvtxDomainDestroy(domain)
type(nvtxDomainHandle) :: domain
nvtxDomainRegisterString
This function registers an immutable string with NVTX, for use with the type(nvtxEventAttributes) message field.
function nvtxDomainRegisterString(domain, message) &
result(stringHandle)
type(nvtxDomainHandle) :: domain
character(len=*) :: message
type(nvtxStringHandle) :: stringHandle
Using overloaded assignment defined in this module, users can enable a registered string using these two statements:
event%message = nvtxDomainRegisterString(domain, "Str 1")
event%messageType = NVTX_MESSAGE_TYPE_REGISTERED
A type(nvtxEventAttributes) variable can also be initialized by passing a registered string to the type constructor, along with an optional color:
regstr = nvtxDomainRegisterString(domain, "Str 2")
event = nvtxEventAttributes(regstr, icolor)
nvtxDomainNameCategory
This subroutine allows the user to assign a name to a category ID that is specific to the domain.
subroutine nvtxDomainNameCategory(domain, category, name)
type(nvtxDomainHandle) :: domain
integer(4) :: category
character(len=*) :: name
nvtxNameCategory
This subroutine allows the user to assign a name to a category ID.
subroutine nvtxNameCategory(category, name)
integer(4) :: category
character(len=*) :: name
nvtxDomainMarkEx
This subroutine marks an instantaneous event in the application, with full control over the NVTX domain and event attributes.
subroutine nvtxDomainMarkEx(domain, event)
type(nvtxDomainHandle) :: domain
type(nvtxEventAttributes) :: event
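The domain-related calls described so far can be combined as in the sketch below, which creates a domain, names a category in it, and emits two markers: one built from a plain Fortran string and one from a registered string. The domain name, category ID, and message strings are hypothetical values chosen for illustration, and the optional color argument of the constructor is left out so that the type's default is used.

```fortran
program domain_marks
  use nvtx
  implicit none
  type(nvtxDomainHandle)    :: domain
  type(nvtxStringHandle)    :: regstr
  type(nvtxEventAttributes) :: event
  integer(4), parameter     :: io_category = 1   ! hypothetical category ID

  ! Create a domain and give category 1 a readable name within it
  domain = nvtxDomainCreate("solver")
  call nvtxDomainNameCategory(domain, io_category, "I/O")

  ! Marker whose message is a plain Fortran character string
  event = nvtxEventAttributes("checkpoint written")
  event%category = io_category
  call nvtxDomainMarkEx(domain, event)

  ! Marker whose message is a string registered with the domain
  regstr = nvtxDomainRegisterString(domain, "restart read")
  event  = nvtxEventAttributes(regstr)
  event%category = io_category
  call nvtxDomainMarkEx(domain, event)

  ! Release the domain once no more events will be emitted into it
  call nvtxDomainDestroy(domain)
end program domain_marks
```

Registering a string is mainly worthwhile when the same message is emitted many times; for a one-off marker the plain string form is simpler.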
nvtxMarkEx
This subroutine marks an instantaneous event in the application, with user-supplied NVTX event attributes.
subroutine nvtxMarkEx(event)
type(nvtxEventAttributes) :: event
nvtxMark
This subroutine marks an instantaneous event in the application with a user-supplied message.
subroutine nvtxMark(message)
character(len=*) :: message
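For markers outside any user-created domain, a short sketch using nvtxMark and nvtxMarkEx might look like the following. The loop, the label text, and the placement of nvtxInitialize are assumptions made for illustration.

```fortran
program simple_marks
  use nvtx
  implicit none
  type(nvtxEventAttributes) :: event
  character(len=32) :: label
  integer :: step

  ! Optional: initialize NVTX up front so the first marker does not pay for it
  call nvtxInitialize()

  do step = 1, 3
     ! ... application work for this step would go here ...
     write(label, '(a,i0)') "step ", step
     call nvtxMark(trim(label))          ! message-only marker
  end do

  ! Marker with an explicit attributes structure (default color and category)
  event = nvtxEventAttributes("all steps done")
  call nvtxMarkEx(event)
end program simple_marks
```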
nvtxDomainRangeStartEx
This function starts a process range in the application, with full control over the NVTX domain and event attributes, and returns a unique range ID.
function nvtxDomainRangeStartEx(domain, event) result(id)
type(nvtxDomainHandle) :: domain
type(nvtxEventAttributes) :: event
type(nvtxRangeId) :: id
nvtxRangeStartEx
This function starts a process range in the application, with user-supplied NVTX event attributes, and returns a unique range ID.
function nvtxRangeStartEx(event) result(id)
type(nvtxEventAttributes) :: event
type(nvtxRangeId) :: id
nvtxRangeStart
This function starts a process range in the application with a user-supplied message, and returns a unique range ID.
function nvtxRangeStart(message) result(id)
character(len=*) :: message
type(nvtxRangeId) :: id
nvtxDomainRangeEnd
This subroutine ends a process range in the application. Arguments are the domain and range ID from a previous call to nvtxDomainRangeStartEx.
subroutine nvtxDomainRangeEnd(domain, id)
type(nvtxDomainHandle) :: domain
type(nvtxRangeId) :: id
nvtxRangeEnd
This subroutine ends a process range in the application. The argument is a range ID returned from a previous call to any nvtxRangeStart function.
subroutine nvtxRangeEnd(id)
type(nvtxRangeId) :: id
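Because each start call returns its own range ID, ranges started this way can be ended in any order and need not nest. The sketch below starts two ranges and ends the first one while the second is still open; busy_work and the labels are hypothetical placeholders for real application phases.

```fortran
program overlapping_ranges
  use nvtx
  implicit none
  type(nvtxRangeId) :: id_a, id_b
  real(8) :: acc = 0.0d0

  id_a = nvtxRangeStart("phase A")
  call busy_work()

  id_b = nvtxRangeStart("phase B")   ! B starts while A is still open
  call busy_work()

  call nvtxRangeEnd(id_a)            ! A ends first, so A and B overlap
  call busy_work()

  call nvtxRangeEnd(id_b)

  print *, "acc =", acc              ! use the result so the work is kept

contains

  ! Placeholder workload; stands in for a real computation
  subroutine busy_work()
    integer :: i
    do i = 1, 1000000
       acc = acc + sin(real(i, 8))
    end do
  end subroutine busy_work

end program overlapping_ranges
```

Where ranges are strictly nested within one thread, the push and pop interfaces are usually the simpler choice, since no ID has to be carried to the end call.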
nvtxDomainRangePushEx
This function starts a nested thread range in the application, with full control over the NVTX domain and event attributes, and returns the nested range level.
function nvtxDomainRangePushEx(domain, event) result(ilvl)
type(nvtxDomainHandle) :: domain
type(nvtxEventAttributes) :: event
integer(4) :: ilvl
nvtxRangePushEx
This function starts a nested thread range in the application, with user-supplied event attributes, and returns the nested range level.
function nvtxRangePushEx(event) result(ilvl)
type(nvtxEventAttributes) :: event
integer(4) :: ilvl
nvtxRangePush
This function starts a nested range in the application with a user-supplied message, and returns the level of the range being started.
function nvtxRangePush(message) result(ilvl)
character(len=*) :: message
integer(4) :: ilvl
nvtxDomainRangePop
This function ends a nested thread range in the application, within a specific domain, and returns the level of the range being ended.
function nvtxDomainRangePop(domain) result(ilvl)
type(nvtxDomainHandle) :: domain
integer(4) :: ilvl
nvtxRangePop
This function ends a nested thread range in the application, and returns the level of the range being ended.
function nvtxRangePop() result(ilvl)
integer(4) :: ilvl
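The push and pop functions can be exercised as in the following sketch, which nests two ranges in the default domain and then pushes and pops one range in a user-created domain. The labels and the domain name are illustrative assumptions. The returned levels are printed but not interpreted, since the values reported may depend on whether a profiling tool is attached.

```fortran
program push_pop_levels
  use nvtx
  implicit none
  type(nvtxDomainHandle)    :: domain
  type(nvtxEventAttributes) :: event
  integer(4) :: lvl

  ! Nested ranges in the default domain
  lvl = nvtxRangePush("outer")
  print *, "push outer returned", lvl
  lvl = nvtxRangePush("inner")
  print *, "push inner returned", lvl

  lvl = nvtxRangePop()               ! ends "inner"
  print *, "pop inner returned", lvl
  lvl = nvtxRangePop()               ! ends "outer"
  print *, "pop outer returned", lvl

  ! A separate domain keeps its own push and pop stack
  domain = nvtxDomainCreate("io")
  event  = nvtxEventAttributes("write file")
  lvl = nvtxDomainRangePushEx(domain, event)
  print *, "domain push returned", lvl
  lvl = nvtxDomainRangePop(domain)
  print *, "domain pop returned", lvl

  call nvtxDomainDestroy(domain)
end program push_pop_levels
```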
NVTX Automated Instrumentation
This section describes a method to automatically insert NVIDIA Tools Extension (NVTX) ranges into your code without making source changes. This method is only supported on Linux systems.
The first step is to determine which source files you want to view NVTX labels for. In your build process, add this compiler option for those files:
-Minstrument
This standard compiler option instructs the compiler to insert two calls into the generated code: at subprogram entry, it will insert a call to __cyg_profile_func_enter(), and at subprogram exit, it will insert a call to __cyg_profile_func_exit(). These entry points are meant to be supplied by profiling tools. One important input argument to these functions, inserted by the compiler, is the function address.
The next step, for best user experience, is to link your executable with these options:
-traceback -lnvhpcwrapnvtx
or alternatively:
-fPIC -Wl,-export-dynamic -lnvhpcwrapnvtx
These options enable the runtime to convert the function or subroutine address into a symbol name, via the dladdr() library function. Without these options, the label will contain the subprogram address in hexadecimal, which is still useful but requires additional manual steps to determine the associated symbol name.
As with all of the NVTX instrumentation methods, you need to enable the processing of the NVTX API calls when you run. An example of enabling NVTX, using Nsight Systems, is to use
nsys profile --trace=nvtx
which will result in the NVTX time span ranges being presented on the Nsight Systems timeline. Currently, --trace=nvtx is set by default, so just specifying
nsys profile ./a.out
will provide you with the NVTX annotations, along with CUDA traces.