2. Python Report Interface
2.1. Introduction
NVIDIA Nsight Compute features a Python-based interface to interact with exported report files.
The module is called ncu_report and works on any Python version from 3.4 onward (see footnote 1).
It can be found in the extras/python directory of your NVIDIA
Nsight Compute package.
In order to use the Python module, you need a report file generated by NVIDIA
Nsight Compute. You can obtain such a file by saving it from the graphical
interface or by using the --export
flag of the command line tool.
The types and functions in the ncu_report
module are a subset of the ones
available in the NvRules API. The documentation in this section serves as a
tutorial. For a more formal description of the exposed API, please refer to the
NvRules API documentation.
[1] On Linux machines you will also need a GNU-compatible libc and libgcc_s.so.
2.2. Basic Usage
In order to be able to import ncu_report
you will either have to navigate
to the extras/python
directory, or add its absolute path to the
PYTHONPATH
environment variable. Then, the module can be imported like any
Python module:
>>> import ncu_report
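If changing directories or editing PYTHONPATH is inconvenient, a script can extend sys.path itself before the import. The following is a minimal sketch of that approach; the helper name add_ncu_python_dir and the installation path in the usage comment are hypothetical and need to be adapted to your setup.

```python
import sys
from pathlib import Path


def add_ncu_python_dir(install_dir):
    """Make extras/python under install_dir importable and return its path.

    install_dir is assumed to be the root of an NVIDIA Nsight Compute package.
    """
    python_dir = Path(install_dir).resolve() / "extras" / "python"
    if not python_dir.is_dir():
        raise FileNotFoundError(f"no extras/python directory in {install_dir}")
    entry = str(python_dir)
    if entry not in sys.path:
        # Prepend so that this copy of the module is found first.
        sys.path.insert(0, entry)
    return entry


# Usage (hypothetical install location):
#   add_ncu_python_dir("/path/to/nsight-compute")
#   import ncu_report
```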
Importing a report
Once the module is imported, you can load a report file by calling the
load_report
function with the path to the file. This function returns an
object of type IContext
which holds all the information concerning that
report.
>>> my_context = ncu_report.load_report("my_report.ncu-rep")
Querying ranges
When working with the Python module, profiling results are grouped into ranges
which are represented by IRange
objects. You can inspect the number of
ranges contained in the loaded report by calling the
IContext.num_ranges
member function of an IContext
object and
retrieve a range by its index using IContext.range_by_idx
.
>>> my_context.num_ranges()
1
>>> my_range = my_context.range_by_idx(0)
Querying actions
Inside a range, profiling results are called actions. You can query the
number of actions contained in a given range by using the
IRange.num_actions
method of an IRange
object.
>>> my_range.num_actions()
2
In the same way ranges can be obtained from an IContext
object by
using the IContext.range_by_idx
method, individual
actions can be obtained from IRange
objects by using the
IRange.action_by_idx
method. The resulting actions are represented by
the IAction
class.
>>> my_action = my_range.action_by_idx(0)
As mentioned previously, an action represents a single profiling result. To
query the workload’s name you can use the IAction.name
member function
of the IAction
class.
>>> my_action.name()
MyKernel
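The index-based calls shown so far can be combined to walk an entire report. The sketch below uses only the methods introduced above (num_ranges, range_by_idx, num_actions, action_by_idx and name); the helper names iter_actions and action_names are made up for this illustration.

```python
def iter_actions(context):
    """Yield (range_index, action_index, action) for every action in a report."""
    for range_idx in range(context.num_ranges()):
        current_range = context.range_by_idx(range_idx)
        for action_idx in range(current_range.num_actions()):
            yield range_idx, action_idx, current_range.action_by_idx(action_idx)


def action_names(context):
    """Return the names of all profiled results in report order."""
    return [action.name() for _, _, action in iter_actions(context)]


# Usage, assuming a report file named my_report.ncu-rep exists:
#   import ncu_report
#   print(action_names(ncu_report.load_report("my_report.ncu-rep")))
```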
To query the workload type, you can use the IAction.workload_type
member
function of the IAction
class, which will return an integer among the
IAction.WorkloadType_*
values.
For CUDA kernels, you get the following result:
>>> my_action.workload_type()
0
All return values of IAction.workload_type
are documented in the API
documentation for the IAction
class.
Querying metrics
To get a tuple of all metric names contained within an action you can use the
IAction.metric_names
method. It is meant to be combined with the
IAction.metric_by_name
method which returns an IMetric
object.
However, for the same task you may also use the subscript operator through the
IAction.__getitem__
method, as explained in the High-Level Interface section below.
The metric names displayed here are the same as the ones you can use with the
--metrics
flag of NVIDIA Nsight Compute. Once you have extracted a metric
from an action, you can obtain its value by using one of the following three
methods:
- IMetric.as_string to obtain its value as a Python str
- IMetric.as_uint64 to obtain its value as a Python int
- IMetric.as_double to obtain its value as a Python float
For example, to print the display name of the GPU on which the workload was
profiled you can query the device__attribute_display_name
metric.
>>> display_name_metric = my_action.metric_by_name('device__attribute_display_name')
>>> display_name_metric.as_string()
'NVIDIA GeForce RTX 3060 Ti'
Note that accessing a metric with the wrong type can lead to unexpected (conversion) results.
>>> display_name_metric.as_double()
0.0
Therefore, it is advisable to directly use the High-Level function
IMetric.value
, as explained below.
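As an illustration of combining IAction.metric_names with IAction.metric_by_name, the following sketch collects metric values into a dictionary. It relies on IMetric.value to avoid picking a conversion by hand; the helper name metric_values and its prefix parameter are assumptions made for this example.

```python
def metric_values(action, prefix=""):
    """Map every metric name starting with prefix to its value."""
    values = {}
    for name in action.metric_names():
        if name.startswith(prefix):
            # value() returns the metric in its own type (str, int or float).
            values[name] = action.metric_by_name(name).value()
    return values


# Usage with the action from the examples above:
#   metric_values(my_action, "device__attribute_")
```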
2.3. Python Interoperability
On top of the low-level NvRules API the Python Report Interface also implements part of the Python object model. By implementing special methods, the Python Report Interface’s exposed classes can be used with built-in Python mechanisms such as iteration, string formatting and length querying.
This allows you to access metric objects via the IAction.__getitem__
method of the IAction
class:
>>> display_name_metric = my_action["device__attribute_display_name"]
There is also a convenience method IMetric.value
which allows you to
query the value of a metric object without knowledge of its type:
>>> display_name_metric.value()
'NVIDIA GeForce RTX 3060 Ti'
All the available methods of a class, as well as their associated Python docstrings, can be looked up interactively via
>>> help(ncu_report.IMetric)
or similarly for other classes and methods. In your code, you can access the
docstrings via the __doc__
attribute, e.g.
ncu_report.IMetric.value.__doc__
.
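Iteration and length querying can be sketched as follows. The example relies on IContext supporting iteration, len() and integer subscripts, and on IRange supporting len(), as listed in the API reference; the helper names are hypothetical, and whether negative indices are accepted is not assumed.

```python
def report_shape(context):
    """Return the number of actions in each range of a report."""
    # Iterating a context yields its ranges; len() of a range counts its actions.
    return [len(current_range) for current_range in context]


def last_range(context):
    """Return the last range of a report, or None if it has no ranges."""
    count = len(context)
    return context[count - 1] if count > 0 else None


# Usage with the context from the examples above:
#   report_shape(my_context)
#   last_range(my_context)
```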
2.4. Metric attributes
Apart from the possibility to query the IMetric.name
and
IMetric.value
of an IMetric
object, you can also query the
following additional metric attributes:
The first method IMetric.metric_type
returns one out of three enum
values (IMetric.MetricType_COUNTER
, IMetric.MetricType_RATIO
,
IMetric.MetricType_THROUGHPUT
) if the metric is a hardware metric, or
IMetric.MetricType_OTHER
otherwise (e.g. for launch or device
attributes).
The method IMetric.metric_subtype
returns an enum value representing
the subtype of a metric (e.g. IMetric.MetricSubtype_PEAK_SUSTAINED
,
IMetric.MetricSubtype_PER_CYCLE_ACTIVE
). In case a metric does not have
a subtype, None
is returned. All available values may be found in the
documentation for the IMetric
class, or may be looked up interactively
by executing help(ncu_report.IMetric)
.
IMetric.rollup_operation
returns the operation which is used to
accumulate different values of the same metric and can be one of
IMetric.RollupOperation_AVG
, IMetric.RollupOperation_MAX
,
IMetric.RollupOperation_MIN
or IMetric.RollupOperation_SUM
for
averaging, maximum, minimum or summation, respectively. If the metric in
question does not specify a rollup operation None
will be returned.
Lastly, IMetric.unit
and IMetric.description
return a (possibly
empty) string of the metric’s unit and a short textual description for
hardware metrics, respectively.
The above methods can be combined to filter through all metrics of a report, given certain criteria:
for metric in metrics:
    if metric.metric_type() == IMetric.MetricType_COUNTER and \
       metric.metric_subtype() == IMetric.MetricSubtype_PER_SECOND and \
       metric.rollup_operation() == IMetric.RollupOperation_AVG:
        print(f"{metric.name()}: {metric.value()} {metric.unit()}")
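The loop above assumes an iterable named metrics. One way to build it, sketched here, is to look up every name reported by IAction.metric_names. The helpers collect_metrics and select_metrics are illustrative names, not part of ncu_report, and the enum values are passed in by the caller.

```python
def collect_metrics(action):
    """Return all IMetric objects of an action as a list."""
    return [action.metric_by_name(name) for name in action.metric_names()]


def select_metrics(metrics, metric_type=None, rollup_operation=None):
    """Keep metrics matching the given attribute values.

    A criterion left at None is not checked at all.
    """
    selected = []
    for metric in metrics:
        if metric_type is not None and metric.metric_type() != metric_type:
            continue
        if rollup_operation is not None and metric.rollup_operation() != rollup_operation:
            continue
        selected.append(metric)
    return selected


# Usage, assuming ncu_report is imported and my_action is an IAction:
#   metrics = collect_metrics(my_action)
#   counters = select_metrics(metrics, ncu_report.IMetric.MetricType_COUNTER)
```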
2.5. NVTX Support
The ncu_report module
has support for the NVIDIA Tools Extension (NVTX). This
comes through the INvtxState
object which represents the NVTX state of
a profiled kernel.
An INvtxState
object can be obtained from an action by using its
IAction.nvtx_state
method. It exposes the INvtxState.domains
method which returns a tuple of integers representing the domains this kernel
has state in. These integers can be used with the
INvtxState.domain_by_id
method to get an INvtxDomainInfo
object
which represents the state of a domain.
The INvtxDomainInfo
can be used to obtain a tuple of Push-Pop, or
Start-End ranges using the INvtxDomainInfo.push_pop_ranges
and
INvtxDomainInfo.start_end_ranges
methods.
There is also an IRange.actions_by_nvtx member function in the
IRange class which allows you to get the indices of the actions matching the
NVTX state described in its parameters.
The parameters for the IRange.actions_by_nvtx
function are two lists of
strings representing the state for which we want to query the actions. The first
parameter describes the NVTX states to include while the second one describes
the NVTX states to exclude. These strings are in the same format as the ones
used with the --nvtx-include
and --nvtx-exclude
options.
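The traversal from an action to its NVTX ranges can be sketched as below. Only the methods described in this section are used (nvtx_state, domains, domain_by_id, name and push_pop_ranges); the helper name push_pop_ranges_by_domain and the shape of its return value are choices made for this example.

```python
def push_pop_ranges_by_domain(action):
    """Map each NVTX domain ID to (domain name, list of Push-Pop range names)."""
    state = action.nvtx_state()
    if state is None:
        # No NVTX state is associated with this action.
        return {}
    result = {}
    for domain_id in state.domains():
        domain = state.domain_by_id(domain_id)
        result[domain_id] = (domain.name(), list(domain.push_pop_ranges()))
    return result


# Usage with the action from the examples above:
#   push_pop_ranges_by_domain(my_action)
```

Start-End ranges could be collected the same way by calling start_end_ranges instead of push_pop_ranges.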
2.6. Sample Script
NVTX Push-Pop range filtering
This is a sample script which loads a report and prints the names of all the
profiled kernels which were wrapped inside BottomRange
and TopRange
Push-Pop ranges of the default NVTX domain.
#!/usr/bin/env python3
import sys

import ncu_report

if len(sys.argv) != 2:
    print("usage: {} report_file".format(sys.argv[0]), file=sys.stderr)
    sys.exit(1)

report = ncu_report.load_report(sys.argv[1])

for range_idx in range(report.num_ranges()):
    current_range = report.range_by_idx(range_idx)
    for action_idx in current_range.actions_by_nvtx(["BottomRange/*/TopRange"], []):
        action = current_range.action_by_idx(action_idx)
        print(action.name())
2.7. API Reference
This documents the content of the ncu_report
package which can be found
in the extras/python
directory of your NVIDIA Nsight Compute installation.
- class ncu_report.FocusSeverity
Enum representing the severity of a focus metric.
- DEFAULT
Default severity.
- LOW
Low severity.
- HIGH
High severity.
- class ncu_report.IAction
The IAction represents a profile result such as a CUDA kernel in a single range or a range itself in range-based profiling, for which zero or more metrics were collected.
- __iter__()
- __len__()
- name(*args)
Get the name of the result the IAction object represents.
- Parameters
name_base (int, optional) – The desired name base. Defaults to NameBase_FUNCTION.
- Returns
The name of the result (potentially in a specific name base).
- Return type
- nvtx_state()
Get the NVTX state associated with this action.
- Returns
The associated INvtxState or None if no state is associated.
- Return type
- ptx_by_pc(address)
Get the PTX associated with an address.
- rule_results()
Get the tuple of all rules associated with this action.
- Returns
A tuple of rule results.
- Return type
tuple of IRuleResult
- rule_results_as_dicts()
Get the list of all rules data associated with this action.
- Returns
A list of rule data dictionaries. Each rule data dictionary contains the following key-value pairs:
- 'rule_identifier' : (str) The rule identifier.
- 'name' : (str) The rule name.
- 'section_identifier' : (str) The section identifier.
- 'focus_metrics' : (list of dict) A list of focus metrics. Each focus metric dictionary contains the following key-value pairs:
  - 'name' : (str) The name of the focus metric.
  - 'value' : (float) The value of the focus metric.
  - 'severity' : (FocusSeverity) The severity of the focus metric.
  - 'info' : (str) The information about the focus metric.
- 'speedup_estimation' : (dict) with the following key-value pairs:
  - 'type' : (SpeedupType) The speedup type.
  - 'speedup' : (float) The speedup value.
- 'result_tables' : (dict) with the following key-value pairs:
  - 'title' : (str) The title of the result table.
  - 'description' : (str) The description of the result table.
  - 'headers' : (list of str) The column headers of the result table.
  - 'data' : (list of list[int | float | str | Any]) The table data in row-major format. All elements of a column have the same type. str values support the following link formats:
    - @url:<hypertext>:<external link>@ - To add an external link for a hypertext.
    - @sass:<address>:<hypertext>@ - To add a link to the hypertext to open the SASS address line on the Source page.
    - @source:<file name>:<line number>:<hypertext>@ - To add a link to the hypertext to open the source file at the specified line number on the Source page.
    - @section:<section identifier>:<hypertext>@ - To add a link to the hypertext to jump to the respective section.
- Return type
- sass_by_pc(address)
Get the SASS associated with an address.
- source_files()
Get the source files associated with this action along with their content.
If content is not available for a file (e.g. because it hadn’t been imported into the report), the file name will map to an empty string.
- source_info(address)
Get the source info for a function address within this action.
Addresses are commonly obtained as correlation IDs of source-correlated metrics.
- Parameters
address (int) – The address to get source info for.
- Returns
The ISourceInfo associated to the given address. If no source info is available, None is returned.
- Return type
- source_markers()
Get all the source markers associated with this action.
- Returns
A tuple of source data dictionaries. Each source data dict contains the following key-value pairs:
- 'rule_identifier' : (str) The rule identifier.
- 'section_identifier' : (str) The section identifier.
- 'kind' : (int) MarkerKind attribute.
- 'message' : (str) The source marker message.
- 'source_address' : (int) Source instruction address. Key available only for kind MarkerKind.SASS.
- Return type
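The dictionaries returned by IAction.rule_results_as_dicts can be processed with plain Python. As a minimal sketch, the helper below (a hypothetical name) ranks rules by the 'speedup' value of their 'speedup_estimation' entry. It assumes that this entry may be missing or empty for some rules, which the description above does not state either way, and therefore skips such rules.

```python
def rank_rules_by_speedup(rule_dicts):
    """Return (rule name, estimated speedup) pairs, largest speedup first.

    Rules without a usable 'speedup_estimation' entry are skipped.
    """
    ranked = []
    for rule in rule_dicts:
        # Assumption: the key can be absent, None or an empty dict.
        estimation = rule.get("speedup_estimation") or {}
        speedup = estimation.get("speedup")
        if speedup is not None:
            ranked.append((rule["name"], speedup))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Usage, assuming my_action is an IAction:
#   for name, speedup in rank_rules_by_speedup(my_action.rule_results_as_dicts()):
#       print(name, speedup)
```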
- class ncu_report.IContext
The IContext class is the top-level object representing an open report. It can be created by calling the load_report function.
- __getitem__(key)
Get one or more IRange objects by index or by slice.
- Returns
An IRange object or a tuple of IRange objects.
- Return type
IRange | tuple of IRange
- Raises
IndexError – If key is out of range for the IContext.
- __iter__()
- __len__()
- num_ranges()
- class ncu_report.IMetric
Represents a single, named metric. An IMetric can carry one value or multiple ones if it is an instanced metric.
- MetricSubtype_PEAK_SUSTAINED_ACTIVE_PER_SECOND
Metric subtype for peak sustained active per-second metrics.
- Type
- MetricSubtype_PEAK_SUSTAINED_ELAPSED_PER_SECOND
Metric subtype for peak sustained elapsed per-second metrics.
- Type
- MetricSubtype_PCT_OF_PEAK_SUSTAINED_ACTIVE
Metric subtype for percentage of peak sustained active metrics.
- Type
- MetricSubtype_PCT_OF_PEAK_SUSTAINED_ELAPSED
Metric subtype for percentage of peak sustained elapsed metrics.
- Type
- correlation_ids()
Get a metric object for this metric’s instance value’s correlation IDs.
Returns a new IMetric representing the correlation IDs for the metric’s instance values. Use IMetric.has_correlation_ids to check if this metric has correlation IDs for its instance values. Correlation IDs are used to associate instance values with the instance their value represents. In the returned new metric object, the correlation IDs are that object’s instance values.
If the metric does not have any correlation IDs, this function will return None.
- has_correlation_ids()
Check if the metric has correlation IDs.
- has_value(*args)
Check if the metric or metric instance has a value.
- kind(*args)
Get the metric or metric instance value kind.
- num_instances()
Get the number of instance values for this metric.
Not all metrics have instance values. If a metric has instance values, it may also have IMetric.correlation_ids matching these instance values.
- Returns
The number of instances for this metric.
- Return type
- rollup_operation()
Get the type of rollup operation for this metric.
- Returns
The rollup operation type.
- Return type
- class ncu_report.INvtxDomainInfo
Represents a single NVTX domain of the NVTX state, including all ranges associated with this domain.
- __str__()
Get a human-readable representation of this INvtxDomainInfo.
- Returns
The name of the INvtxDomainInfo.
- Return type
- name()
Get a human-readable representation of this INvtxDomainInfo.
- Returns
The name of the INvtxDomainInfo.
- Return type
- push_pop_range(idx)
Get a push/pop range object by index.
The index is identical to the range’s order on the call stack.
- Returns
The requested INvtxRange or None if the index is out of range.
- Return type
- push_pop_ranges()
Get a sorted list of push/pop range names.
Get the sorted list of stacked push/pop range names in this domain, associated with the current INvtxState.
- start_end_range(idx)
Get a start/end range object by index.
- Returns
The requested INvtxRange or None if the index is out of range.
- Return type
- start_end_ranges()
Get a sorted list of start/end range names.
Get the sorted list of start/end range names in this domain, associated with the current INvtxState.
- class ncu_report.INvtxRange
Represents a single NVTX Push/Pop or Start/End range.
- category()
Get the category attribute value.
- Returns
The category attribute value. If INvtxRange.has_attributes returns False, this will return 0.
- Return type
- color()
Get the color attribute value.
- Returns
The color attribute value. If INvtxRange.has_attributes returns False, this will return 0.
- Return type
- has_attributes()
Check if range has event attributes.
- message()
Get the message attribute value.
- Returns
The message attribute value. If INvtxRange.has_attributes returns False, this will return the empty string.
- Return type
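Because INvtxDomainInfo.push_pop_range returns None for an out-of-range index, the range objects of a domain can be enumerated without knowing their count in advance. The sketch below does this and reads the event attributes only when INvtxRange.has_attributes reports that they exist; both helper names are hypothetical.

```python
def iter_push_pop_range_objects(domain):
    """Yield the INvtxRange objects of a domain until push_pop_range returns None."""
    idx = 0
    while True:
        nvtx_range = domain.push_pop_range(idx)
        if nvtx_range is None:
            return
        yield nvtx_range
        idx += 1


def range_attributes(nvtx_range):
    """Return the event attributes of a range as a dict, or None if it has none."""
    if not nvtx_range.has_attributes():
        # Without attributes the getters only return 0 or the empty string.
        return None
    return {
        "category": nvtx_range.category(),
        "color": nvtx_range.color(),
        "message": nvtx_range.message(),
    }


# Usage, assuming domain is an INvtxDomainInfo:
#   for nvtx_range in iter_push_pop_range_objects(domain):
#       print(range_attributes(nvtx_range))
```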
- class ncu_report.INvtxState
Represents the NVTX (NVIDIA Tools Extension) state associated with a single IAction.
- __getitem__(key)
Get an INvtxDomainInfo object by ID.
- Parameters
key (int) – The ID of the INvtxDomainInfo object.
- Returns
An INvtxDomainInfo object.
- Return type
- Raises
- __iter__()
Get an iterator over the INvtxDomainInfo objects of this INvtxState.
- Returns
An iterator over the INvtxDomainInfo objects.
- Return type
- __len__()
Get the number of INvtxDomainInfo objects of this INvtxState.
- Returns
The number of INvtxDomainInfo objects.
- Return type
- domain_by_id(id)
Get an INvtxDomainInfo object by ID.
Use INvtxState.domains to retrieve the list of valid domain IDs.
- Parameters
id (int) – The ID of the requested domain.
- Returns
The requested INvtxDomainInfo object.
- Return type
- class ncu_report.IRange
Represents a serial, ordered stream of execution, such as a CUDA stream. It holds one or more actions that were logically executing in this range.
- __getitem__(key)
- __len__()
Get the number of IAction objects in this IRange.
- Returns
The number of IAction objects.
- Return type
- actions_by_nvtx(includes, excludes)
Get a set of indices to IAction objects by their NVTX state. The state is defined using a series of includes and excludes.
- class ncu_report.IRuleResult
The IRuleResult represents rule results.
- __str__()
Get a human-readable name of the rule result.
- Returns
The name of the result the IRuleResult object represents.
- Return type
- focus_metrics()
Get all the focus metrics details.
- Returns
A list of focus metrics details. Each focus metric dictionary contains the following key-value pairs:
- 'name' : (str) The name of the focus metric.
- 'value' : (float) The value of the focus metric.
- 'severity' : (FocusSeverity) The severity of the focus metric.
- 'info' : (str) The information about the focus metric.
- Return type
- has_rule_message()
Check if the rule has a message.
- has_speedup_estimation()
Check if the rule has speedup estimation.
- result_tables()
Get all the result tables.
- Returns
A list of result tables. Each result table dictionary contains the following key-value pairs:
- 'title' : (str) The title of the result table.
- 'description' : (str) The description of the result table.
- 'headers' : (list of str) The column headers of the result table.
- 'data' : (list of list[int | float | str | Any]) The table data in row-major format. All elements of a column have the same type. str values may contain substrings with the following special link formats:
  - @url:<hypertext>:<external link>@ - To add an external link for a hypertext.
  - @sass:<address>:<hypertext>@ - To add a link to the hypertext to open the SASS address line on the Source page.
  - @source:<file name>:<line number>:<hypertext>@ - To add a link to the hypertext to open the source file at the specified line number on the Source page.
  - @section:<section identifier>:<hypertext>@ - To add a link to the hypertext to jump to the respective section.
- Return type
- rule_message()
Get the rule message.
- Returns
A dictionary with the following key-value pairs:
- 'title' : (str) The rule message title.
- 'message' : (str) The rule message. The message may contain substrings with the following special link formats:
  - @url:<hypertext>:<external link>@ - To add an external link for a hypertext.
  - @sass:<address>:<hypertext>@ - To add a link to the hypertext to open the SASS address line on the Source page.
  - @source:<file name>:<line number>:<hypertext>@ - To add a link to the hypertext to open the source file at the specified line number on the Source page.
  - @section:<section identifier>:<hypertext>@ - To add a link to the hypertext to jump to the respective section.
- 'type' : (MsgType) The message type.
- Return type
- speedup_estimation()
Get the speedup estimation.
- Returns
A dictionary with the following key-value pairs:
- 'type' : (SpeedupType) The speedup type.
- 'speedup' : (float) The estimated speedup.
- Return type
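When rule messages or result table strings are printed to a terminal, the special link formats listed for IRuleResult.rule_message and IRuleResult.result_tables are usually just noise. The following sketch replaces each link with its hypertext using regular expressions. It is not part of ncu_report, and it assumes that the hypertext, address, file name and section identifier contain neither ':' nor '@' (the external link of the url format may contain ':'); strings violating that assumption are left untouched or only partially converted.

```python
import re

# One pattern per documented link format; the named group is the visible text.
_LINK_PATTERNS = [
    re.compile(r"@url:(?P<text>[^:@]*):[^@]*@"),
    re.compile(r"@sass:[^:@]*:(?P<text>[^@]*)@"),
    re.compile(r"@source:[^:@]*:[^:@]*:(?P<text>[^@]*)@"),
    re.compile(r"@section:[^:@]*:(?P<text>[^@]*)@"),
]


def strip_links(message):
    """Replace every special link in message with its hypertext."""
    for pattern in _LINK_PATTERNS:
        message = pattern.sub(lambda match: match.group("text"), message)
    return message


# Usage, assuming rule_result is an IRuleResult with a message:
#   print(strip_links(rule_result.rule_message()["message"]))
```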
- class ncu_report.ISourceInfo
Represents the source correlation information for a specific function address within an action.
- file_name()
Get the file name, as embedded in the correlation information.
- Returns
The file name.
- Return type
- class ncu_report.MarkerKind
Enum representing the kind of a source marker.
- SASS
The marker will be associated with a SASS instruction.
- SOURCE
The marker will be associated with a Source line.
- NONE
No specific kind of marker.
- class ncu_report.MsgType
Enum representing the type of the message.
- NONE
No specific type for this message.
- OK
The message is informative.
- OPTIMIZATION
The message represents a suggestion for performance optimization.
- WARNING
The message represents a warning or fixable issue.
- ERROR
The message represents an error, potentially in executing the rule.
- class ncu_report.SpeedupType
Enum representing the type of speedup estimation.
- UNKNOWN
Unknown speedup type.
- LOCAL
Value represents increase in hardware efficiency in isolated context.
- GLOBAL
Value represents decrease in overall kernel runtime.
- ncu_report.load_report(file_name)
Load an NVIDIA Nsight Compute report file into an IContext object.
- Parameters
file_name (str | pathlib.Path) – The relative or absolute path to the .ncu-rep report file.
- Returns
An IContext object representing the loaded report file.
- Return type
- Raises
FileNotFoundError – Either if file_name does not exist or if the NVIDIA Nsight Compute library directory cannot be found.
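Since load_report raises FileNotFoundError both for a missing report and for a missing library directory, scripts that process many reports may want to catch it and continue. A minimal sketch follows; the wrapper name load_or_none is hypothetical, and the loader is passed in as a parameter so the wrapper itself does not depend on ncu_report being importable.

```python
import sys


def load_or_none(file_name, loader):
    """Call loader(file_name); return None instead of raising FileNotFoundError."""
    try:
        return loader(file_name)
    except FileNotFoundError as error:
        # The error text tells a missing report apart from a missing library directory.
        print(f"could not load {file_name}: {error}", file=sys.stderr)
        return None


# Usage, assuming ncu_report is importable:
#   import ncu_report
#   context = load_or_none("my_report.ncu-rep", ncu_report.load_report)
```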
Notices
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication of otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.