NVIDIA® Nsight™ Development Platform, Visual Studio Edition 4.7 User Guide
The Disassembly Regex experiment is a framework for creating custom experiments to count classes of instructions. The experiment is configured by a script, where regular expressions are specified to select which assembly instructions to count, and weighted sums of those counts are given names. The results are presented in charts whose properties are controllable from the script. The activity page's script editor provides interactive guidance to ensure the script is well-formed. This experiment provides a convenient way to capture metrics whose exact definitions can vary, like floating point operations per second (FLOPS).
Counting the execution frequency of a specific class of instructions is a common way to characterize performance. Generic metrics like IPC (instructions per cycle) and FLOPS are popular because they are conceptually simple and usable on all architectures, but even metrics as simple as these may be difficult to define precisely. For example, when counting floating point operations, a fused-multiply-add may count as one operation or two, and a precision conversion may count as zero, one, or a fractional value. The ability to quickly redefine which instructions are counted and how they are weighted allows for rapid experimentation with varying definitions of a metric. This experiment uses an easy-to-learn scripting language to achieve a balance between convenience and flexibility. The language defines the following types of objects and their relationships:
Regex: A named object representing an ECMAScript-standard regular expression. The experiment disassembles the kernel into SASS, producing the string values shown in the Source View page. The regex is applied to each disassembled instruction, and instructions matching the regex are patched to increment a count each time the instruction is executed. Extra options control how the regex is applied. Search/Match choose whether to search the line for a match or to require that the regex match the entire line. Predicate/Opcode/Instruction/Args/Comment choose a specific part of the line to match (note that Instruction is the full instruction mnemonic as shown in the Source View, while Opcode is the part of the instruction preceding the first period). Counting can be done per-thread or per-warp. For example, an instruction in a loop that executes 100 times adds 100 to the count in per-warp mode, or 3200 (100 × 32 threads per warp) in per-thread mode with all threads active.
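The matching and counting rules above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: Python's `re` module stands in for the ECMAScript regex engine (the two dialects differ in some details), the helper names `opcode_of`, `is_counted`, and `executed_count` are hypothetical, and the mnemonic `FADD.FTZ` is used purely as a sample string.

```python
import re

WARP_SIZE = 32  # threads per warp, as implied by the 100 -> 3200 example


def opcode_of(instruction: str) -> str:
    """Return the part of the instruction mnemonic preceding the first period."""
    return instruction.split(".", 1)[0]


def is_counted(pattern: str, text: str, whole_line: bool) -> bool:
    """Match mode: the regex must cover the whole text. Search mode: it may match anywhere."""
    regex = re.compile(pattern)
    if whole_line:
        return regex.fullmatch(text) is not None
    return regex.search(text) is not None


def executed_count(executions: int, active_threads: int, per_thread: bool) -> int:
    """Per-warp mode adds 1 per execution; per-thread mode adds the number of active threads."""
    return executions * active_threads if per_thread else executions


instruction = "FADD.FTZ"  # sample mnemonic, for illustration only
print(opcode_of(instruction))                             # opcode part only
print(is_counted("FADD", instruction, whole_line=False))  # Search mode
print(is_counted("FADD", instruction, whole_line=True))   # Match mode
print(executed_count(100, WARP_SIZE, per_thread=False))   # per-warp
print(executed_count(100, WARP_SIZE, per_thread=True))    # per-thread
```

In this sketch the pattern `FADD` is counted in Search mode against the full mnemonic but not in Match mode, whereas it would be counted in Match mode if applied to the opcode part alone.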
Counter: A named object representing a weighted sum of [Regex] counts. Regex objects are specified by name, one per line, each preceded by its weight. A weight can be any positive or negative integer or floating point value. A Regex object can be used by multiple Counter objects.
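As a rough illustration of how a Counter combines Regex counts, the following Python sketch computes two alternative FLOP definitions from the same counts, one weighting a fused multiply-add as one operation and one weighting it as two. The regex names and the counts are made up for the example, and `counter_value` is a hypothetical helper, not part of the tool.

```python
def counter_value(weights: dict, regex_counts: dict) -> float:
    """Weighted sum of Regex counts: sum of weight * count for each named Regex."""
    return sum(weight * regex_counts[name] for name, weight in weights.items())


# Made-up execution counts for three hypothetical Regex objects.
regex_counts = {"fadd": 1000, "fmul": 500, "ffma": 2000}

# Two Counter definitions that reuse the same Regex objects with different weights.
flops_fma_as_one = {"fadd": 1, "fmul": 1, "ffma": 1}
flops_fma_as_two = {"fadd": 1, "fmul": 1, "ffma": 2}

print(counter_value(flops_fma_as_one, regex_counts))  # 3500
print(counter_value(flops_fma_as_two, regex_counts))  # 5500
```

Changing only the weight on the `ffma` entry switches between the two definitions, which is the kind of quick redefinition the experiment is meant to support.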
Group: A named object representing a list of [Counter] values to be displayed in a chart on the report page. Counter objects are specified by name, one per line. A Counter object can be used by multiple Group objects. The chart type is an option for the Group. It can be a bar chart (each counter is a single bar), a stacked bar chart (each bar is a multi-segment stack of the counters that specify a common stack name), or a pie chart (each counter is a piece of the pie). Whether to show the counts or the rates (counts divided by time) is also an option for the Group. See the Achieved FLOPS experiment for an example of how the charts appear.
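The two Group options described above, rates versus counts and stacking by a common stack name, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the time unit is taken to be seconds, the counter and stack names are invented, and `to_rates` and `stack_bars` are hypothetical helpers rather than anything the tool exposes.

```python
from collections import defaultdict


def to_rates(counts: dict, elapsed_seconds: float) -> dict:
    """Convert counts to rates by dividing each count by the elapsed time."""
    return {name: value / elapsed_seconds for name, value in counts.items()}


def stack_bars(counters: list) -> dict:
    """Group (counter_name, stack_name, value) triples into one bar per stack name."""
    bars = defaultdict(list)
    for name, stack, value in counters:
        bars[stack].append((name, value))
    return dict(bars)


# Invented counter values; counters sharing a stack name become segments of one bar.
counters = [
    ("fadd", "single", 1000),
    ("dadd", "double", 200),
    ("ffma", "single", 2000),
]

print(to_rates({"flops": 5500}, elapsed_seconds=0.5))
print(stack_bars(counters))
```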
Each object is specified on a line in square brackets, with the type followed by the name. The name can include spaces and any other characters except square brackets. Options for the object can be appended to the line. For Regex objects, the line immediately following the definition line is the regex string, and all characters are allowed (escaping works exactly as expected). For Counter and Group objects, the lines following the definition line up to the next definition line specify the members (Counters specify a list of Regexes, Groups specify a list of Counters). Except on lines defining regex strings (where all characters are treated as part of the regex), the script format ignores whitespace and lines beginning with # (for writing comments). The script editor in the activity page interprets the script on the fly as changes are made and explains any errors. The activity cannot be launched while the script is invalid.
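The line-oriented rules above can be made concrete with a small parser sketch in Python. It is not the tool's parser, and it rests on several assumptions: the type keywords are spelled `Regex`, `Counter`, and `Group`; "ignores whitespace" is read as ignoring blank lines and leading or trailing spaces; option text and member lines are kept as raw strings because their exact syntax is not specified here; and the sample script is hypothetical.

```python
import re

# Assumed shape of a definition line: [Type Name] followed by optional option text.
DEFINITION = re.compile(r"^\[(Regex|Counter|Group)\s+([^\[\]]+)\]\s*(.*)$")


def parse_script(text: str) -> list:
    """Split a script into objects; option and member strings are left uninterpreted."""
    objects = []
    current = None
    expect_regex = False
    for raw in text.splitlines():
        if expect_regex:
            # The line right after a Regex definition is the regex, taken verbatim.
            current["regex"] = raw
            expect_regex = False
            continue
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        found = DEFINITION.match(line)
        if found:
            kind, name, options = found.groups()
            current = {"type": kind, "name": name.strip(),
                       "options": options, "members": []}
            objects.append(current)
            expect_regex = kind == "Regex"
        elif current is not None and current["type"] != "Regex":
            current["members"].append(line)  # a Counter or Group member line
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return objects


# Hypothetical script, written only to exercise the rules described above.
SAMPLE = "\n".join([
    "# count single-precision adds",
    "[Regex fadd]",
    "^FADD",
    "",
    "[Counter flops]",
    "1 fadd",
    "[Group FLOPS chart]",
    "flops",
])

parsed = parse_script(SAMPLE)
for obj in parsed:
    print(obj)
```

Note that the regex line is read before the comment check, so a regex string that begins with `#` is still treated as a regex, in line with the rule that all characters on that line belong to the regex.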
For example, after [foo] is typed on line 3 of the script, the script editor explains that 'foo' is an unknown option for a Regex object and shows the valid options.
The text that appears in a new Disassembly Regex experiment is a well-commented example to demonstrate the syntax. More sophisticated examples are provided in the Achieved FLOPS and Achieved IOPS experiments, which are both based on the Disassembly Regex experiment. The contents of their scripts can be copied, pasted into a new Disassembly Regex experiment, and modified.
NVIDIA GameWorks Documentation Rev. 1.0.150630 ©2015. NVIDIA Corporation. All Rights Reserved.