NVIDIA DOCA Ngauge
NOTE THAT THIS CONFLUENCE PAGE IS NOT READY YET!
This document explains how to use the ngauge tool.
ngauge is a tool for probing NIC hardware counters and storing them in HDF5 format, together with the relevant metadata, for later processing. It also displays the run progress and a measurement summary graphically in the terminal (CLI).
Supported hardware: NVIDIA® BlueField®-3, ConnectX®-7, and later, with firmware version xx.43.1000 or higher and the fwctl driver.
The fwctl driver must be installed on the host (on the DPU it is already installed). To install it, search for a package whose name contains "fwctl" and install the package you find.
On deb-based distros, use
apt-cache search fwctl
and for RPM-based distros use
dnf search fwctl
NOTE: On Ubuntu 20.04 the fwctl driver is not loaded automatically, so you need to run sudo modprobe mlx5_fwctl after every reboot.
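To script this check after a reboot, the minimal Python sketch below reads /proc/modules (the standard Linux list of loaded kernel modules, whose first column is the module name) and reports whether mlx5_fwctl is loaded. The helper names are hypothetical, and the sketch only detects the module. Loading it still requires running modprobe as root.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Parse the text of /proc/modules into a set of module names.

    Each non-empty line starts with the module name, followed by its size,
    use count, dependents, state and load address.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    # The kernel lists module names with underscores, e.g. "mlx5_fwctl".
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():  # only present on Linux
        if is_module_loaded("mlx5_fwctl", proc_modules.read_text()):
            print("mlx5_fwctl is loaded")
        else:
            print("mlx5_fwctl is not loaded; run: sudo modprobe mlx5_fwctl")
```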
All configuration is done in the input YAML file.
Start by copying a sample configuration from /usr/share/doc/ngauge/examples/settings.
The device to run on is specified by its PCI address (for example, 0000:03:00.0):
device: "0000:03:00.0"
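The page only gives 0000:03:00.0 as an example. The Python sketch below assumes the full domain:bus:device.function form (4, 2, 2, and 1 hex digits, as printed by lspci -D) and checks a value before it goes into the YAML file. Whether ngauge also accepts shorter forms is not stated here, and parse_pci_address is a hypothetical helper, not part of ngauge.

```python
import re

# Domain (4 hex digits) : bus (2 hex) : device (2 hex, 00-1f) . function (0-7)
PCI_ADDR_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([01][0-9a-fA-F])\.([0-7])$"
)


def parse_pci_address(addr: str) -> tuple[int, int, int, int]:
    """Split a full PCI address such as "0000:03:00.0" into
    (domain, bus, device, function) integers.

    Raises ValueError if the string is not in DDDD:BB:DD.F form.
    """
    match = PCI_ADDR_RE.match(addr)
    if match is None:
        raise ValueError(f"not a full PCI address (DDDD:BB:DD.F): {addr!r}")
    domain, bus, device, function = (int(group, 16) for group in match.groups())
    return domain, bus, device, function
```

For example, parse_pci_address("0000:03:00.0") returns (0, 3, 0, 0), while the short form "03:00.0" is rejected.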
The output location is configured as follows (both path, the output directory, and prefix, the output file name prefix, are mandatory):
output:
  path: /path/to/output/directory
  prefix: "ngauge_data_"
With the example above, the output is saved as /path/to/output/directory/ngauge_data_<DATE>_<TIME>.h5. The exact output file name is printed after each run.
Run parameters, the most important of which is the sampling period, are configured as follows:
params:
  mode: repetitive  # [repetitive, single]
  period_us: 1e2
period_us is given in microseconds, so in the example above "1e2" means 100 μs. Numbers in decimal or scientific notation are accepted.
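This small Python sketch shows the arithmetic behind period_us: 1e2 μs = 100 μs = 1e-4 s, which corresponds to 10,000 samples per second. expected_samples is a hypothetical helper and assumes one sample per period in repetitive mode, which this page does not state explicitly. If you read the settings file with PyYAML yourself, note that PyYAML loads an unquoted 1e2 as the string "1e2" rather than as a float. Converting with float() handles both cases.

```python
def period_us_to_rate(period_us) -> tuple[float, float]:
    """Return (period in seconds, samples per second) for a period_us value.

    Accepts a number or a string in decimal or scientific notation,
    e.g. 100, "100" or "1e2".
    """
    period = float(period_us)  # float() understands both "100" and "1e2"
    if period <= 0:
        raise ValueError("period_us must be positive")
    return period / 1e6, 1e6 / period


def expected_samples(period_us, duration_s: float) -> int:
    """Approximate samples per counter over duration_s seconds.

    Assumption (not from the ngauge docs): one sample is taken per period.
    """
    _, rate = period_us_to_rate(period_us)
    return int(duration_s * rate)
```

For the example configuration, period_us_to_rate("1e2") gives a period of 0.0001 s and a rate of 10000.0 samples per second.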
The counters to measure are configured as follows. Only the Data ID (id) is mandatory for each counter. All other fields are optional.
counters:
  - id: 0x1020000100000000
    desc: RX bytes port 0
    unit: RX port
    accumulating: false
    normalizer: time  # Normalizer, if present, must be either 'time' or a number.
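Before starting a long run, you can sanity-check each counters entry against the rules above. The Python sketch below checks that id is present and that normalizer, if present, is 'time' or a number. Three further checks are assumptions, not documented ngauge behavior: that a Data ID fits in 64 bits (the example ID has 16 hex digits), that accumulating is a boolean, and that a numeric normalizer may arrive as a string such as "1e3", as PyYAML produces. validate_counter is a hypothetical helper.

```python
from numbers import Real


def _is_number(value) -> bool:
    """True for real numbers (but not booleans) and numeric strings like "1e3"."""
    if isinstance(value, bool):
        return False
    if isinstance(value, Real):
        return True
    if isinstance(value, str):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return False


def validate_counter(entry: dict) -> list[str]:
    """Return the problems found in one item of the `counters` list (empty if OK)."""
    problems = []
    if "id" not in entry:
        problems.append("missing mandatory 'id' (Data ID)")
    else:
        cid = entry["id"]
        # YAML loaders usually turn 0x... into an int; accept a hex string too.
        if isinstance(cid, str):
            try:
                cid = int(cid, 16)  # int() accepts an optional "0x" prefix with base 16
            except ValueError:
                problems.append(f"id is not a hex number: {entry['id']!r}")
                cid = None
        # Assumption: Data IDs are 64-bit values.
        if cid is not None and not (
            isinstance(cid, int) and not isinstance(cid, bool) and 0 <= cid < 2**64
        ):
            problems.append("id must be a non-negative 64-bit number")
    if "normalizer" in entry:
        norm = entry["normalizer"]
        if norm != "time" and not _is_number(norm):
            problems.append("normalizer must be 'time' or a number")
    # Assumption: accumulating is a true/false flag, as in the example.
    if "accumulating" in entry and not isinstance(entry["accumulating"], bool):
        problems.append("accumulating should be true or false")
    return problems
```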
For the full list of supported performance counters, see Supported Data IDs.
You may also want to install doca-telemetry-utils, a tool that can generate counter IDs for use in the ngauge configuration. Install it with sudo apt-get install doca-telemetry-utils on deb-based distros, or sudo dnf install doca-telemetry-utils on RPM-based distros.
Then run doca_telemetry_utils -h for help, and doca_telemetry_utils get-counters to list the available counters.
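After you pick IDs from doca_telemetry_utils get-counters or from the Supported Data IDs list, a helper like the hypothetical counters_yaml below can render them in the counters section format shown earlier. The sketch does not parse the output of doca_telemetry_utils, because that output format is not described here. You copy the IDs into the input list by hand. String values are written as JSON-style double-quoted strings, which are also valid YAML, so a description containing a colon cannot break the file.

```python
import json


def counters_yaml(counters: list[dict]) -> str:
    """Render counter settings as the `counters:` section of an ngauge YAML file.

    Each dict needs an integer 'id'; the optional keys 'desc', 'unit',
    'accumulating' and 'normalizer' are emitted only when present.
    """
    lines = ["counters:"]
    for counter in counters:
        # Data IDs are written as 16-digit hex, like 0x1020000100000000.
        lines.append(f"  - id: 0x{counter['id']:016x}")
        for key in ("desc", "unit", "accumulating", "normalizer"):
            if key not in counter:
                continue
            value = counter[key]
            if isinstance(value, bool):
                text = "true" if value else "false"
            elif isinstance(value, str):
                text = json.dumps(value)  # double-quoted, YAML-compatible
            else:
                text = str(value)
            lines.append(f"    {key}: {text}")
    return "\n".join(lines) + "\n"
```

For example, counters_yaml([{"id": 0x1020000100000000, "desc": "RX bytes port 0", "normalizer": "time"}]) produces an entry equivalent to the one in the sample configuration above.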
Parsing output
A sample plugin, simple_plot.py, is installed in /usr/share/doc/ngauge/examples/plugins.
This plugin is a basic demonstration of how to open the output HDF5 file and plot the data in it. Beyond plotting, the data can be used for many other types of analysis.
Usage: /usr/share/doc/ngauge/examples/plugins/simple_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]
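This page does not describe the internal layout of the output file, so a reasonable first step in writing your own analysis is to list what the file contains. The Python sketch below walks an open h5py.File with visititems and reports every dataset's path, shape, and dtype, plus any root-level attributes. It makes no assumption about where ngauge stores its data or metadata. h5py is a third-party package that you install separately, and list_datasets is a hypothetical helper.

```python
import sys


def list_datasets(h5file) -> list[tuple[str, tuple, str]]:
    """Return (path, shape, dtype) for every dataset in an open h5py.File.

    Groups are skipped: only objects that have both .shape and .dtype are reported.
    """
    found = []

    def visit(name, obj):
        if hasattr(obj, "shape") and hasattr(obj, "dtype"):
            found.append((name, tuple(obj.shape), str(obj.dtype)))

    h5file.visititems(visit)  # visits every group and dataset below the root
    return found


def main(path: str) -> None:
    import h5py  # third-party: install separately (e.g. pip install h5py)

    with h5py.File(path, "r") as f:
        for name, shape, dtype in list_datasets(f):
            print(f"{name}: shape={shape}, dtype={dtype}")
        # Root-level attributes, if any
        for key, value in f.attrs.items():
            print(f"@{key} = {value!r}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it with the path of an ngauge .h5 file as its only argument to see the dataset names available for further processing.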
To plot only the results of the most recent run when your output directory is /tmp (the default), you can pass this expression instead of copy-pasting the file name every time: "$(ls -1 /tmp/ngauge_data_*.h5 | tail -n1)".
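If you script your analysis in Python, the same "last file" selection can be done without the shell. The sketch below mirrors ls -1 ... | tail -n1 by sorting the matching file names and taking the last one. latest_output is a hypothetical helper, and its defaults assume the /tmp directory and the ngauge_data_ prefix used above.

```python
from pathlib import Path
from typing import Optional


def latest_output(directory: str = "/tmp", prefix: str = "ngauge_data_") -> Optional[Path]:
    """Return the last <prefix>*.h5 file in name order, or None if there is none.

    Mirrors `ls -1 <directory>/<prefix>*.h5 | tail -n1`.
    """
    files = sorted(Path(directory).glob(f"{prefix}*.h5"))
    return files[-1] if files else None
```

Sorting by name follows the shell expression. If your file names do not sort chronologically, selecting by modification time with max(files, key=lambda p: p.stat().st_mtime) is an alternative.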
The sample plugins are only examples and are not an integral part of the ngauge tool. Therefore, you may need to install the dependencies needed to run them, such as NumPy, h5py, Matplotlib, and plotext, separately.
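The Python sketch below checks which of these dependencies are missing from the current environment, using the standard importlib.util.find_spec. The list contains only the packages named above. A given plugin may need others, so treat an empty result as a good sign rather than a guarantee. missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names of the third-party packages mentioned for the sample plugins.
PLUGIN_DEPENDENCIES = ["numpy", "h5py", "matplotlib", "plotext"]


def missing_modules(names) -> list[str]:
    """Return the top-level module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PLUGIN_DEPENDENCIES)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with, e.g.: python3 -m pip install " + " ".join(missing))
    else:
        print("All listed plugin dependencies are available.")
```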
To run ngauge:
Usage: ngauge <configuration YAML file>
The output will be saved in an HDF5 file (.h5) in the path that you have specified in the configuration YAML.
During the run, a progress bar is displayed for each counter, as shown in the image below. The colors mean the following:
Blue - low values (relative to the other values of the same counter).
Red - high values (relative to the other values of the same counter).
Any color between blue and red - intermediate values.
A solid gray bar means that the value of this counter did not change at all during the run.
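The color rule above can be sketched in a few lines of Python. Each value is scaled against its own counter's minimum and maximum, so the minimum maps to blue, the maximum to red, and a counter that never changes maps to gray. This is only an illustration of the described scheme. The linear RGB blend and the exact color values are assumptions, not ngauge's actual implementation, and bar_color and bar_colors are hypothetical helpers.

```python
BLUE = (0, 0, 255)
RED = (255, 0, 0)
GRAY = (128, 128, 128)


def bar_color(value: float, lo: float, hi: float) -> tuple[int, int, int]:
    """Map a value to an RGB color relative to its counter's range [lo, hi].

    lo maps to blue, hi maps to red, and values in between get a linear blend.
    A zero-width range (the counter never changed) maps to gray.
    """
    if hi == lo:
        return GRAY
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp in case value falls outside [lo, hi]
    return tuple(round(b + (r - b) * t) for b, r in zip(BLUE, RED))


def bar_colors(values: list[float]) -> list[tuple[int, int, int]]:
    """Color every sample of one counter relative to that counter's own min and max."""
    lo, hi = min(values), max(values)
    return [bar_color(v, lo, hi) for v in values]
```

Because the scale is computed per counter, a counter with small absolute values can still show red bars, which matches the "relative to the other values of the same counter" wording above.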
