DOCA Documentation v2.5.3 (2023 LTS U3)
NVIDIA DOCA Ngauge

This document provides instructions on the usage of the ngauge tool.

Introduction

ngauge is a tool for probing NIC HW counters and storing them in HDF5 format, together with the relevant metadata, for later processing. In addition, the progress and a measurement summary are displayed graphically in the CLI.

Supported hardware: BlueField-3, ConnectX-7, and above.

Prerequisites

NVIDIA® BlueField®-3, ConnectX®-7, and above, with firmware version xx.43.1000 or higher and the fwctl driver.

Info

To install the fwctl driver (needed on the host only; on the DPU it is already installed), search for a package with "fwctl" in its name and install the package you find.

On deb-based distros, use apt-cache search fwctl; on RPM-based distros, use dnf search fwctl.

NOTE: On Ubuntu 20.04 the fwctl driver is not loaded automatically, so you must run modprobe mlx5_fwctl after every reboot.
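
To check whether the driver is currently loaded before starting a run, something along these lines may help. This is a minimal sketch, not part of ngauge: it assumes a Linux host where /proc/modules lists one loaded module per line with the module name as the first field, and the helper name is_module_loaded is hypothetical.

```python
from pathlib import Path


def is_module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is the first field of any line in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    # On systems without /proc/modules, treat the module list as empty.
    text = proc_modules.read_text() if proc_modules.exists() else ""
    if not is_module_loaded("mlx5_fwctl", text):
        print("mlx5_fwctl is not loaded; run: modprobe mlx5_fwctl")
```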

Description

All configuration is done in the input YAML file.

Start by copying a sample configuration from /usr/share/doc/ngauge/examples/settings.

Specify the device to run on by its PCI address (e.g., 0000:03:00.0):

device: "0000:03:00.0"
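
A quick sanity check of the address before launching a run can catch typos early. The sketch below assumes the full domain:bus:device.function form shown above, in hexadecimal; whether ngauge also accepts shorter forms is not stated here, so the pattern is deliberately strict. The name is_pci_address is hypothetical.

```python
import re

# Assumed format: <domain>:<bus>:<device>.<function>, hexadecimal.
# The device field is 5 bits (00-1f) and the function field is 3 bits (0-7).
PCI_ADDRESS = re.compile(
    r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]"
)


def is_pci_address(value: str) -> bool:
    """Return True if `value` looks like a full PCI address such as 0000:03:00.0."""
    return PCI_ADDRESS.fullmatch(value) is not None
```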

The data output location is configured as follows (the path and the output file prefix are both mandatory):

output:
  path: /path/to/output/directory
  prefix: "ngauge_data_"

In the example above, the output is saved as /path/to/output/directory/ngauge_data_<DATE>_<TIME>.h5. The exact output file name is printed after each run.

Run parameters (the most useful of which is the sampling period) are configured as follows:

params:
  mode: repetitive  # [repetitive, single]
  period_us: 1e2

In the example above, "1e2" means 100 μs. Numbers in decimal or scientific notation are accepted.
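
To get a feel for what a given period means, it can help to convert it to a sampling rate. The following is a small sketch with a hypothetical helper name, period_to_rate_hz. It accepts the value either as a number or as a string, because some YAML 1.1 loaders (PyYAML, for example) read 1e2 as a string rather than a float.

```python
def period_to_rate_hz(period_us) -> float:
    """Convert a sampling period in microseconds to samples per second."""
    # float() accepts numbers as well as strings such as "100", "100.0", or "1e2".
    period = float(period_us)
    if period <= 0:
        raise ValueError("period_us must be positive")
    return 1_000_000.0 / period


# A period of 1e2 (100 μs) corresponds to 10,000 samples per second.
print(period_to_rate_hz("1e2"))
```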

The counters to measure are configured as follows. The only mandatory setting for a counter is its Data ID (id); all other settings are optional.

counters:
  - id: 0x1020000100000000
    desc: RX bytes port 0
    unit: RX port
    accumulating: false
    normalizer: time  # Normalizer, if present, must be either 'time' or a number.

All supported performance counters are listed in Supported Data IDs.

Tip

You may want to install doca-telemetry-utils, a tool that can generate counter IDs for use in the ngauge configuration. To install it, run sudo apt-get install doca-telemetry-utils or sudo dnf install doca-telemetry-utils.

Then run doca_telemetry_utils -h for help, and doca_telemetry_utils get-counters to get the list of available counters.

Parsing output

A sample plugin, named simple-plot, is installed in /usr/share/doc/ngauge/examples/plugins.

This plugin is a rudimentary demonstration of how to open the output HDF5 file and plot the data in it. Besides plotting, many other types of analysis can be done on these data.

Usage: /usr/share/doc/ngauge/examples/plugins/simple_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]
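
If you want to explore an output file yourself before writing a plugin, a generic HDF5 walk is a reasonable starting point. The sketch below makes no assumption about how ngauge lays out groups and datasets inside the file; it simply lists every dataset it finds. It relies on the third-party h5py package, which must be installed separately, and list_datasets is a hypothetical helper name.

```python
def list_datasets(path):
    """Return (name, shape, dtype) for every dataset in an HDF5 file."""
    import h5py  # third-party dependency, imported here so the module loads without it

    found = []

    def visit(name, obj):
        # visititems() calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling list_datasets on the file name printed at the end of a run shows which dataset names are available to plot or analyze.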

Tip

If you only want to plot the results of the most recent run, and your output directory is /tmp (the default), you can use the following expression in place of the file name instead of copy-pasting it every time: "$(ls -1 /tmp/ngauge_data_*.h5 | tail -n1)".
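
The same selection can be made from inside a Python plugin. Like the shell expression, this sketch assumes that the date and time in the file name sort chronologically when compared as text; latest_output is a hypothetical helper, and its defaults mirror the /tmp directory and ngauge_data_ prefix used in this document.

```python
from pathlib import Path
from typing import Optional


def latest_output(directory: str = "/tmp", prefix: str = "ngauge_data_") -> Optional[Path]:
    """Return the output file whose name sorts last, or None if there is none."""
    files = sorted(Path(directory).glob(f"{prefix}*.h5"))
    return files[-1] if files else None
```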

Info

The sample plugins are just examples and should not be considered integral parts of the ngauge tool. Therefore, you may need to separately install the dependencies required to run them, such as NumPy, h5py, Matplotlib, plotext, and others.

Execution

To run ngauge:

Usage: ngauge <configuration YAML file>

The output will be saved in an HDF5 file (.h5) in the path that you have specified in the configuration YAML.

During the run you will see a progress bar for each counter, as in the image below. The colors have the following meanings:

  • Blue - low values (relative to the other values of the same counter).

  • Red - high values (relative to the other values of the same counter).

  • Any color between blue and red - intermediate values.

  • Solid gray bars mean that the values of this counter did not change at all during the run.
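
The color scheme above can be illustrated with a per-counter min-max mapping. The sketch below is an illustration only, not ngauge's actual rendering code: it assumes a linear blend from blue to red, and the gray value and the name value_to_rgb are arbitrary choices.

```python
def value_to_rgb(value: float, lo: float, hi: float):
    """Map a sample to an (R, G, B) color relative to its counter's min and max."""
    if hi == lo:
        # The counter never changed during the run: solid gray.
        return (128, 128, 128)
    # Position of the value within [lo, hi], clamped to [0, 1].
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    red = round(255 * t)
    return (red, 0, 255 - red)  # t=0 is pure blue, t=1 is pure red


samples = [10, 40, 25, 10]
lo, hi = min(samples), max(samples)
print([value_to_rgb(s, lo, hi) for s in samples])
```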

