DOCA Ngauge
Date: 2025-11-20
This document provides instructions on the usage of the ngauge tool.
The ngauge tool is used to analyze, visualize, and debug network performance on a single node. It probes NIC hardware counters and stores the collected data in HDF5 format, along with relevant metadata, for subsequent processing. It also displays graphical progress updates and a measurement summary directly on the CLI, offering real-time insight into the measurement process.
Supported hardware: NVIDIA® BlueField®-3, NVIDIA® ConnectX®-7, and above.
ngauge relies on the fwctl driver and therefore cannot run simultaneously with other tools or services that also use this driver.
BlueField-3 or ConnectX-7 and above with firmware version xx.43.1000 or higher
fwctl driver installed on the host:
Note: On hosts running Linux kernel version 6.15 and above, there is no need to install fwctl manually, as it is available by default unless it was explicitly disabled during kernel configuration.
OS             Commands
Deb-based [1]  Search for the package:
                   apt-cache search fwctl
               Install the package:
                   sudo apt install <package-name>
RPM-based      Search for the package:
                   dnf search fwctl
               Install the package:
                   sudo dnf install <package-name>
[1] On Ubuntu 20.04, the fwctl driver is not loaded automatically upon system startup. To load it, run the command modprobe mlx5_fwctl after every reboot.
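The prerequisite check described above can be scripted. The following is a minimal sketch, not part of ngauge: the helper names are hypothetical, and it assumes that a loaded mlx5_fwctl module shows up as a directory under /sys/module (a driver built into the kernel may not appear there).

```python
import os
import platform
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a release string such as '6.15.0-generic' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def fwctl_module_loaded(sysfs_root: str = "/sys/module") -> bool:
    """Assumption: a loaded mlx5_fwctl module appears as a directory in sysfs."""
    return os.path.isdir(os.path.join(sysfs_root, "mlx5_fwctl"))


if __name__ == "__main__":
    release = platform.release()
    if kernel_at_least(release, 6, 15):
        print(f"{release}: fwctl is available by default unless disabled in the kernel config")
    else:
        print(f"{release}: install the fwctl package for your distribution")
    if not fwctl_module_loaded():
        print("mlx5_fwctl does not appear to be loaded; try: modprobe mlx5_fwctl")
```

The version comparison only reflects the 6.15 threshold stated above; it cannot tell whether fwctl was disabled when the kernel was configured.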
Installing Ngauge
Install ngauge by running sudo apt-get install ngauge or sudo dnf install ngauge (on x86 or Arm 64 hosts).
On the DPU, the ngauge package is pre-installed, so the above step is not necessary.
All configurations for ngauge are defined in an input YAML file.
Copy a sample configuration file from /usr/share/doc/ngauge/examples/settings.
Info: Use the one that best fits your scenario (single port, dual port, etc.).
Specify the device to run on using its PCIe address. For example:
device: "0000:03:00.0"
Configure the output path and file prefix (both are mandatory):
output:
  path: /path/to/output/directory
  prefix: "ngauge_data_"
  silent: false
The output file is saved in the format /path/to/output/directory/<prefix>_<DATE>_<TIME>.h5. The exact file name is printed after each run.
If the silent option is set to true, progress indications on the command line are suppressed (default: false).
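Because every run writes a new timestamped file, a post-processing script needs a way to locate the newest result. The sketch below is one possible approach using only the standard library; latest_output is a hypothetical helper, and it picks the file by modification time rather than by parsing the date and time in the name, whose exact format is not specified here.

```python
from pathlib import Path
from typing import Optional


def latest_output(directory: str, prefix: str = "ngauge_data_") -> Optional[Path]:
    """Return the most recently modified '<prefix>*.h5' file in directory, or None."""
    candidates = list(Path(directory).glob(f"{prefix}*.h5"))
    if not candidates:
        return None
    # Newest by modification time; the timestamp in the file name is not parsed.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, latest_output("/path/to/output/directory") returns the path of the last file written with the default prefix from the sample configuration.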
Configure parameters for the application's runtime behavior:
params:
  mode: repetitive  # [repetitive, single]
  period_us: 1e2    # Sampling period in microseconds (e.g., "1e2" = 100 μs)
Info: Numbers in decimal or scientific notation are accepted. In the example, 1e2 means 100 μs.
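To make the notation concrete, the following sketch converts a period_us value into seconds and into the implied sampling rate. It is an illustration only and says nothing about how ngauge parses its configuration. The function names are hypothetical, and rejecting non-positive periods is an assumption. Strings are accepted because some YAML loaders (PyYAML, for example) return a bare 1e2 as a string rather than a number.

```python
def period_seconds(period_us) -> float:
    """Convert a period_us value (number or string such as '1e2' or '100') to seconds."""
    value = float(period_us)  # float() accepts decimal and scientific notation
    if value <= 0:
        # Assumption: a sampling period must be positive.
        raise ValueError("period_us must be positive")
    return value / 1_000_000


def sampling_rate_hz(period_us) -> float:
    """Samples per second implied by the sampling period."""
    return 1.0 / period_seconds(period_us)
```

With the sample value, period_seconds("1e2") corresponds to 100 μs, which is about 10,000 samples per second.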
Define the counters to measure. The id (data ID) is the only mandatory field. Additional fields are optional:
counters:
  - id: 0x1020000100000000  # Data ID (mandatory)
    desc: RX bytes port 0   # Description (optional)
    unit: RX port           # Unit type (optional)
    accumulating: false     # Whether the counter accumulates values (optional)
    normalizer: time        # Normalizer, if present, must be either 'time', or a non-zero number, or one of the configured Data IDs with an 'id/' prefix (optional)
    cutoff_min: 1           # Cutoff minimum: data below this value will not be recorded (optional)
    cutoff_max: 3e10        # Cutoff maximum: data above this value will not be recorded (optional)
Info: All supported performance counters may be found under section "Supported Data IDs".
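The field rules above can be checked before a run. The following sketch is not ngauge's own validation; validate_counter and is_recorded are hypothetical helpers that mirror the stated rules. It assumes a counter entry has already been loaded into a dictionary, that configured Data IDs are held as integers, and that a normalizer of the form 'id/...' carries the Data ID in the same notation as the id field.

```python
from numbers import Real


def validate_counter(counter: dict, configured_ids: set) -> list:
    """Return a list of problems found in one counter entry (empty if none)."""
    problems = []
    if "id" not in counter:
        problems.append("missing mandatory field 'id'")

    normalizer = counter.get("normalizer")
    if normalizer is None or normalizer == "time":
        return problems
    if isinstance(normalizer, Real) and not isinstance(normalizer, bool):
        if normalizer == 0:
            problems.append("numeric normalizer must be non-zero")
    elif isinstance(normalizer, str) and normalizer.startswith("id/"):
        # Assumption: the text after 'id/' is a Data ID in the same notation as 'id'.
        try:
            referenced = int(normalizer[len("id/"):], 0)
        except ValueError:
            problems.append(f"cannot parse Data ID in {normalizer!r}")
        else:
            if referenced not in configured_ids:
                problems.append(f"{normalizer!r} is not a configured Data ID")
    else:
        problems.append(f"unsupported normalizer {normalizer!r}")
    return problems


def is_recorded(value: float, cutoff_min=None, cutoff_max=None) -> bool:
    """Apply the cutoff rule: values below cutoff_min or above cutoff_max are dropped."""
    if cutoff_min is not None and value < cutoff_min:
        return False
    if cutoff_max is not None and value > cutoff_max:
        return False
    return True
```

Whether the tool applies the cutoffs before or after normalization is not stated here, so is_recorded only illustrates the comparison itself.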
Parsing Output
A sample plugin named simple-plot is provided and installed under /usr/share/doc/ngauge/examples/plugins.
This plugin demonstrates how to open the output HDF5 file generated by ngauge and plot the data. While it focuses on plotting, the data can also be used for various types of analysis. This plugin is a basic demonstration and is not intended for advanced use.
To plot the data from an ngauge output file, use the following command:
/usr/share/doc/ngauge/examples/plugins/simple_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]
Use the -h flag to view additional options, including plotting on a logarithmic scale and in the frequency domain.
If your output directory is /tmp (the default), you can always reference the most recent results without manually copying the file name by using the expression "$(ls -1 /tmp/ngauge_data_*.h5 | tail -n1)".
The following is a simple plot example. The lighter area around the samples represents the worst-case measurement error.
The GUI includes a zoom-in function that allows you to drill down for higher resolution.
An alternative plugin, simple_text_plot.py, generates text-based plots directly in the terminal. Although its resolution is significantly lower, it supports zooming into specific regions and is especially useful when graphical output is unavailable or when network connectivity to the server is limited.
The usage syntax is similar to that of the graphical plotting plugin. Use the -h flag to view additional options, such as plotting in the frequency domain, specifying the start, end, or duration of the plotted interval, and more.
/usr/share/doc/ngauge/examples/plugins/simple_text_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]
Simple text plot example:
RX bytes port 0
┌─────────────────────────────────────────────────────────────────────────────────────┐
24864860578.2┤ ▗▐██▄▄▙▙▙█▄▙▄▄██▄▟██▄▟█▟▙▄█▟▄▙█▄▙▄▄▟▄▄█▟▄▄▄▟▙▄▄▟▟▙│
│ ▐█▛ │
│ ▐█ │
│ ▝ │
20720717148.5┤ │
│ │
│ │
│ │
16576573718.8┤ ▝ │
│ │
│ │
│ │
12432430289.1┤ ▝ │
│ │
│ ▝ │
│ │
8288286859.4┤ │
│ │
│ │
│ ▝ │
4144143429.7┤ │
│ │
│ │
│ ▗ │
0.0┤▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▟ │
└┬────────────────────┬────────────────────┬────────────────────┬────────────────────┬┘
0.0                  4.2                  8.4                  12.6                 16.8
Approx. time (s)
The sample plugins are provided as examples and are not integral parts of the ngauge tool. Dependencies such as NumPy, h5py, Matplotlib, plotext, and others may need to be installed separately to run these plugins.
To run ngauge:
ngauge <configuration YAML file>
The output is saved as an HDF5 file (.h5) in the path specified in the configuration YAML.
To end a run before the full dataset is collected, press Ctrl+C (SIGINT). This is a normal and supported way to end the run, and all results collected up to that point are saved as usual.
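Since SIGINT is a supported way to end a run, a script can use it to cap the duration of a measurement. The following is a minimal sketch for Linux hosts using the standard subprocess module. run_with_time_limit is a hypothetical helper, and the configuration file name in the usage comment is a placeholder.

```python
import signal
import subprocess


def run_with_time_limit(command: list, limit_s: float) -> int:
    """Run command; if it is still running after limit_s seconds, send SIGINT."""
    proc = subprocess.Popen(command)
    try:
        return proc.wait(timeout=limit_s)
    except subprocess.TimeoutExpired:
        # Same signal the terminal delivers on Ctrl+C.
        proc.send_signal(signal.SIGINT)
        # Wait for the process to finish writing its output before returning.
        return proc.wait()


# Hypothetical usage: stop the measurement after 30 seconds.
# run_with_time_limit(["ngauge", "my_settings.yaml"], 30)
```

The final wait has no timeout on purpose, so that the caller does not proceed until the interrupted process has exited.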
During the run, progress bars for each counter will be displayed. These bars provide visual feedback on the counter activity, with color coding to indicate value levels:
Blue – Represents low values relative to other values of the same counter
Red – Represents high values relative to other values of the same counter
Intermediate colors (gradient) – Values between low and high, transitioning from blue to red
Solid gray bars – Indicate no changes in the values of this counter during the run
Data is updated once per second, showing the mean, standard deviation (σ), minimum, and maximum values observed during the last second.
This visual representation helps track counter activity in real time, offering immediate insights into system behavior.
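The per-second summary can be reproduced offline from any window of samples. The sketch below uses the standard statistics module. summarize is a hypothetical helper, and the choice of the population standard deviation is an assumption, since the text does not say which estimator the tool uses.

```python
import statistics


def summarize(samples: list) -> dict:
    """Mean, standard deviation, minimum, and maximum of one window of samples."""
    if not samples:
        raise ValueError("no samples in window")
    return {
        "mean": statistics.fmean(samples),
        "sigma": statistics.pstdev(samples),  # population σ (assumption)
        "min": min(samples),
        "max": max(samples),
    }
```

For the window [2, 4, 4, 4, 5, 5, 7, 9], this gives a mean of 5, σ of 2, a minimum of 2, and a maximum of 9.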