DOCA Documentation v3.0.0

DOCA Ngauge

This document provides instructions on the usage of the ngauge tool.

The ngauge tool is used to analyze, visualize, and debug network performance on a single node. It probes NIC hardware counters and stores the collected data in HDF5 format, along with relevant metadata, for subsequent processing. It also shows graphical progress updates and a measurement summary directly on the CLI, giving real-time insight into the measurement process.

Info

Supported hardware: NVIDIA® BlueField®-3, NVIDIA® ConnectX®-7, and above.

Info

ngauge relies on the fwctl driver and, therefore, cannot be run simultaneously with other tools or services that also utilize this driver.

  • BlueField-3 or ConnectX-7 and above with firmware version xx.43.1000 or higher

  • fwctl driver installed on the host:

    Deb-based (see note below)

    1. Search for the package:

      apt-cache search fwctl

    2. Install the package:

      sudo apt install <package-name>

    RPM-based

    1. Search for the package:

      dnf search fwctl

    2. Install the package:

      sudo dnf install <package-name>

    Note: On Ubuntu 20.04, the fwctl driver is not loaded automatically at system startup. To load it, run the command modprobe mlx5_fwctl after every reboot.
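The driver prerequisite above can be checked from a script before starting a measurement. The following is a minimal sketch, assuming a Linux host where loaded modules are listed in /proc/modules. The function names are illustrative and not part of ngauge. A driver built directly into the kernel would not appear in that file, so a negative result is only a hint.

```python
from pathlib import Path


def module_loaded(proc_modules_text, name="mlx5_fwctl"):
    """Return True if `name` is the module name on any line of /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each line is the module name; later fields
        # (size, use count, dependents) must not be matched.
        if fields and fields[0] == name:
            return True
    return False


def fwctl_loaded(proc_modules="/proc/modules"):
    """Check the running system; returns False if the file is unavailable."""
    path = Path(proc_modules)
    if not path.exists():
        return False
    return module_loaded(path.read_text())


if __name__ == "__main__":
    if fwctl_loaded():
        print("mlx5_fwctl is loaded")
    else:
        print("mlx5_fwctl not found in /proc/modules; try: modprobe mlx5_fwctl")
```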

Installing Ngauge

Install ngauge by running sudo apt-get install ngauge (Deb-based) or sudo dnf install ngauge (RPM-based) on x86 or Arm64 hosts.

Info

On the DPU, the ngauge package is pre-installed, so the above step is not necessary.


All the configurations for ngauge are defined in an input YAML file.

  1. Copy a sample configuration file from /usr/share/doc/ngauge/examples/settings.

    Info

    Use the one that best fits your scenario (single port, dual port, etc.).

  2. Specify the device to run on using its PCIe address. For example:

    device: "0000:03:00.0"

  3. Configure the output path and file prefix (both are mandatory):

    output:
      path: /path/to/output/directory
      prefix: "ngauge_data_"
      silent: false

    • The output file is saved in the format /path/to/output/directory/ngauge_data_<DATE>_<TIME>.h5.

    • The exact file name is printed after each run.

    • If the silent option is set to true, progress indications on the command line are suppressed (default: false).

  4. Configure parameters for the application's runtime behavior:

    params:
      mode: repetitive # [repetitive, single]
      period_us: 1e2   # Sampling period in microseconds (e.g., "1e2" = 100 μs)

    Info

    Numbers in decimal or scientific notation are accepted. In the example, 1e2 means 100 μs.

  5. Define the counters to measure. The id (data ID) is the only mandatory field. Additional fields are optional:

    counters:
      - id: 0x1020000100000000 # Data ID (mandatory)
        desc: RX bytes port 0  # Description (optional)
        unit: RX port          # Unit type (optional)
        accumulating: false    # Whether the counter accumulates values (optional)
        normalizer: time       # Normalizer, if present, must be either 'time', or a non-zero number, or one of the configured Data IDs with an 'id/' prefix (optional)
        cutoff_min: 1          # Cutoff minimum: data below this value will not be recorded (optional)
        cutoff_max: 3e10       # Cutoff maximum: data above this value will not be recorded (optional)

    Info

    All supported performance counters may be found under section "Supported Data IDs".
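The rules stated in the configuration steps above (a mandatory id, the three allowed normalizer forms, and numbers in decimal or scientific notation) can be checked before a run. The following is a minimal sketch that operates on already-parsed counter entries as plain Python dictionaries. The helper names are hypothetical, and the exact spelling of an 'id/' reference (here 'id/0x...') is an assumption. ngauge performs its own validation, which may differ.

```python
def parse_number(value):
    """Accept decimal or scientific notation, given as a number or a string."""
    return float(value)


def samples_per_second(period_us):
    """Sampling rate implied by a period in microseconds (1e2 us -> 10000/s)."""
    return 1e6 / parse_number(period_us)


def to_id(value):
    """Data IDs may arrive as integers or as strings such as '0x1020000100000000'."""
    return int(value, 0) if isinstance(value, str) else int(value)


def normalizer_ok(normalizer, configured_ids):
    """'time', a non-zero number, or 'id/<configured Data ID>' (format assumed)."""
    if normalizer is None or normalizer == "time":
        return True
    if isinstance(normalizer, str) and normalizer.startswith("id/"):
        try:
            return to_id(normalizer[3:]) in configured_ids
        except ValueError:
            return False
    try:
        return parse_number(normalizer) != 0.0
    except (TypeError, ValueError):
        return False


def check_counters(counters):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ids = {to_id(c["id"]) for c in counters if "id" in c}
    for idx, c in enumerate(counters):
        if "id" not in c:
            problems.append(f"counter {idx}: missing mandatory 'id'")
        if not normalizer_ok(c.get("normalizer"), ids):
            problems.append(f"counter {idx}: invalid normalizer {c['normalizer']!r}")
        lo, hi = c.get("cutoff_min"), c.get("cutoff_max")
        if lo is not None and hi is not None and parse_number(lo) > parse_number(hi):
            problems.append(f"counter {idx}: cutoff_min exceeds cutoff_max")
    return problems
```

For the example counter shown in step 5, check_counters returns an empty list, and a period_us of 1e2 corresponds to 10,000 samples per second.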

Parsing Output

A sample plugin, simple_plot.py, is provided and installed under /usr/share/doc/ngauge/examples/plugins.

This plugin demonstrates how to open the output HDF5 file generated by ngauge and plot the data. While it focuses on plotting, the data can also be used for various types of analysis. This plugin is a basic demonstration and is not intended for advanced use.

To plot the data from an ngauge output file, use the following command:

/usr/share/doc/ngauge/examples/plugins/simple_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]

Tip

If your output directory is /tmp (the default), you can always reference the most recent results without manually copying the file name by using the expression "$(ls -1 /tmp/ngauge_data_*.h5 | tail -n1)".
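The same lookup can be done from Python when the results are processed by a script rather than from the shell. This is a minimal sketch, not part of ngauge: latest_output is a hypothetical helper, and it relies on the same assumption as the shell expression above, namely that the date and time in the file name sort chronologically when sorted by name.

```python
import glob
import os


def latest_output(directory="/tmp", prefix="ngauge_data_"):
    """Return the last matching .h5 file by name, or None if there is none."""
    pattern = os.path.join(directory, prefix + "*.h5")
    matches = sorted(glob.glob(pattern))  # name order, like `ls -1 | tail -n1`
    return matches[-1] if matches else None


if __name__ == "__main__":
    print(latest_output() or "no ngauge output files found")
```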

The following is a simple plot example. The lighter area around the samples represents the worst-case measurement error.

[Figure: example plot produced by simple_plot.py]

Info

The GUI includes a zoom-in function that allows you to drill down for higher resolution.

An alternative plugin, simple_text_plot.py, produces text-based plots in the terminal. While the resolution is much lower, this method is highly useful when graphical output is unavailable or when the network connection to the server is slow.

The usage syntax is identical to the graphical plotting plugin:

/usr/share/doc/ngauge/examples/plugins/simple_text_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]

Simple text plot example:

[Text plot titled "RX bytes port 0": the y-axis runs from 0.0 to about 2.49e10 and the x-axis, labeled "Approx. time (s)", runs from 0.0 to 16.8. The trace stays at 0.0 for the first part of the run, then rises steeply to a plateau near the top of the scale.]

Info

The sample plugins are provided as examples and are not integral parts of the ngauge tool. Dependencies such as NumPy, h5py, Matplotlib, plotext, and others may need to be installed separately to run these plugins.
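For analysis beyond plotting, the output file can be opened directly with h5py. The internal layout of the file (group and dataset names, where metadata is stored) is not described in this document, so the sketch below makes no assumption about it and simply lists whatever the file contains. The helper names are illustrative, and h5py must be installed separately.

```python
import sys


def describe(name, obj):
    """Format one entry: datasets expose shape and dtype, groups do not."""
    shape = getattr(obj, "shape", None)
    if shape is None:
        return f"{name}/ (group)"
    return f"{name}  shape={shape} dtype={obj.dtype}"


def list_contents(path):
    """Walk an HDF5 file and return one descriptive line per group or dataset."""
    import h5py  # third-party; imported here so the helpers above work without it

    lines = []
    with h5py.File(path, "r") as f:
        # Root-level attributes, if any, are listed first.
        for key, value in f.attrs.items():
            lines.append(f"@{key} = {value!r}")
        # visititems calls the function for every object below the root;
        # list.append returns None, so the walk continues to the end.
        f.visititems(lambda name, obj: lines.append(describe(name, obj)))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    print("\n".join(list_contents(sys.argv[1])))
```

Run with the path of an ngauge output file as the only argument to see which datasets are available before writing further analysis code.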


To run ngauge:

ngauge <configuration YAML file>

The output is saved as an HDF5 file (.h5) in the path specified in the configuration YAML.

Info

To end a run before the full dataset is collected, press Ctrl+C (SIGINT). This is a normal and supported way to end the run, and all results collected up to that point are saved as usual.
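Because SIGINT is a supported way to end a run, a script can use it to limit a measurement to a fixed duration. The following is a minimal sketch for Linux hosts. The function names, the duration, and the grace period are illustrative choices, and only the ngauge <configuration YAML file> invocation comes from this document.

```python
import signal
import subprocess


def build_command(config_path, executable="ngauge"):
    """Command line as documented: ngauge <configuration YAML file>."""
    return [executable, str(config_path)]


def run_for(command, seconds, grace=10.0):
    """Run `command`, send SIGINT after `seconds`, and return the exit code.

    If the process has not exited `grace` seconds after SIGINT, it is
    killed as a last resort (results may then be incomplete).
    """
    proc = subprocess.Popen(command)
    try:
        proc.wait(timeout=seconds)  # the run may finish on its own
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # same effect as pressing Ctrl+C
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

For example, run_for(build_command("settings.yaml"), 30) would stop a run after about 30 seconds, where settings.yaml stands for your own configuration file.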

During the run, progress bars for each counter will be displayed. These bars provide visual feedback on the counter activity, with color coding to indicate value levels:

  • Blue – Represents low values relative to other values of the same counter

  • Red – Represents high values relative to other values of the same counter

  • Intermediate colors (gradient) – Values between low and high, transitioning from blue to red

  • Solid gray bars – Indicate no changes in the values of this counter during the run
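The color coding above can be pictured as a linear interpolation between blue and red over the range of values seen for a counter. The sketch below is one way such a mapping could be computed. It is an illustration only: the actual colors and scaling used by ngauge are not specified in this document, and bar_color is a hypothetical name.

```python
def bar_color(value, lo, hi):
    """Map a value to an (R, G, B) tuple relative to the counter's own range.

    lo and hi are the lowest and highest values observed for this counter.
    A counter whose value never changed (lo == hi) is shown as solid gray.
    """
    if hi == lo:
        return (128, 128, 128)
    # Position of the value within the range, clamped to [0, 1].
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    # t = 0 gives pure blue, t = 1 gives pure red, values between blend.
    return (round(255 * t), 0, round(255 * (1 - t)))
```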

Info

Data is updated once per second, showing the mean, standard deviation (σ), minimum, and maximum values observed during the last second.
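The same per-second summary can be reproduced offline from recorded samples. This is a minimal sketch using only the standard library. It assumes σ is the population standard deviation, which this document does not state, and the function names are illustrative.

```python
import math
import statistics
from collections import defaultdict


def summarize(samples):
    """Mean, standard deviation, minimum, and maximum of one window."""
    return {
        "mean": statistics.fmean(samples),
        "sigma": statistics.pstdev(samples),  # population sigma (assumed)
        "min": min(samples),
        "max": max(samples),
    }


def per_second(timestamps_s, values):
    """Group samples into one-second windows and summarize each window."""
    windows = defaultdict(list)
    for t, v in zip(timestamps_s, values):
        windows[math.floor(t)].append(v)
    return {sec: summarize(vals) for sec, vals in sorted(windows.items())}
```

With a 100 μs sampling period, each one-second window would hold about 10,000 samples per counter.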

This visual representation helps track counter activity in real time, offering immediate insights into system behavior.

[Figure: ngauge progress bars during an ib_write_bw demo run]

© Copyright 2025, NVIDIA. Last updated on May 5, 2025.