DOCA Documentation v2.10.0

DOCA Ngauge

This document explains how to use the ngauge tool.

The ngauge tool is used to analyze, visualize, and debug network performance on a single node. It probes NIC hardware counters and stores the collected data, along with relevant metadata, in HDF5 format for subsequent processing. It also shows graphical progress updates and a measurement summary directly on the CLI, giving real-time insight into the measurement process.

Info

Supported hardware: NVIDIA® BlueField®-3, NVIDIA® ConnectX®-7, and above.

Info

ngauge relies on the fwctl driver and, therefore, cannot be run simultaneously with other tools or services that also utilize this driver.

  • BlueField-3 or ConnectX-7 and above with firmware version xx.43.1000 or higher

  • fwctl driver installed on the host:

    Deb-based (see the Ubuntu 20.04 note below):

    1. Search for the package:

      apt-cache search fwctl

    2. Install the package:

      sudo apt install <package-name>

    RPM-based:

    1. Search for the package:

      dnf search fwctl

    2. Install the package:

      sudo dnf install <package-name>

    Note: On Ubuntu 20.04, the fwctl driver is not loaded automatically at system startup. To load it, run modprobe mlx5_fwctl after every reboot.
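Before starting a measurement, it can be useful to confirm that the driver module is actually loaded. The following is a minimal sketch, not part of ngauge: it assumes a Linux host where loaded modules are listed in /proc/modules, and the helper names module_is_loaded and fwctl_driver_loaded are hypothetical. A driver built directly into the kernel would not appear in that file, so a negative result is only a hint.

```python
from pathlib import Path


def module_is_loaded(name, modules_text):
    """Return True if `name` is a module name in /proc/modules-style text."""
    # /proc/modules reports module names with underscores, so normalise dashes.
    wanted = name.replace("-", "_")
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False


def fwctl_driver_loaded(proc_modules="/proc/modules"):
    """Check for the mlx5_fwctl module named in the note above."""
    path = Path(proc_modules)
    if not path.exists():
        return False
    return module_is_loaded("mlx5_fwctl", path.read_text())


if __name__ == "__main__":
    if fwctl_driver_loaded():
        print("mlx5_fwctl is loaded")
    else:
        print("mlx5_fwctl is not loaded; try: modprobe mlx5_fwctl")
```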

Installing Ngauge

On x86 or Arm64 hosts, install ngauge by running sudo apt-get install ngauge (Deb-based) or sudo dnf install ngauge (RPM-based).

Info

On the DPU, the ngauge package is pre-installed, so the step above is not needed.


All configurations for ngauge are defined in an input YAML file.

  1. Copy a sample configuration file from /usr/share/doc/ngauge/examples/settings.

  2. Specify the device to run on using its PCIe address. For example:

    device: "0000:03:00.0"

  3. Configure the output path and file prefix (both are mandatory):

    output:
      path: /path/to/output/directory
      prefix: "ngauge_data_"
      silent: false

    • The output file is saved in the format /path/to/output/directory/ngauge_data_<DATE>_<TIME>.h5.

    • The exact file name is printed after each run.

    • If the silent option is set to true, progress indications on the command line are suppressed (default: false).

  4. Configure parameters for the application's runtime behavior:

    params:
      mode: repetitive   # [repetitive, single]
      period_us: 1e2     # Sampling period in microseconds (e.g., "1e2" = 100 μs)

    Info

    Numbers in decimal or scientific notation are accepted. In the example, 1e2 means 100 μs.

  5. Define the counters to measure. The id (data ID) is the only mandatory field. Additional fields are optional:

    counters:
      - id: 0x1020000100000000   # Data ID (mandatory)
        desc: RX bytes port 0    # Description (optional)
        unit: RX port            # Unit type (optional)
        accumulating: false      # Whether the counter accumulates values (optional)
        normalizer: time         # Normalizer ('time' or a number, optional)

    Info

    All supported performance counters are listed in the "Supported Data IDs" section.
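The configuration steps above can also be scripted, for example when the same measurement has to be repeated on several devices. The following is a minimal sketch under stated assumptions: build_config is a hypothetical helper, not part of ngauge, and the placement of device, output, params, and counters as top-level keys is inferred from the fragments shown in the steps. The sample file under /usr/share/doc/ngauge/examples/settings remains the reference for the real layout. The sketch checks that the device looks like a PCIe address (domain:bus:device.function) and that the sampling period is a positive number in decimal or scientific notation.

```python
import re

# PCIe address in domain:bus:device.function form, e.g. 0000:03:00.0
_BDF = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]$")


def build_config(device, out_dir, prefix, period_us, counter_ids, mode="repetitive"):
    """Render a configuration as YAML text (hypothetical helper).

    period_us is passed as a string so that notation such as "1e2" is
    written to the file unchanged.
    """
    if not _BDF.match(device):
        raise ValueError(f"not a PCIe address: {device!r}")
    if mode not in ("repetitive", "single"):
        raise ValueError(f"unknown mode: {mode!r}")
    if float(period_us) <= 0:  # float() accepts both "100" and "1e2"
        raise ValueError("period_us must be positive")
    if not counter_ids:
        raise ValueError("at least one counter id is required")

    lines = [
        f'device: "{device}"',
        "output:",
        f"  path: {out_dir}",
        f'  prefix: "{prefix}"',
        "  silent: false",
        "params:",
        f"  mode: {mode}",
        f"  period_us: {period_us}",
        "counters:",
    ]
    for cid in counter_ids:
        # Only the mandatory id field is written; optional fields are omitted.
        lines.append(f"  - id: 0x{cid:016x}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_config("0000:03:00.0", "/path/to/output/directory",
                       "ngauge_data_", "1e2", [0x1020000100000000]), end="")
```

Validating the values before writing the file keeps formatting mistakes out of the measurement run itself.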

Parsing Output

A sample plugin, simple_plot.py, is provided and installed under /usr/share/doc/ngauge/examples/plugins.

This plugin demonstrates how to open the output HDF5 file generated by ngauge and plot the data. While it focuses on plotting, the data can also be used for various types of analysis. This plugin is a basic demonstration and is not intended for advanced use.
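For analysis other than plotting, a first step is usually to see what the output file contains. The following is a minimal sketch that lists every dataset in an HDF5 file with h5py. It makes no assumption about how ngauge organizes groups and datasets inside the file, and the function names are hypothetical. h5py is a third-party package and is imported lazily so that the formatting helper works without it.

```python
import sys


def format_entry(name, shape, dtype):
    """Format one dataset description as a single line."""
    return f"{name}: shape={shape}, dtype={dtype}"


def list_datasets(path):
    """Return one description line per dataset found in the HDF5 file."""
    import h5py  # third-party dependency, needed only when a file is opened

    entries = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # returning None tells h5py to keep walking.
        if isinstance(obj, h5py.Dataset):
            entries.append(format_entry(name, obj.shape, obj.dtype))

    with h5py.File(path, "r") as h5file:
        h5file.visititems(visit)
    return entries


if __name__ == "__main__" and len(sys.argv) > 1:
    for line in list_datasets(sys.argv[1]):
        print(line)
```

Run it with the path to an ngauge output .h5 file as the only argument. The provided simple_plot.py plugin remains the reference for how the data is meant to be read.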

To plot the data from an ngauge output file, use the following command:

/usr/share/doc/ngauge/examples/plugins/simple_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]

Tip

If your output directory is /tmp (the default), you can always reference the most recent results without manually copying the file name by using the expression "$(ls -1 /tmp/ngauge_data_*.h5 | tail -n1)".
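The same lookup can be done from a Python script. The following is a minimal sketch with a hypothetical helper, latest_output. Like the shell expression, it picks the last file name in sorted order, so it assumes that the <DATE>_<TIME> part of the name sorts chronologically.

```python
import glob
import os


def latest_output(directory="/tmp", prefix="ngauge_data_"):
    """Return the last <prefix>*.h5 path in name order, or None if none exist."""
    matches = sorted(glob.glob(os.path.join(directory, prefix + "*.h5")))
    return matches[-1] if matches else None


if __name__ == "__main__":
    print(latest_output())
```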

Simple plot example:

[Figure: graphical plot produced by the simple plot plugin]

An alternative plugin, simple_text_plot.py, produces text-based plots in the terminal. While the resolution is lower, this method is highly useful when graphical output is unavailable or when the network connection to the server is slow.

The usage syntax is identical to the graphical plotting plugin:

/usr/share/doc/ngauge/examples/plugins/simple_text_plot.py <ngauge output .h5 file> <counter ID> [<counter ID> ...]

Simple text plot example:

[Text plot: "RX bytes port 0" on the y-axis (ticks from 0.0 to 24864860578.2) against "Approx. time (s)" on the x-axis (ticks from 0.0 to 16.8)]

Info

The sample plugins are provided as examples and are not integral parts of the ngauge tool. Dependencies such as NumPy, h5py, Matplotlib, and plotext may need to be installed separately to run these plugins.


To run ngauge:

ngauge <configuration YAML file>

The output is saved as an HDF5 file (.h5) in the path specified in the configuration YAML.

Info

To end a run before the full dataset is collected, press Ctrl+C (SIGINT). This is a normal and supported way to end the run, and all results collected up to that point are saved as usual.
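Because SIGINT is a supported way to stop, a wrapper script can use it to cap the duration of a run. The following is a minimal sketch with a hypothetical helper, run_for. It assumes a Linux host and an ngauge binary on the PATH, and it uses only the ngauge <configuration YAML file> invocation described above.

```python
import signal
import subprocess
import sys


def run_for(cmd, seconds):
    """Start cmd; if it is still running after `seconds`, send SIGINT.

    Returns the exit code of the process.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Same signal as pressing Ctrl+C in the terminal.
        proc.send_signal(signal.SIGINT)
        return proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python run_for.py settings.yaml  (stops the run after 10 s)
    sys.exit(run_for(["ngauge", sys.argv[1]], 10))
```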

During the run, progress bars for each counter will be displayed. These bars provide visual feedback on the counter activity, with color coding to indicate value levels:

  • Blue – Represents low values relative to other values of the same counter

  • Red – Represents high values relative to other values of the same counter

  • Intermediate colors (gradient) – Values between low and high, transitioning from blue to red

  • Solid gray bars – Indicate no changes in the values of this counter during the run

[Figure: per-counter progress bars displayed during a run]

This visual representation helps track counter activity in real time, offering immediate insights into system behavior.
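The color scheme described above can be illustrated with a simple linear mapping. The following is a sketch of the general idea only, not ngauge's actual implementation: the function name value_to_rgb, the exact RGB values, and the linear interpolation are all assumptions. Each value is scaled against the minimum and maximum of the same counter, and a counter whose values never change is shown in gray.

```python
def value_to_rgb(value, lowest, highest):
    """Map a counter value to an (R, G, B) tuple between blue and red."""
    if highest == lowest:
        # No variation in this counter: solid gray.
        return (128, 128, 128)
    # Position of the value within the counter's own range, clamped to [0, 1].
    t = (value - lowest) / (highest - lowest)
    t = min(1.0, max(0.0, t))
    red = round(255 * t)
    return (red, 0, 255 - red)


if __name__ == "__main__":
    samples = [0, 25, 50, 75, 100]
    for v in samples:
        print(v, value_to_rgb(v, min(samples), max(samples)))
```

Because the scaling is relative to each counter's own range, the colors show how a counter changes over time rather than allowing absolute comparison between different counters.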

© Copyright 2025, NVIDIA. Last updated on Feb 26, 2025.