NVIDIA UFM Enterprise User Manual v6.18.0

Packet Level Monitoring Collector (PMC) Plugin

The Packet Monitoring Collector/Controller plugin facilitates the configuration, capture, and display of a variety of network events, enabling users to monitor them in real time. The PMC plugin is included in the plugins bundle, which can be downloaded from NVIDIA's Licensing Portal.

Supported triggers are pFRN, Congestion, Fast Recovery, CQE and PHY Error Links.

Network events are stored as UFM events and are archived in files for later retrieval. They can also be viewed through the PMC user interface, and they can be streamed externally via the UFM REST API in the same way as other UFM events. The REST APIs are described in the UFM Enterprise REST API Guide.

pFRN

  • pFRN Notifications - Enables/Disables mirroring on pFRN trigger for entire network or list of GUIDs

Fast Recovery

  • Fast Recovery Notifications - Enables/Disables mirroring on Fast Recovery trigger for entire network or list of GUIDs

  • Notifications Level - Specifies threshold for Fast Recovery mirroring. (Thresholds are configured in SM configuration)

PHY Error Links

  • PHY Error Links Notifications - Enables/Disables mirroring on PHY Link Error trigger for entire network or list of GUIDs

  • Specifies threshold for PHY Link Error mirroring. (Thresholds are configured in SM configuration)

CQE

  • CQE Notifications - Enables/Disables mirroring on CQE Notifications trigger for entire network or list of GUIDs

Congestion

  • Congestion Notifications - Enables/Disables mirroring on Congestion Notifications trigger for entire network or list of GUIDs

    • Mirrored packets (%) - Specifies the percent of congested packets to be mirrored.

    • High threshold - High threshold percentage for InfiniBand switch egress port queue size. Values are in the [1,1023] range.

    • Low threshold - Low threshold percentage for InfiniBand switch egress port queue size. Values are in the [1,1023] range.

Note

When a packet enters an InfiniBand switch, its data is stored in an ingress port buffer. A pointer to the packet's data is inserted into the queue of the egress port from which the packet will exit the switch. At that point, the configured threshold is compared to the egress queue data size. If the queue data size exceeds the threshold, a congestion event is reported. The threshold is given as a percentage of the ingress port buffer size.

An egress port queue can point to data coming from multiple ingress port buffers; therefore, the threshold can be greater than 100%.
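
The comparison described in this note can be sketched in shell as follows. This is only an illustration of the arithmetic, not the switch or plugin implementation: the function names, the use of bytes as the size unit, and the sample numbers are all assumptions made for the example. Only the [1,1023] range and the "percent of ingress port buffer size" rule come from the text above.

```shell
# Hypothetical helpers: none of these names come from UFM or the PMC plugin.

# Succeeds (exit 0) when a threshold percentage lies in the documented [1,1023] range.
threshold_in_range() {
    local pct="$1"
    [ "$pct" -ge 1 ] && [ "$pct" -le 1023 ]
}

# Succeeds (exit 0) when the egress queue data size exceeds
# threshold_pct percent of the ingress port buffer size.
# Sizes are treated as bytes here; the unit is an assumption for illustration.
is_congested() {
    local queue_bytes="$1"
    local ingress_buffer_bytes="$2"
    local threshold_pct="$3"
    # Integer arithmetic: queue/buffer > pct/100  <=>  queue*100 > pct*buffer
    [ $(( queue_bytes * 100 )) -gt $(( threshold_pct * ingress_buffer_bytes )) ]
}

# Example: an egress queue pointing to 1500 bytes, measured against a
# 1000-byte ingress buffer, sits at 150% and exceeds a 120% threshold.
if threshold_in_range 120 && is_congested 1500 1000 120; then
    echo "congestion event"
fi
```

Because the queue can reference data held in several ingress buffers, the sample queue size is larger than a single buffer, which is why a threshold above 100% is meaningful.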


Installation

Load the image on the UFM server, either through the UFM GUI (Settings -> Plugins Management tab) or from the command line as follows:

  1. Login to the UFM server terminal.

  2. Run:

    docker load -i <path_to_image>

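
As a minimal sketch of the command-line path, the load step can be wrapped so that a missing image file is caught before Docker is invoked, and the local image list is printed afterwards for a visual check. The function name load_plugin_image is hypothetical and not part of UFM; `docker load -i` and `docker images` are standard Docker CLI commands.

```shell
# Hypothetical wrapper; the function name is illustrative only.
load_plugin_image() {
    local image_path="$1"
    # Fail early if the image archive is missing or unreadable.
    if [ ! -r "$image_path" ]; then
        echo "image file not found or not readable: $image_path" >&2
        return 1
    fi
    # Load the image archive into the local Docker image store.
    docker load -i "$image_path" || return 1
    # List local images so the newly loaded one can be confirmed by eye.
    docker images
}

# Usage (replace the placeholder with the real path):
#   load_plugin_image <path_to_image>
```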

After the plugin has been added and the UFM GUI has been refreshed, the left navigation bar displays two new menu items. These two tabs can be observed in the following GUI screenshots.

© Copyright 2024, NVIDIA. Last updated on Aug 27, 2024.