Packet Level Monitoring Collector (PMC) Plugin
The Packet Monitoring Collector/Controller plugin facilitates the configuration, capture, and display of a variety of network events, enabling users to monitor the network in real time. The PMC plugin is included in the plugins bundle, which can be downloaded from NVIDIA's Licensing Portal.
Supported triggers are pFRN, Congestion, Fast Recovery, CQE and PHY Error Links.
Network events are stored as UFM events and are archived in files for later retrieval. Additionally, they can be observed through the PMC user interface. Events can be streamed externally via UFM REST API in the same way that UFM events are streamed. The REST APIs are described in the UFM Enterprise REST API Guide.
pFRN
pFRN Notifications - Enables/disables mirroring on the pFRN trigger for the entire network or for a list of GUIDs.
Fast Recovery
Fast Recovery Notifications - Enables/disables mirroring on the Fast Recovery trigger for the entire network or for a list of GUIDs.
Notifications Level - Specifies the threshold for Fast Recovery mirroring. (Thresholds are configured in the SM configuration.)
PHY Error Links
PHY Error Links Notifications - Enables/disables mirroring on the PHY Error Links trigger for the entire network or for a list of GUIDs.
Notifications Level - Specifies the threshold for PHY Error Links mirroring. (Thresholds are configured in the SM configuration.)
CQE
CQE Notifications - Enables/disables mirroring on the CQE trigger for the entire network or for a list of GUIDs.
Congestion
Congestion Notifications - Enables/disables mirroring on the Congestion trigger for the entire network or for a list of GUIDs.
Mirrored packets (%) - Specifies the percentage of congested packets to be mirrored.
High threshold - High threshold, as a percentage, for the InfiniBand switch egress port queue size. Valid values are in the range [1, 1023].
Low threshold - Low threshold, as a percentage, for the InfiniBand switch egress port queue size. Valid values are in the range [1, 1023].
When a packet enters an InfiniBand switch, its data is stored in an ingress port buffer. A pointer to the packet's data is inserted into the queue of the egress port from which the packet will exit the switch. At that point, the configured threshold is compared with the egress queue data size. If the queue data size exceeds the threshold, a congestion event is reported. The threshold is given as a percentage of the ingress port buffer size.
An egress port queue can point to data coming from multiple ingress port buffers; therefore, the threshold can be greater than 100%.
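The threshold comparison described above can be illustrated with a minimal Python sketch. It is not taken from the plugin or from switch firmware: the function name is_congested, its parameter names, and the byte units are assumptions chosen for illustration. Only the [1, 1023] range and the "percentage of the ingress port buffer size" rule come from the text.

```python
def is_congested(egress_queue_bytes: int,
                 ingress_buffer_bytes: int,
                 threshold_percent: int) -> bool:
    """Return True if the egress queue data size exceeds the threshold.

    Hypothetical helper: the threshold is interpreted as a percentage of
    the ingress port buffer size, as described in the text.
    """
    if not 1 <= threshold_percent <= 1023:
        raise ValueError("threshold_percent must be in the range [1, 1023]")
    # Integer comparison, equivalent to:
    #   egress_queue_bytes > ingress_buffer_bytes * threshold_percent / 100
    return egress_queue_bytes * 100 > ingress_buffer_bytes * threshold_percent


if __name__ == "__main__":
    # Assumed example values: a 64 KiB ingress buffer and a 150% threshold,
    # which corresponds to 98,304 bytes of queued data.
    print(is_congested(100_000, 65_536, 150))
    print(is_congested(90_000, 65_536, 150))
```

Because an egress queue may reference data held in several ingress buffers, a threshold such as 150% is meaningful here: the queue must reference more data than a single ingress buffer can hold before a congestion event is reported.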
Installation
Load the image on the UFM server, either through the UFM GUI -> Settings -> Plugins Management tab, or by running the following command:
docker load -i <path_to_image>
After the plugin has been added and the UFM GUI has been refreshed, the left navigation bar displays two new menu items. These two tabs can be observed in the following GUI screenshots.