Telemetry - User-Defined Sessions

NVIDIA UFM Enterprise User Manual v6.17.1

UFM Telemetry allows tracking network bandwidth, congestion, errors, and latency. UFM offers the following telemetry features:

  • Real-time monitoring views

  • Multiple attributes monitoring

  • Intelligent Counters: provide error and congestion counters

  • InfiniBand port-based error counters

  • InfiniBand congestion measurement based on the XmitWait counter

  • InfiniBand port-based bandwidth data

The following actions may be taken with the telemetry session panels:

  • Rearranging – using a simple drag-and-drop function

  • Resizing – by hovering over the panel's border

It is also possible to get a larger view of a telemetry session by clicking the pop-out button in the top right-hand corner of each panel.

Telemetry Session Objects and Attributes

Monitored objects may be ports or devices in the fabric.

Monitored attributes can be raw counters or calculated counters:

  • A raw attribute is a simple attribute to be monitored (e.g., Port TX Wait)

  • A calculated attribute is an attribute that has been calculated based on one or more counters (e.g., PortXmitPktsRate)
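
To illustrate the difference between the two attribute kinds, the sketch below derives a per-second rate from two samples of a cumulative raw counter. The manual does not state how UFM computes calculated attributes such as PortXmitPktsRate, so the delta-over-time formula, the `Sample` type, and the `counter_rate` function are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative raw counter (hypothetical structure)."""
    timestamp: float  # seconds
    value: int        # cumulative counter value at that time


def counter_rate(prev: Sample, curr: Sample) -> float:
    """Return the per-second change between two samples of a raw counter.

    Assumption: a calculated rate is the counter delta divided by the
    elapsed time. Counter wraparound is not handled in this sketch.
    """
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.value - prev.value) / elapsed


# Made-up readings of a packet counter taken 10 seconds apart.
rate = counter_rate(Sample(0.0, 1_000), Sample(10.0, 6_000))
print(rate)  # 500.0
```

The raw attribute corresponds to `Sample.value` as reported; the calculated attribute is the value returned by `counter_rate`.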

Telemetry contains multiple views; the user can create, edit, and delete views.

Telemetry supports two types of panels: time-series panels, which show the relationship between time and counter value for a specific device, and Top X panels, which show the top X ports ranked by the selected counter.

Note

Top X is not supported when the ibpm telemetry provider is used. The telemetry provider is hidden in this case.

image2021-12-1_5-52-49-version-1-modificationdate-1716899701967-api-v2.png

The panel can be created by filling in the following form:

image2021-12-1_5-53-32-version-1-modificationdate-1716899700937-api-v2.png

The user can select one of the following telemetry session modes:

Telemetry_Session-version-1-modificationdate-1716899756407-api-v2.png

  • Timeseries: Provides the user with historical/live time-series graphs of the selected counters for the selected devices/ports.

  • Top X: Provides the user with Top X ports by the selected counters (where X is 5, 10, 15, 20).
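
The Top X mode can be pictured as a ranking of ports by the value of one counter. The sketch below is not UFM code; the port names, the counter values, and the `top_x_ports` function are hypothetical, and the only detail taken from the manual is that X is one of 5, 10, 15, or 20.

```python
import heapq

# Values of X offered by the Top X mode, per the manual.
ALLOWED_X = (5, 10, 15, 20)


def top_x_ports(counter_by_port: dict[str, float], x: int) -> list[tuple[str, float]]:
    """Return the x ports with the highest counter value, highest first."""
    if x not in ALLOWED_X:
        raise ValueError(f"x must be one of {ALLOWED_X}")
    return heapq.nlargest(x, counter_by_port.items(), key=lambda item: item[1])


# Made-up readings of a single counter, keyed by a hypothetical port name.
readings = {
    "sw1:1": 12, "sw1:2": 340, "sw1:3": 0,
    "sw2:1": 95, "sw2:2": 7, "sw2:3": 210,
}

for port, value in top_x_ports(readings, 5):
    print(port, value)
```

With six ports and X set to 5, the port with the lowest reading (`sw1:3`) is left out of the result.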

The user can select the member grouping type, Devices or Ports:

Telemetry_Members-version-1-modificationdate-1716899755027-api-v2.png

Note

If the selected telemetry session mode is Top X, only ports are supported.

The user can select one or more counters from the counters dropdown menu:

Telemetry_Counters-version-1-modificationdate-1716899754117-api-v2.png

Alternatively, the user can get a full view of all the supported counters and select one or more by clicking on the "All Counters" button:

MicrosoftTeams-image-version-1-modificationdate-1716899689627-api-v2.png

The user can select one or more devices/ports from the relevant dropdown menu:

  • Devices:

    Telemetry_Devices-version-1-modificationdate-1716899754633-api-v2.png

    Alternatively, the user can choose to get a full view of the devices by clicking on the "All Devices" button.

  • Ports:
    After switching from "Devices" to "Ports," the user can view the ports dropdown menu:

    Telemetry_Ports-version-1-modificationdate-1716899755937-api-v2.png

    Alternatively, the user can choose to get a full view of the ports by clicking on the "All Ports" button.

Data aggregation in a time-series panel can be changed by grouping the members by devices or by ports; this option is available in the panel's context menu. For example, if a time-series panel was created with "Devices" members, right-clicking the panel and grouping by ports makes the panel show each port as an individual line.

image2021-12-1_5-55-27-version-1-modificationdate-1716899697967-api-v2.png

image2021-12-1_5-56-6-version-1-modificationdate-1716899696643-api-v2.png
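
The effect of switching the grouping can be sketched as two ways of aggregating the same per-port samples. This is an illustration only: the manual does not state which aggregation function UFM applies when grouping by device, so summing the port values is an assumption, and the sample data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-port samples: (device name, port number, counter value).
samples = [
    ("switch-a", 1, 100),
    ("switch-a", 2, 250),
    ("switch-b", 1, 40),
]


def group_by_port(rows):
    """One series per port: each port keeps its own value."""
    return {f"{device}:{port}": value for device, port, value in rows}


def group_by_device(rows):
    """One series per device: port values are summed (assumed aggregation)."""
    totals = defaultdict(int)
    for device, _port, value in rows:
        totals[device] += value
    return dict(totals)


print(group_by_port(samples))
print(group_by_device(samples))
```

Grouping by ports yields three lines for this data, while grouping by devices yields two.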

Telemetry obtains live data from the server at a fixed interval, which equals the default session interval. The interval can be changed using the sampling rate option in the context menu.

image2021-12-1_5-56-52-version-1-modificationdate-1716899695937-api-v2.png

The starting time of a time-series panel can be changed from the time calendar at the top of the page. The time can be set to "Time Range" or "Custom". If the "Custom" option is chosen, only historical data is shown.

image2021-12-1_5-57-29-version-1-modificationdate-1716899695260-api-v2.png

The panel can be edited by changing members, members' type and grouping. The changes can be discarded or saved. The panel can also be deleted.

image2021-12-1_6-0-26-version-1-modificationdate-1716899694313-api-v2.png

image2021-12-1_6-0-40-version-1-modificationdate-1716899693643-api-v2.png

image2021-12-1_6-1-16-version-1-modificationdate-1716899692760-api-v2.png

Telemetry supports thresholds. A threshold is displayed as a line drawn on the panel at the threshold value.

image2021-12-1_6-2-31-version-1-modificationdate-1716899691883-api-v2.png
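
As a rough picture of what the threshold line conveys, the sketch below lists the samples of a series that lie above a given threshold value. The series, the threshold, and the `samples_above_threshold` function are hypothetical; the manual only describes the threshold as a drawn line and does not describe any filtering or alerting behavior in the panel.

```python
def samples_above_threshold(series, threshold):
    """Return the (time, value) pairs whose value is strictly above the threshold."""
    return [(t, v) for t, v in series if v > threshold]


# Made-up time series: (seconds since start, counter value).
series = [(0, 10.0), (30, 55.0), (60, 80.0), (90, 42.0)]

print(samples_above_threshold(series, 50.0))  # [(30, 55.0), (60, 80.0)]
```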

In the Devices table, the user can view telemetry data for one or more devices as a time-series chart by clicking the monitoring option in the context menu.

image2022-4-28_22-42-1-version-1-modificationdate-1716899751483-api-v2.png

image2021-12-1_6-3-46-version-1-modificationdate-1716899690103-api-v2.png

© Copyright 2024, NVIDIA. Last updated on Jun 7, 2024.