Date: 2025-08-07

The service supports the following requests:

Capability: Describes the YANG files the service supports (UFM telemetry).

Get: Requires valid paths; returns the cached data from the service.

Subscribe: Requires valid paths and an interval; returns cached data at the specified interval. The first message contains the headers extracted from the path, and subsequent messages include only the HeaderID. In on-change subscribe mode, a heartbeat interval is provided instead of an interval. If no data changes during the heartbeat interval, no change notification is sent; instead, a full notification message, similar to the first message, is sent at the heartbeat. If some data changes, a notification of the change is sent and no heartbeat message is sent.

The capability request provides information about the YANG files that the server supports, including their versions. This request can be fulfilled without requiring a connection to the telemetry or inventory.

Request Example:

gnmic -a localhost:9339 capability

Response Example:

gNMI version: 1.3.0-2
supported models:
  - nvidia-ib-amber, Nvidia IB, 1.0.0
  - nvidia-ib-amber-ext, Nvidia IB, 1.0.0
  - nvidia-ib-amber-inventory-counters, Nvidia IB, 1.0.0
  - nvidia-ib-amber-port-counters, Nvidia IB, 1.0.0
supported encodings:
  - JSON
  - JSON_IETF





To construct a path for a telemetry request, follow these steps:

1. Begin with "nvidia/ib".
2. Specify sharding if desired. For example, to partition the data into 10 pieces and take the second partition, use 2/10.
3. Specify the node_guid to select, using an asterisk (*) to select all nodes.
4. Specify the desired ports for the selected nodes, using an asterisk (*) to select all ports.
5. Select "amber" for amBER telemetry.
6. Specify the desired counters group. If unknown, this step can be skipped.
7. Specify the counter, using an asterisk (*) to select all the counters in the cache. If a counters group is used, it will return all counters in the specified group.

To construct a path for an inventory or events request, follow these steps:

1. Begin with "nvidia/ib".
2. Specify inventory or events.

To construct a path for switch rank information, follow these steps:

1. Begin with "nvidia/ib".
2. Specify the node_guid to select, using an asterisk (*) to select all nodes.
3. Select "amber" for amBER telemetry.
4. Use switch_rank as the counter name.
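The telemetry path steps above can be sketched as a small helper. This is only an illustration, not part of the plugin: the function name build_telemetry_path is hypothetical, the key names guid and port_number are taken from the request examples in this document, and sharding is left out because its exact path syntax is not shown here.

```python
def build_telemetry_path(guid="*", port="*", counter="*", group=None):
    """Build an amBER telemetry path in the form used by the examples here.

    Hypothetical helper; sharding is intentionally not handled.
    """
    parts = [
        "nvidia/ib",
        f"guid[guid={guid}]",
        f"port[port_number={port}]",
        "amber",
    ]
    if group is not None:  # the counters group is optional
        parts.append(group)
    parts.append(counter)
    return "/".join(parts)


# One counter of one port, then every counter of every port.
single = build_telemetry_path("0x5255456", 2, "hist0", group="port_counters")
everything = build_telemetry_path()
print(single)
print(everything)
```

The resulting strings can be passed to gnmic with --path, one flag per path.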

Telemetry messages consist of two key components: Headers and Values, both representing telemetry data in a CSV format.

Headers: Provided in full in the first message. In a subscribe request, messages from the second one onward carry only the header hash (HeaderID) to reduce message size.

Values: Each value row begins with a timestamp, followed by the node_guid and port number, and then the counter values in the same order as the headers. If a counter is not present for a node, its field is empty in the message.

In on-change subscribe messages, only nodes with changes and their corresponding modified values are included. All other counters for that node will remain empty.
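A client can pair each value row with the headers to get named fields. The sketch below assumes that rows are plain comma-separated strings without quoting, as in the examples in this document; the function name parse_row is hypothetical, and the sample header string and row are copied from the subscribe response example in this section.

```python
def parse_row(headers, row):
    """Map one CSV value row onto its headers; empty fields become None."""
    # Headers appear either as one comma-separated string or as a list.
    names = headers.split(",") if isinstance(headers, str) else list(headers)
    fields = row.split(",")
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    return {
        name: (field if field != "" else None)
        for name, field in zip(names, fields)
    }


headers = "timestamp,guid,port,hist0,hist1"
row = "240771222771818,0x8168793592c6a790,1,,2"
record = parse_row(headers, row)
print(record)
```

Here hist0 is missing for the node, so it maps to None, while hist1 keeps its string value.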

Request Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/port_counters/hist0 --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/port_counters/hist1 -i 30s

Response Example:

[
  {
    "source": "localhost:9339",
    "subscription-name": "default-1690282472",
    "timestamp": 1690282475124352000,
    "time": "2023-07-25T13:54:35.124352063+03:00",
    "updates": [
      {
        "Path": "nvidia/ib/amber/reply/sample",
        "values": {
          "nvidia/ib/amber/reply/sample": {
            "Headers": "timestamp,guid,port,hist0,hist1",
            "HeaderID": "5246201354",
            "Values": [
              "240771222771818,0x8168793592c6a790,1,,2",
              "240771222771818,0x47a67159c915493f,1,1,2",
              "240771222771818,0x667203ac69f3f2bf,1,2,",
              "240771222771818,0x113cd807bfed3853,1,0,"
            ]
          }
        }
      }
    ]
  }
]

From the second message onward, the headers are replaced by their hash value (HeaderID).
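Because later messages carry only the HeaderID, a client has to remember the headers from the first message. The following sketch shows one way to do that, assuming the payload keys HeaderID, Headers, and Values shown in the response examples; the class name HeaderCache is hypothetical and the sample payloads are shortened to a single counter.

```python
class HeaderCache:
    """Remember full headers by HeaderID and decode later messages with them."""

    def __init__(self):
        self._headers = {}

    def decode(self, payload):
        header_id = payload["HeaderID"]
        if "Headers" in payload:
            headers = payload["Headers"]
            if isinstance(headers, str):  # some responses use a CSV string
                headers = headers.split(",")
            self._headers[header_id] = list(headers)
        headers = self._headers.get(header_id)
        if headers is None:
            raise KeyError(f"no headers cached for HeaderID {header_id}")
        return [dict(zip(headers, row.split(","))) for row in payload["Values"]]


cache = HeaderCache()
first = {
    "HeaderID": "970426048",
    "Headers": ["timestamp", "Node_GUID", "Port_Number", "Counter1"],
    "Values": ["1719232345757948,0x91f87bf42deb3e03,1,5091"],
}
second = {
    "HeaderID": "970426048",
    "Values": ["1719232345757948,0x20f92ed58991d56c,1,6805"],
}
first_rows = cache.decode(first)
second_rows = cache.decode(second)  # decoded with the cached headers
print(second_rows[0]["Counter1"])
```

Keying the cache by HeaderID rather than keeping only the latest headers lets one client follow several subscriptions at once.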

The Get request retrieves data at a specified path. If the telemetry cache holds no data, the server returns an empty response. Otherwise, it returns the counters it can locate.

Example:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/guid[guid=0x5255456]/port[port_number=2]/amber/port_counters/hist0

The request retrieves the hist0 counter for port 2 of node_guid 0x5255456.

Example 2:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/port_counters/hist0

The request retrieves the hist0 counter from all ports of all node_guids.

Example 3:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/guid[guid=0x5255456]/port[port_number=2]/amber/*

The request retrieves all counters for port 2 of node_guid 0x5255456.

Example with multiple paths:

gnmic -a localhost:9339 --insecure get \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/CableInfo.transmitter_technology \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/sel_gctrln_en_5_lane0 \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/num_plls_7nm \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/rcal_fsm_done \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/LinkErrorRecoveryCounterExtended \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/sel_enc2_ib0_lane2 \
  --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/lockdet_err_cnt_unlocked_sticky

Response Example:

[
  {
    "source": "localhost:9339",
    "timestamp": 1719232374915165200,
    "time": "2024-06-24T15:32:54.915165166+03:00",
    "updates": [
      {
        "Path": "nvidia/ib/amber/reply",
        "values": {
          "nvidia/ib/amber/reply": {
            "Headers": [
              "timestamp", "Node_GUID", "Port_Number",
              "CableInfo.transmitter_technology", "sel_gctrln_en_5_lane0",
              "num_plls_7nm", "rcal_fsm_done", "LinkErrorRecoveryCounterExtended",
              "sel_enc2_ib0_lane2", "lockdet_err_cnt_unlocked_sticky"
            ],
            "Values": [
              "1719232345757948,0x91f87bf42deb3e03,1,5091,7826,6290,8615,4247,8586,6214",
              "1719232345757948,0x7b8c2e08907250ce,1,2891,3293,5774,4398,3681,3548,7408",
              "1719232345757948,0x48b60e6f3670eaca,1,9477,3847,1184,5527,4783,2102,8192",
              "1719232345757948,0xabccdad7f8a3eda6,1,7976,6143,8257,3770,6166,6690,2835",
              "1719232345757948,0x6d9ec4bb5fa45736,1,9051,2982,7145,3604,9256,1061,2638",
              "1719232345757948,0x028cf9e0f9ed7c32,1,5623,7483,2263,2265,6890,4875,5564",
              "1719232345757948,0x92a984c1a491b72a,1,6732,7795,6411,8569,3370,705,5536",
              "1719232345757948,0x8b4b404acd2f34da,1,7610,7128,10064,1880,4834,3411,6724",
              "1719232345757948,0x20f92ed58991d56c,1,6805,1632,5407,2038,1865,7279,8350",
              "1719232345757948,0x1dac004a426bb5f5,1,8351,5757,7925,6181,3260,3081,1554"
            ]
          }
        }
      }
    ]
  }
]





The Subscribe request, similar to the get request, provides data from the specified path. When the telemetry is empty, the server responds with an empty result. If data is available, the server responds with the retrieved counters. The stream delivers information at the specified interval. If no interval is specified, the server transmits the information at the default server rate, which is configurable and defaults to 10s.

Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=0x5255456]/port[port_number=2]/amber/port_counters/hist0 -i 30s

This request retrieves the hist0 counter for port 2 of node_guid 0x5255456 at a 30-second interval. To test the stream, set the subscription mode to "once" (--mode once); the stream then stops after a single response.

Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=0x5255456]/port[port_number=2]/amber/port_counters/hist0 -i 30s --mode once

This request retrieves the data from node_guid 0x5255456 , port 2, where the request counter is hist0. The stream shuts down after one response, similar to a Get request.

Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/* -i 10s

The first two notifications from the server are as follows:

{
  "source": "localhost:9339",
  "subscription-name": "default-1719233128",
  "timestamp": 1719233128171946500,
  "time": "2024-06-24T15:45:28.171946518+03:00",
  "updates": [
    {
      "Path": "nvidia/ib/amber/reply/sample",
      "values": {
        "nvidia/ib/amber/reply/sample": {
          "HeaderID": "970426048",
          "Headers": [
            "timestamp", "Node_GUID", "Port_Number", "Counter1", "Counter2",
            "Counter3", "Counter4", "Counter5", "Counter6", "Counter7"
          ],
          "Values": [
            "1719232345757948,0x91f87bf42deb3e03,1,5091,7826,6290,8615,4247,8586,6214",
            "1719232345757948,0x7b8c2e08907250ce,1,2891,3293,5774,4398,3681,3548,7408",
            "1719232345757948,0x1dac004a426bb5f5,1,8351,5757,7925,6181,3260,3081,1554",
            "1719232345757948,0x48b60e6f3670eaca,1,9477,3847,1184,5527,4783,2102,8192",
            "1719232345757948,0xabccdad7f8a3eda6,1,7976,6143,8257,3770,6166,6690,2835",
            "1719232345757948,0x6d9ec4bb5fa45736,1,9051,2982,7145,3604,9256,1061,2638",
            "1719232345757948,0x028cf9e0f9ed7c32,1,5623,7483,2263,2265,6890,4875,5564",
            "1719232345757948,0x92a984c1a491b72a,1,6732,7795,6411,8569,3370,705,5536",
            "1719232345757948,0x8b4b404acd2f34da,1,7610,7128,10064,1880,4834,3411,6724",
            "1719232345757948,0x20f92ed58991d56c,1,6805,1632,5407,2038,1865,7279,8350"
          ]
        }
      }
    }
  ]
}
{
  "source": "localhost:9339",
  "subscription-name": "default-1719233128",
  "timestamp": 1719233138173907700,
  "time": "2024-06-24T15:45:38.173907825+03:00",
  "updates": [
    {
      "Path": "nvidia/ib/amber/reply/sample",
      "values": {
        "nvidia/ib/amber/reply/sample": {
          "HeaderID": "970426048",
          "Values": [
            "1719232345757948,0x20f92ed58991d56c,1,6805,1632,5407,2038,1865,7279,8350",
            "1719232345757948,0x1dac004a426bb5f5,1,8351,5757,7925,6181,3260,3081,1554",
            "1719232345757948,0x48b60e6f3670eaca,1,9477,3847,1184,5527,4783,2102,8192",
            "1719232345757948,0xabccdad7f8a3eda6,1,7976,6143,8257,3770,6166,6690,2835",
            "1719232345757948,0x6d9ec4bb5fa45736,1,9051,2982,7145,3604,9256,1061,2638",
            "1719232345757948,0x028cf9e0f9ed7c32,1,5623,7483,2263,2265,6890,4875,5564",
            "1719232345757948,0x92a984c1a491b72a,1,6732,7795,6411,8569,3370,705,5536",
            "1719232345757948,0x8b4b404acd2f34da,1,7610,7128,10064,1880,4834,3411,6724",
            "1719232345757948,0x91f87bf42deb3e03,1,5091,7826,6290,8615,4247,8586,6214",
            "1719232345757948,0x7b8c2e08907250ce,1,2891,3293,5774,4398,3681,3548,7408"
          ]
        }
      }
    }
  ]
}





The subscribe on-change request, similar to the standard subscribe request, provides data from the specified path. If the telemetry lacks data, the server responds with an empty result. When data is available, the server responds with the located counters.

The stream delivers information according to the specified heartbeat interval. If no changes occurred between heartbeats, all cached data is transmitted at the heartbeat. However, if a change occurred and was pushed to the client, no data is transmitted at that heartbeat.

The path construction follows the same pattern as the get request and includes inventory and event paths. Only updated data will be included in the response, while all other parts remain empty but retain the specified format. Similarly, only the nodes that have been updated will be included in the response.

Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=0x5255456]/port[port_number=2]/amber/port_counters/hist0 --stream-mode on-change --heartbeat-interval 1m

This request retrieves the hist0 counter for port 2 of node_guid 0x5255456. It checks for changes every minute and, when changes are detected, promptly sends the updated values.

Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/port_counters/* --stream-mode on-change --heartbeat-interval 1m

This request covers all nodes and ports and retrieves all counters in the port_counters group. It checks for changes every minute and, when changes are detected, promptly sends the updated values.

Below is an example response row for a particular GUID in an on-change request for a few counters. Only some of the counters were updated; those that were not updated have a value of 0, because the include_old_data_on_change flag defaults to true.

1706532307824,0x0002c903007e5220,1,0,0,0,41447490564,617155163,41423305825,617155163,24184739,17,0,0,0,0,0

The same example with the flag set to false gives this:

1706532307824,0x0002c903007e5220,1,,,,41447490564,617155163,41423305825,617155163,24184739,17,,,,,

Only the values that have changed are returned, while the others are empty. To get data in this format, set include_old_data_on_change to false in the config file.
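When include_old_data_on_change is false, a client that wants a complete view has to fill the empty fields from the last values it saw. The sketch below is one possible client-side approach, not plugin behavior: the function name apply_on_change_row is hypothetical, the rows are made-up short samples, and the position of node_guid and port as the second and third fields follows the Values format described in this document.

```python
def apply_on_change_row(state, row):
    """Merge a sparse on-change row into the last known row for its port.

    `state` maps (node_guid, port) to the most recent full list of fields.
    Empty fields in `row` keep their previous value.
    """
    fields = row.split(",")
    key = (fields[1], fields[2])  # node_guid and port number
    previous = state.get(key, [""] * len(fields))
    if len(previous) != len(fields):
        raise ValueError("row width changed for " + str(key))
    merged = [new if new != "" else old for new, old in zip(fields, previous)]
    state[key] = merged
    return merged


state = {}
# Hypothetical rows: a full first row, then a sparse update.
apply_on_change_row(state, "100,0xabc,1,5,7,9")
merged = apply_on_change_row(state, "200,0xabc,1,,8,")
print(merged)
```

The second call updates only the timestamp and the middle counter; the first and last counters keep the values from the first row.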

Example:

gnmic -a localhost:9339 --insecure sub --path nvidia/ib/guid[guid=*]/port[port_number=*]/amber/* --stream-mode on-change --heartbeat-interval 24h

The first two notifications from the server are shown below (with include_old_data_on_change set to true). The last two columns did not change but still carry their previous values. The second message was sent because some rows changed, and it contains only those rows.

{
  "source": "localhost:9339",
  "subscription-name": "default-1719236764",
  "timestamp": 1719236764654659600,
  "time": "2024-06-24T16:46:04.654659517+03:00",
  "updates": [
    {
      "Path": "nvidia/ib/amber/reply/onchange",
      "values": {
        "nvidia/ib/amber/reply/onchange": {
          "HeaderID": "912200528",
          "Headers": [
            "timestamp", "Node_GUID", "Port_Number", "Counter1", "Counter2",
            "Counter3", "Counter4", "Counter5", "Counter6", "Counter7"
          ],
          "Values": [
            "1719236753818594,0x7e680fb8f81a1950,1,100531,107250,100999,107455,109258,3716,5329",
            "1719236753818594,0x0176438fe4ee507c,1,104269,108884,104887,108502,105366,4540,6673",
            "1719236753818594,0x2e36224302959e79,1,101228,100555,105616,102767,108899,87,9953",
            "1719236753818594,0x8e62a55d7571a9b8,1,100684,108124,106670,102400,106689,2910,4203",
            "1719236753818594,0x0be75a9e97016f5e,1,102227,102735,108903,103547,108705,2629,1830",
            "1719236753818594,0x8307bfad0672adbd,1,106033,103906,106185,107450,105736,2567,6914",
            "1719236753818594,0x2cbe66ec0b1af84c,1,105958,106959,100349,107704,105073,8330,4962",
            "1719236753818594,0x6b6da39a9ec4bbfc,1,104340,106752,109134,103796,103500,7136,3493",
            "1719236753818594,0x6d122dbdd99cfb60,1,104941,107630,104190,105392,109582,5480,7934",
            "1719236753818594,0xeed4bd9cd3b7f325,1,102416,100164,106731,102033,103807,3048,6316"
          ]
        }
      }
    }
  ]
}
{
  "source": "localhost:9339",
  "subscription-name": "default-1719236764",
  "timestamp": 1719237054620929500,
  "time": "2024-06-24T16:50:54.620929561+03:00",
  "updates": [
    {
      "Path": "nvidia/ib/amber/reply/onchange",
      "values": {
        "nvidia/ib/amber/reply/onchange": {
          "HeaderID": "912200528",
          "Values": [
            "1719237054172043,0xeed4bd9cd3b7f325,1,117416,115164,121731,117033,118807,3048,6316",
            "1719237054172043,0x2e36224302959e79,1,116228,115555,120616,117767,123899,87,9953",
            "1719237054172043,0x8e62a55d7571a9b8,1,115684,123124,121670,117400,121689,2910,4203",
            "1719237054172043,0x7e680fb8f81a1950,1,115531,122250,115999,122455,124258,3716,5329",
            "1719237054172043,0x0176438fe4ee507c,1,119269,123884,119887,123502,120366,4540,6673"
          ]
        }
      }
    }
  ]
}





Inventory messages are conveyed in separate updates and present the inventory details of the UFM associated with the provided IP. They include the total count of components within the UFM, such as switches, routers, and servers, along with the number of active ports and the total number of ports, including disabled ones. Inventory responses also include the size of the telemetry, which is not always the same as the number of active ports. If the plugin is unable to contact the UFM, it falls back to the default values defined in the configuration file. The path for inventory requests differs from the conventional path structure because it does not depend on specific nodes or ports: the inventory segment comes directly after "nvidia/ib".

Example:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/inventory/*

Response:

[
  {
    "source": "localhost:9339",
    "timestamp": 1698824237536878000,
    "time": "2023-11-01T09:37:17.536878067+02:00",
    "updates": [
      {
        "Path": "nvidia/ib/inventory",
        "values": {
          "nvidia/ib/inventory": {
            "ActivePorts": 4,
            "Cables": 2,
            "Gateways": 0,
            "HCAs": 2,
            "Routers": 0,
            "Servers": 2,
            "Switches": 1,
            "TotalPorts": 38,
            "TelemetrySize": 4,
            "timestamp": 1698824211535069000
          }
        }
      }
    ]
  }
]
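A script that consumes this output can pull the inventory object out of the JSON that gnmic prints. The sketch below assumes the Get output is a JSON array shaped like the response above; the function name collect_payloads is hypothetical, and the sample text is a shortened copy of that response.

```python
import json

# Shortened copy of the inventory response shown above.
RESPONSE = """
[{"source": "localhost:9339",
  "timestamp": 1698824237536878000,
  "updates": [{"Path": "nvidia/ib/inventory",
               "values": {"nvidia/ib/inventory": {
                   "ActivePorts": 4, "Switches": 1,
                   "TotalPorts": 38, "TelemetrySize": 4}}}]}]
"""


def collect_payloads(response_text):
    """Return a dict that maps each reply path to its payload object."""
    payloads = {}
    for message in json.loads(response_text):
        for update in message.get("updates", []):
            payloads.update(update.get("values", {}))
    return payloads


inventory = collect_payloads(RESPONSE)["nvidia/ib/inventory"]
disabled_or_down = inventory["TotalPorts"] - inventory["ActivePorts"]
print(inventory["Switches"], disabled_or_down)
```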





Event messages are provided in separate updates and describe the events occurring within the UFM associated with the specified IP. Because the event metadata is the same for every event, even when a request returns many events, the message uses a CSV-like structure. The Headers section lists the UFM event metadata fields, while the Values section contains the raw event data. Users can subscribe to events in on-change mode and receive only the events triggered within the subscription interval. The path for event requests differs from the typical node- or port-based structure: the events segment comes directly after "nvidia/ib".

Example:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/events/*

Response:

[
  {
    "source": "localhost:9339",
    "timestamp": 1698824809647515600,
    "time": "2023-11-01T09:46:49.647515575+02:00",
    "updates": [
      {
        "Path": "nvidia/ib/events",
        "values": {
          "nvidia/ib/events": {
            "Headers": [
              "id", "object_name", "write_to_syslog", "description", "type",
              "event_type", "severity", "timestamp", "counter", "category",
              "object_path", "name"
            ],
            "Values": [
              "7718,Grid,false,Disk space usage in /opt/ufm/files/log is above the threshold of 90.0%.,Grid,525,Critical,2023-11-01 07:25:54,N/A,Maintenance,Grid,Disk utilization threshold reached",
              "7717,Grid,false,Disk space usage in /opt/ufm/files/log is above the threshold of 90.0%.,Grid,525,Critical,2023-11-01 07:24:54,N/A,Maintenance,Grid,Disk utilization threshold reached",
              "7716,Grid,false,Disk space usage in /opt/ufm/files/log is above the threshold of 90.0%.,Grid,525,Critical,2023-11-01 07:23:54,N/A,Maintenance,Grid,Disk utilization threshold reached",
              "7491,ec0d9a0300d42e54,false,Mcast group is deleted: ff12601bffff0000, 00000002,Computer,67,Info,2023-10-31 06:39:21,N/A,Fabric Notification,default / Computer: r-ufm59,MCast Group Deleted"
            ]
          }
        }
      }
    ]
  }
]
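The last event row above has a comma inside its description, so splitting the row on every comma yields more fields than headers. The sketch below works around this under one loudly stated assumption: only the description column may contain commas. The function name parse_event_row is hypothetical; the headers and row are copied from the response above.

```python
def parse_event_row(headers, row, free_text="description"):
    """Split an event row, letting one free-text column contain commas."""
    index = headers.index(free_text)
    after = len(headers) - index - 1         # columns to the right of it
    left = row.split(",", index)             # peel off the leading columns
    if len(left) != index + 1:
        raise ValueError("row has fewer fields than headers")
    right = left[index].rsplit(",", after)   # peel off the trailing columns
    if len(right) != after + 1:
        raise ValueError("row has fewer fields than headers")
    return dict(zip(headers, left[:index] + right))


headers = [
    "id", "object_name", "write_to_syslog", "description", "type",
    "event_type", "severity", "timestamp", "counter", "category",
    "object_path", "name",
]
row = (
    "7491,ec0d9a0300d42e54,false,Mcast group is deleted: ff12601bffff0000, "
    "00000002,Computer,67,Info,2023-10-31 06:39:21,N/A,Fabric Notification,"
    "default / Computer: r-ufm59,MCast Group Deleted"
)
event = parse_event_row(headers, row)
print(event["description"])
```

Splitting from both ends keeps the fixed columns aligned and leaves whatever remains in the middle as the description.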





Switch rank updates are conveyed in separate messages, presenting the rank of the switches in the UFM. This data is derived from a file in the UFM and is updated by the server every 6 hours by default. The switch_rank counter is associated only with switch-level data, so there is no need to specify a port in the path. However, this counter is not connected to the telemetry cache of switch-level data. Note that if the ufm_ip is changed, the switch_rank information will not be available.

Example:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/guid[guid=*]/amber/switch_rank

Response:

{
  "source": "localhost:9339",
  "timestamp": 1719296207323383300,
  "time": "2024-06-25T09:16:47.323383222+03:00",
  "updates": [
    {
      "Path": "nvidia/ib/guid[guid=*]/amber/amber/switch_rank",
      "values": {
        "nvidia/ib/guid/amber/amber/switch_rank": {
          "Headers": "Timestamp,Node_GUID,switch_rank",
          "Values": [
            "1719296205612,0x0002c903007e5220,0"
          ]
        }
      }
    }
  ]
}





UFM Health KPI messages are provided in separate updates and report the UFM Health metrics of the UFM associated with the specified IP. The response value is a single large string in Prometheus format. Users can subscribe to the UFM Health KPIs in on-change mode and receive the whole set of UFM Health metrics whenever any one item changes. The path for UFM Health KPI requests differs from the typical node- or port-based structure: the ufm_health_kpi segment comes directly after "nvidia/ib".

Example:

gnmic -a localhost:9339 --insecure get --path nvidia/ib/ufm_health_kpi/*

Response:

{
  "source": "localhost:9339",
  "timestamp": 1719296207323383300,
  "time": "2024-06-25T09:16:47.323383222+03:00",
  "updates": [
    {
      "Path": "nvidia/ib/ufm_health_kpi",
      "values": {
        "nvidia/ib/ufm_health_kpi": {
          "value": "# HELP server_cpu_usage_percent_avg Average of Server CPU usage percent
# TYPE server_cpu_usage_percent_avg gauge
server_cpu_usage_percent_avg{duration="Last 5 minutes"} 1.5545454545454547
server_cpu_usage_percent_avg{duration="Last 1 hour"} 1.4975206611570255
server_cpu_usage_percent_avg{duration="Last 24 hour"} 1.505277777777778
...
events_history_counter{duration="Last week",event_name="Director Switch is Down"} 0.0
events_history_counter{duration="Last week",event_name="Node is Up"} 0.0
events_history_counter{duration="Last week",event_name="Node is Down"} 0.0
events_history_counter{duration="Last week",event_name="Link is Up"} 0.0
events_history_counter{duration="Last week",event_name="Link is Down"} 0.0
"
        }
      }
    }
  ]
}
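Since the value is one large string, a client usually splits it back into individual samples. The sketch below is a minimal parser, assuming the string holds one Prometheus sample per line separated by newlines and that label values contain no escaped quotes; the function name parse_metrics is hypothetical and the sample text reuses two metrics from the response above.

```python
import re

# metric_name{label="value",...} number
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments carry no sample
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


text = "\n".join([
    "# HELP server_cpu_usage_percent_avg Average of Server CPU usage percent",
    "# TYPE server_cpu_usage_percent_avg gauge",
    'server_cpu_usage_percent_avg{duration="Last 5 minutes"} 1.5545454545454547',
    'events_history_counter{duration="Last week",event_name="Link is Down"} 0.0',
])
samples = parse_metrics(text)
for name, labels, value in samples:
    print(name, labels, value)
```

For anything beyond a quick check, a maintained Prometheus text-format parser is likely a better choice than this regular expression.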

The gNMI plugin includes a built-in Telemetry Notification Server that enables event-driven data synchronization between UFM Telemetry endpoints and the gNMI server. This real-time communication complements the existing periodic telemetry fetching mechanism controlled by the telemetry_interval parameter. See Telemetry Configurations.

The Telemetry Notification Server allows UFM Telemetry to push updates to the gNMI server immediately when new data becomes available, reducing latency and enhancing responsiveness compared to periodic polling alone.

To enable UFM Telemetry to notify the gNMI server when new data is ready, add the configuration line below to the UFM Telemetry .ini file (for example, /opt/ufm/files/conf/secondary_telemetry_defaults/launch_ibdiagnet_config.ini):

plugin_env_UFM_TELEMETRY_NOTIFY_ENDPOINTS=http:

Replace <Telemetry_HTTP_PORT> with the actual HTTP port of the telemetry endpoint (e.g., 9002 for a secondary telemetry instance).

After updating the configuration, restart the UFM Telemetry service to apply the changes.

# For UFM Bare-metal
/etc/initd/ufmd ufm_telemetry_restart

# For UFM Docker
docker exec ufm /etc/initd/ufmd ufm_telemetry_restart